end0tknr's kipple - web写経開発

太宰府天満宮の狛犬って、妙にカワイイ

更にpostgisである地点から海岸線までの距離算出を高速化

postgisである地点から海岸線までの距離算出を高速化する - end0tknrのkipple - web写経開発
前回、PostGISによる ある点座標から海岸線までの距離算出を高速化(30->3〜4秒)しましたが、もう少し高速化してみます。

explain analyze すると、「actual time=6430.209」!!

どうやら、where句での座標比較で ST_Distance_Sphere()の対象レコードを制限したものの、更に制限を強化し、ST_Distance_Sphere()の対象レコードを減らしたほうがよさそうです。

kaigan=# explain analyze
kaigan-# select min(ST_Distance_Sphere('POINT(135 40)', k.the_geom))
kaigan-# from kaigan_box kj
kaigan-# join kaigan k on (k.gid=kj.gid)
kaigan-# where (st_xmin between 134 and 136) or
kaigan-#       (st_xmax between 134 and 136) or
kaigan-#       (st_ymin between 39 and 41) or
kaigan-#       (st_ymax between 39 and 41);
                          QUERY PLAN                                                                                          
---------------------------------------------------------------------
 Aggregate  (cost=14442.14..14442.40 rows=1 width=3766)
            (actual time=6430.209..6430.209 rows=1 loops=1)
   ->  Merge Join  (cost=0.00..14350.50 rows=36653 width=3766)
                   (actual time=0.114..140.055 rows=22079 loops=1)
         Merge Cond: (kj.gid = k.gid)
         ->  Index Scan using kaigan_box_pkey on kaigan_box kj
               (cost=0.00..6358.56 rows=36653 width=4)
               (actual time=0.063..61.578 rows=22079 loops=1)
               Filter: (((st_xmin >= 134) AND (st_xmin <= 136)) OR
                        ((st_xmax >= 134) AND (st_xmax <= 136)) OR
                        ((st_ymin >=  39) AND (st_ymin <=  41)) OR
                        ((st_ymax >=  39) AND (st_ymax <=  41)))
         ->  Index Scan using kaigan_pkey on kaigan k
               (cost=0.00..7330.06 rows=81487 width=3770)
               (actual time=0.045..48.664 rows=51431 loops=1)
 Total runtime: 6430.349 ms
(7 rows)

副問合せ(sub query)で、ST_Distance_Sphere() の対象を制限すると、更に高速化

前準備

前回は、tbl:kaigan_boxを作成し、joinしていましたが、joinも高速化する為には避けるべきなので、今回は、gisデータが登録されたtableに矩形の重心座標を登録します。

alter table kaigan add column st_xmid decimal(11,8);
alter table kaigan add column st_ymid decimal(11,8);
alter table kaigan add column st_xlen decimal(11,8);
alter table kaigan add column st_ylen decimal(11,8);
で、やってみると、数100msec/回に高速化

書いている通り

kaigan=# explain analyze
kaigan-# select min(ST_Distance_Sphere('POINT(135 34)', k.the_geom))
kaigan-# from kaigan k
kaigan-# where gid in
kaigan-# (select gid from kaigan k2
kaigan(#  where
kaigan(#   (k2.st_xmid between 134 and 136) and
kaigan(#   (k2.st_ymid between 33 and 35)
kaigan(#  order by abs(k2.st_xmid-135)+abs(k2.st_ymid-34)
kaigan(#  limit 50
kaigan(# );
                      QUERY PLAN                                                                           
----------------------------------------------------------------------------------------
 Aggregate  (cost=9806.23..9806.49 rows=1 width=3720)
            (actual time=141.276..141.276 rows=1 loops=1)
   ->  Nested Loop  (cost=9394.17..9806.10 rows=50 width=3720)
                    (actual time=50.348..50.926 rows=50 loops=1)
         ->  HashAggregate  (cost=9394.17..9394.67 rows=50 width=4)
                            (actual time=50.293..50.320 rows=50 loops=1)
               ->  Limit  (cost=9393.42..9393.54 rows=50 width=12)
                          (actual time=50.257..50.265 rows=50 loops=1)
                     ->  Sort  (cost=9393.42..9408.51 rows=6036 width=12)
                               (actual time=50.256..50.258 rows=50 loops=1)
                           Sort Key: ((abs((k2.st_xmid - 135)) + abs((k2.st_ymid - 34))))
                           Sort Method:  top-N heapsort  Memory: 27kB
                           ->  Bitmap Heap Scan on kaigan k2
                                 (cost=739.21..9192.91 rows=6036 width=12)
                                 (actual time=2.763..46.592 rows=13529 loops=1)
                                 Recheck Cond: ((st_xmid >= 134) AND (st_xmid <= 136))
                                 Filter: ((st_ymid >= 33) AND (st_ymid <= 35))
                                 ->  Bitmap Index Scan on st_xmid_kaigan
                                       (cost=0.00..737.70 rows=14942 width=0)
                                       (actual time=2.415..2.415 rows=14770 loops=1)
                                       Index Cond: ((st_xmid >= 134) AND (st_xmid <= 136))
         ->  Index Scan using kaigan_pkey on kaigan k
               (cost=0.00..8.22 rows=1 width=3724)
               (actual time=0.010..0.011 rows=1 loops=50)
               Index Cond: (k.gid = k2.gid)
 Total runtime: 141.349 ms
(15 rows)

使用したperl script

GIS領域の重心算出&登録
#!/usr/local/bin/perl
use strict;
use DBI;
use Data::Dumper;

my $DB_CONF =
    {db_name => 'dbi:Pg:dbname=kaigan;host=localhost;port=5432;',
     user => 'postgres',
     passwd => '',
     option => {AutoCommit => 0,
	       pg_enable_utf8 => 1}};

main(@ARGV);

sub main {

    my $dbh = connect_db();

    my $box_cos = get_box_cos($dbh);
    for my $box_co ( @$box_cos ){
	set_box_co($dbh,$box_co);
    }

    $dbh->commit();
    $dbh->disconnect();
}

sub set_box_co {
    my ($dbh,$box_co) = @_;
    my $sql =<<EOF;
update kaigan
set st_xmid=?,st_ymid=?,st_xlen=?,st_ylen=?
where gid=?
EOF
    my $sth = $dbh->prepare($sql);
    my @vals  =
	(sprintf("%11.8f", ($box_co->{st_xmin}+$box_co->{st_xmax})/2 ),
	 sprintf("%11.8f", ($box_co->{st_ymin}+$box_co->{st_ymax})/2 ),
	 $box_co->{st_xmax}-$box_co->{st_xmin},
	 $box_co->{st_ymax}-$box_co->{st_ymin},
	 $box_co->{gid});
    $sth->execute(@vals);
}

sub get_box_cos {
    my ($dbh) = @_;
    my $sql =<<EOF;
select
gid,
ST_XMin(the_geom), ST_XMax(the_geom),ST_YMin(the_geom), ST_YMax(the_geom)
from kaigan
EOF
    my $sth = $dbh->prepare($sql);
    $sth->execute();
    my @ret;
    while(my $row = $sth->fetchrow_hashref() ){
	push(@ret,$row);
    }
    return \@ret;
}

sub connect_db {
    return DBI->connect($DB_CONF->{db_name},
			$DB_CONF->{user},
			$DB_CONF->{passwd},
			$DB_CONF->{option});
}
ある座標から海岸線までのGIS距離計算
#!/usr/local/bin/perl
use strict;
use utf8;
use DBI;
use Encode;
use Data::Dumper;

my $DIFF_ROT = 1;
my $GID_LIMIT = 50;
my $DB_CONF =
    {db_name => 'dbi:Pg:dbname=kaigan;host=localhost;port=5432;',
     user => 'postgres',
     passwd => '',
     option => {AutoCommit => 0,
	       pg_enable_utf8 => 1}};

main(@ARGV);

sub main {
    my ($lon_lat_file) = @_;

    my $dbh = connect_db();

    open (my $fh,"<:encoding(utf8)",$lon_lat_file) or
	die "can't open file: $lon_lat_file: $!";
    my @lines = <$fh>;
    close($fh) or die "can't close file: $lon_lat_file: $!";

    my $i = 0;
    for my $line ( @lines ){
	$i++;
	$line =~ s/\s+$//go;
	my ($t__code,
	    $address_org,
	    $address_parse,
	    $latitude, #緯度
	    $longitude, #経度
	    $accuracy)  #住所parseの精度?
	    = split("\t",$line);
	print encode('utf8',
		     "$i  $t__code  $address_parse  $longitude  $latitude  ->");
	my $kaigan_dist = calc_kaigan_distance($dbh,$latitude,$longitude);
	print "  ",$kaigan_dist,"\n";
    }


    $dbh->disconnect();
}

sub calc_kaigan_distance {
    my ($dbh,$latitude,$longitude) = @_;
    my $sql =<<EOF;
select min(ST_Distance_Sphere(?, k.the_geom))
from kaigan k
where gid in
(select gid from kaigan k2
 where
  (k2.st_xmid between ? and ?) and
  (k2.st_ymid between ? and ?)
 order by (k2.st_xmid-?)^2+(k2.st_ymid-?)^2
 limit $GID_LIMIT)
EOF
    my $sth = $dbh->prepare($sql);
    my @vals = ("POINT($longitude $latitude)",
		$longitude-$DIFF_ROT,$longitude+$DIFF_ROT,
		$latitude-$DIFF_ROT, $latitude+$DIFF_ROT,
		$longitude, $latitude);
    $sth->execute( @vals );
    return $sth->fetchrow_array();
}

sub connect_db {
    return DBI->connect($DB_CONF->{db_name},
			$DB_CONF->{user},
			$DB_CONF->{passwd},
			$DB_CONF->{option});
}