end0tknr's kipple - web写経開発

太宰府天満宮の狛犬って、妙にカワイイ

postgres の unique制約で null値があると、ON CONFLICT ON CONSTRAINT が発動しない

postgres の documentにも記載されていました...

https://www.postgresql.jp/docs/9.2/indexes-unique.html

CREATE TABLE IF NOT EXISTS suumo_bukken (
build_type      varchar(32),
bukken_name     varchar(64),
price           bigint,
price_org       varchar(64),
address         varchar(128),
plan            varchar(32),
build_area_m2   int,
build_area_org  varchar(64),
land_area_m2    int,
land_area_org   varchar(64),
build_year      int,
create_date     date,
keep_date       date,
primary key(id),
UNIQUE suumo_bukken_unique
    (build_type,bukken_name,address,plan,
     build_area_org,land_area_org,build_year)
);

-- または

ALTER TABLE suumo_bukken ADD constraint
  suumo_bukken_unique
  UNIQUE ( build_type,bukken_name,address,plan,build_area_org,
           land_area_org,build_year );

のような db tableに対し

    def save_bukken_infos(self, build_type, bukken_infos):
        logger.info("start "+ build_type)

        date_str = datetime.datetime.now().strftime('%Y-%m-%d')
        row_groups = self.divide_rows_info(build_type,
                                           bukken_infos,
                                           bulk_insert_size,
                                           date_str )

        sql = """
INSERT INTO suumo_bukken
  (build_type,bukken_name,price,price_org,address,plan,build_area_m2,
   build_area_org,land_area_m2,land_area_org,build_year,create_date)
  VALUES %s
ON CONFLICT ON CONSTRAINT suumo_bukken_unique
  DO UPDATE SET keep_date='%s'
"""
        sql = sql % ("%s", date_str)
        
        with self.db_connect() as db_conn:
            with self.db_cursor(db_conn) as db_cur:

                for row_group in row_groups:
                    try:
                        # bulk insert
                        extras.execute_values(db_cur,sql, row_group )
                    except Exception as e:
                        logger.error(e)
                        logger.error(sql)
                        logger.error(row_group)
                        return False
                    
            db_conn.commit()
        return True
    
    def divide_rows_info(self, build_type, org_rows, chunk_size,date_str):
        i = 0
        chunk = []
        ret_rows = []
        for org_row in org_rows:
            chunk.append( ( build_type,
                            org_row['bukken_name'],
                            org_row['price'],
                            org_row['price_org'],
                            org_row['address'],
                            org_row['plan'],
                            org_row['build_area_m2'],
                            org_row['build_area_org'],
                            org_row['land_area_m2'],
                            org_row['land_area_org'],
                            org_row['build_year'] or 0,
                            date_str )
                         )
            
            if len(chunk) >= chunk_size:
                ret_rows.append(chunk)
                chunk = []
            i += 1

        if len(chunk) > 0:
            ret_rows.append(chunk)

        return ret_rows

にあるような upsert & bulk insertを行ったところ、

ON CONFLICT ON CONSTRAINT suumo_bukken_unique
  DO UPDATE SET keep_date='%s'

が発動しない。

調べてみた結果、

https://www.postgresql.jp/docs/9.2/indexes-unique.html

に記載されている通りでした。

その他、bulk insertする値群の中で、重複がある場合

ERROR    ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion
within the same command have duplicate constrained values.

のようなエラーとなります。