Speeding up Excel reading with openpyxl for Python - wsheet.iter_rows() - end0tknr's kipple

Reading Excel (xlsx) with xlrd for Python - end0tknr's kipple - 新web写経開発

Reading Excel (xlsx) files with openpyxl for Python in the same way as the entry above is slow, and for Excel data with many records this becomes a critical problem.

The cause appears to be that each cell is fetched from the worksheet by coordinates, as in:

cell = wsheet.cell(row=row,column=col)

Using wsheet.iter_rows() instead speeds things up considerably.
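To make the two access patterns concrete, here is a minimal sketch that reads the same sheet both ways and times each pass. The in-memory workbook and the helper names (build_sheet, read_by_coordinate, read_by_iter_rows, timed) are assumptions for illustration and are not from the original code. The measured gap will depend on the openpyxl version, the sheet size, and how the workbook was loaded, so no timings are quoted here.

```python
import time

from openpyxl import Workbook


def build_sheet(n_rows=200, n_cols=5):
    """Create a small in-memory worksheet filled with integers (hypothetical test data)."""
    wb = Workbook()
    ws = wb.active
    for r in range(n_rows):
        ws.append([r * n_cols + c for c in range(n_cols)])
    return ws


def read_by_coordinate(ws):
    """Fetch every cell individually by (row, column), as in the slow version."""
    values = []
    for row in range(1, ws.max_row + 1):
        for col in range(1, ws.max_column + 1):
            values.append(ws.cell(row=row, column=col).value)
    return values


def read_by_iter_rows(ws):
    """Walk the sheet row by row with iter_rows()."""
    values = []
    for cells in ws.iter_rows():
        for cell in cells:
            values.append(cell.value)
    return values


def timed(func, ws):
    """Return the function result together with its elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(ws)
    return result, time.perf_counter() - start


wsheet = build_sheet()
by_coord, t_coord = timed(read_by_coordinate, wsheet)
by_iter, t_iter = timed(read_by_iter_rows, wsheet)
print(f"cell(): {t_coord:.4f}s  iter_rows(): {t_iter:.4f}s")
```

Both functions return the same values in the same order, so switching from one to the other should not change the results, only the way the cells are reached.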

import datetime

for cells in wsheet.iter_rows(min_row=2): # min_row: first row to read
    # The first column holds the date
    cell = cells[0]
    shukka_date = datetime.datetime.strptime(str(cell.value), '%Y-%m-%d %H:%M:%S')
    shukka_date_str = shukka_date.strftime('%Y-%m-%d')

    # The remaining columns hold one value per factory
    for col in range(1, len(cells)):
        factory = factories[col-1]
        units_waku = cells[col].value
        fact_shukka_mst[factory][shukka_date_str] = units_waku
     :

In addition to min_row shown above, iter_rows() accepts max_row, min_col, and max_col.
Besides iter_rows(), there is also iter_cols().
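As a rough illustration of those bounds, the sketch below builds a tiny in-memory sheet and reads a rectangular block with iter_rows(), then the same kind of data column by column with iter_cols(). The sheet contents and the names factory_a and factory_b are made up for the example. It also uses values_only=True, which is available in openpyxl 2.6 and later and yields plain values instead of cell objects.

```python
from openpyxl import Workbook

# Hypothetical sample data: a header row followed by three data rows.
wb = Workbook()
ws = wb.active
ws.append(["date", "factory_a", "factory_b"])
ws.append(["2024-01-01", 10, 20])
ws.append(["2024-01-02", 30, 40])
ws.append(["2024-01-03", 50, 60])

# Rows 2-3, columns 2-3 only: all four bounds are 1-based and inclusive.
block = [
    [cell.value for cell in cells]
    for cells in ws.iter_rows(min_row=2, max_row=3, min_col=2, max_col=3)
]

# iter_cols() yields one tuple of cells per column instead of per row.
columns = [
    [cell.value for cell in cells]
    for cells in ws.iter_cols(min_row=2, min_col=2)
]

# values_only=True returns tuples of values rather than cell objects.
dates = list(ws.iter_rows(min_row=2, max_col=1, values_only=True))

print(block)    # [[10, 20], [30, 40]]
print(columns)  # [[10, 30, 50], [20, 40, 60]]
print(dates)    # [('2024-01-01',), ('2024-01-02',), ('2024-01-03',)]
```

Omitted bounds fall back to the used range of the sheet, so min_row=2 on its own is enough to skip a single header row.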
