end0tknr's kipple - learning web development by transcription


Adding a user dictionary to kuromoji.js

This is a continuation of the previous entry.

Reference URL

Use Node.js v10.16.3

According to the reference URL, versions other than v10.16.3 apparently fail with errors, so install and select that version:

$ nvm install 10.16.3
$ nvm use 10.16.3

$ node -v
v10.16.3

$ npm -v
6.9.0
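
Since the build is reported to work only on v10.16.3, it may help to check the running version from a script before starting. The following is a minimal sketch; the names EXPECTED_NODE and isExpectedNode are made up for illustration and are not part of kuromoji or nvm.

```javascript
// Hypothetical pre-flight check: EXPECTED_NODE and isExpectedNode are
// illustrative names, not part of kuromoji or nvm.
const EXPECTED_NODE = "10.16.3";

// Returns true when the version string (with or without a leading "v")
// matches the expected version exactly.
function isExpectedNode(version, expected = EXPECTED_NODE) {
  return String(version).replace(/^v/, "") === expected;
}

if (!isExpectedNode(process.version)) {
  console.warn(
    `Node.js ${process.version} detected; this entry assumes v${EXPECTED_NODE}`
  );
}
```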

Install kuromoji.js

$ npm install kuromoji
$ cd node_modules/kuromoji
$ npm install
$ npm run build-dict

If "npm run build-dict" fails with an out-of-memory error, raise the heap limit as follows:

$ NODE_OPTIONS="--max-old-space-size=4096" npm run build-dict
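
To confirm that the larger heap setting is actually picked up by node, the V8 heap limit can be printed from a small script. This is only a sketch: the function name heapLimitMiB is an assumption for illustration, while v8.getHeapStatistics() is the standard Node.js API.

```javascript
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to MiB for readability.
// heapLimitMiB is an illustrative helper name, not part of any library.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: about ${heapLimitMiB()} MiB`);
```

Run with the same NODE_OPTIONS value as the build, the printed number should be in the vicinity of 4096. Without the option, it shows the default limit of the node binary in use.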

Creating the CSV for the user dictionary

$ cd /home/end0tknr/tmp/myproj/node_modules/kuromoji
$ vi node_modules/mecab-ipadic-seed/lib/dict/userdic.csv

The CSV file name above is arbitrary. Its content looks like this, for example:

快感エア,1285,1285,5402,名詞,一般,*,*,*,*,快感エア,カイカンエア,カイカンエア
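
A malformed row is easy to miss until the build runs, so it may be worth checking each entry first. The sketch below assumes the usual mecab-ipadic column order (surface form, left context ID, right context ID, cost, part of speech and three sub-levels, conjugation type, conjugation form, base form, reading, pronunciation). The field names and the parseEntry helper are illustrative and not part of kuromoji.

```javascript
// Column names assume the usual mecab-ipadic layout; they are labels chosen
// for this sketch, not identifiers from kuromoji.
const FIELDS = [
  "surface", "leftId", "rightId", "cost",
  "pos", "posDetail1", "posDetail2", "posDetail3",
  "conjugationType", "conjugationForm",
  "baseForm", "reading", "pronunciation",
];

// Parses one CSV row into an object and validates the numeric columns.
// Note: a plain split(",") does not handle quoted fields containing commas.
function parseEntry(line) {
  const cols = line.trim().split(",");
  if (cols.length !== FIELDS.length) {
    throw new Error(`expected ${FIELDS.length} columns, got ${cols.length}`);
  }
  const entry = {};
  FIELDS.forEach((name, i) => {
    entry[name] = cols[i];
  });
  for (const key of ["leftId", "rightId", "cost"]) {
    if (!/^-?\d+$/.test(entry[key])) {
      throw new Error(`${key} must be an integer, got "${entry[key]}"`);
    }
    entry[key] = Number(entry[key]);
  }
  return entry;
}

const sample =
  "快感エア,1285,1285,5402,名詞,一般,*,*,*,*,快感エア,カイカンエア,カイカンエア";
console.log(parseEntry(sample));
```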

Building the dictionary data

Build as follows. The resulting *.dat.gz files can be used in the same way as in the previous entry.

$ cd /home/end0tknr/tmp/myproj/node_modules/kuromoji
$ npm run build-dict

> kuromoji@0.1.2 build-dict /home/end0tknr/tmp/myproj/node_modules/kuromoji
> gulp build-dict

[14:38:43] Using gulpfile ~/tmp/myproj/node_modules/kuromoji/gulpfile.js
[14:38:43] Starting 'clean'...
[14:38:43] Starting 'clean-dict'...
[14:38:43] Finished 'clean' after 21 ms
[14:38:43] Starting 'build'...
[14:38:43] Finished 'clean-dict' after 22 ms
[14:38:43] Finished 'build' after 168 ms
[14:38:43] Starting 'build-dict'...
[14:38:43] Starting 'create-dat-files'...
[14:38:43] Finished 'build-dict' after 9.12 ms
Finishied to read token info dics
Finishied to read unk.def
Finishied to read char.def
Finishied to read matrix.def
Finishied to read all seed dictionary files
Building binary dictionary ...
[14:38:47] Finished 'create-dat-files' after 4.25 s
[14:38:47] Starting 'compress-dict'...
[14:38:50] Finished 'compress-dict' after 3.03 s
[14:38:50] Starting 'clean-dat-files'...
[14:38:50] Finished 'clean-dat-files' after 11 ms

$ ls -lh dict 
total 17M
-rw-rw-r-- 1 end0tknr end0tknr 3.8M May 14 14:38 base.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 1.7M May 14 14:38 cc.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 3.0M May 14 14:38 check.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 1.6M May 14 14:38 tid.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 1.5M May 14 14:38 tid_map.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 5.7M May 14 14:38 tid_pos.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr  11K May 14 14:38 unk.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr  306 May 14 14:38 unk_char.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr  338 May 14 14:38 unk_compat.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 1.2K May 14 14:38 unk_invoke.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr 1.2K May 14 14:38 unk_map.dat.gz
-rw-rw-r-- 1 end0tknr end0tknr  11K May 14 14:38 unk_pos.dat.gz
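
Rather than reading the listing by eye, the presence of the twelve files can be checked from node. This is a minimal sketch: the file names are copied from the listing above, while missingDictFiles and the assumption that the script runs from the kuromoji directory are illustrative.

```javascript
const fs = require("fs");
const path = require("path");

// File names taken from the "ls -lh dict" listing.
const EXPECTED_DICT_FILES = [
  "base.dat.gz", "cc.dat.gz", "check.dat.gz", "tid.dat.gz",
  "tid_map.dat.gz", "tid_pos.dat.gz", "unk.dat.gz", "unk_char.dat.gz",
  "unk_compat.dat.gz", "unk_invoke.dat.gz", "unk_map.dat.gz", "unk_pos.dat.gz",
];

// Returns the expected file names that do not exist in dictDir.
// missingDictFiles is an illustrative helper, not part of kuromoji.
function missingDictFiles(dictDir) {
  return EXPECTED_DICT_FILES.filter(
    (name) => !fs.existsSync(path.join(dictDir, name))
  );
}

// Assumes the current directory is node_modules/kuromoji.
const missing = missingDictFiles(path.join(process.cwd(), "dict"));
console.log(
  missing.length === 0
    ? "all dictionary files present"
    : `missing: ${missing.join(", ")}`
);
```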