end0tknr's kipple - web development by copy-typing

The komainu (guardian dogs) at Dazaifu Tenmangu are oddly cute

Hands-on with ELYZA (Llama 2 based) on Windows 11 + Miniconda3

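The University of Tokyo's Matsuo Lab (ELYZA, Inc.) has apparently done additional Japanese pretraining on top of Llama 2, so I gave it a try.

The Python scripts and their output are below. Compared with the plain Llama 2 I tried just before this, the answers look considerably better.

I used the 7B model this time. 13B and 70B versions are apparently in preparation, so I am looking forward to them.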
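As a rough sizing aid for those model sizes, here is a minimal sketch that estimates how much memory the weights alone need in float16. It assumes 2 bytes per parameter and the nominal parameter counts (7e9, 13e9, 70e9), which are my assumptions and not figures from ELYZA. Activations and the KV cache are ignored. The function name `fp16_weight_gib` is made up for this example.

```python
def fp16_weight_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights in GiB (weights only, no activations)."""
    return n_params * bytes_per_param / 2**30


# Nominal parameter counts (assumed, not taken from the model cards).
for label, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{label}: about {fp16_weight_gib(n_params):.1f} GiB of weights in float16")
```

For 7B this lands in the same range as the two checkpoint shards in the download log of script 1 (9.98G + 3.50G).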

Reference URLs

pip install

CONDA> pip install transformers accelerate bitsandbytes
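Before running the scripts it can help to confirm what is actually installed in the conda environment. The sketch below uses only the standard library to look up package versions. Note that `torch` is not part of the pip command above, so I am assuming it was already installed in the environment. The helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["torch", "transformers", "accelerate", "bitsandbytes"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")

# Optional GPU check; skipped quietly when torch is not importable.
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not importable in this environment")
```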

Copy-typing 1 - Python script and its output

#!python
# -*- coding: utf-8 -*-
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

def main():
    # Prepare the tokenizer and the model
    tokenizer = AutoTokenizer.from_pretrained(
        "elyza/ELYZA-japanese-Llama-2-7b-instruct"
    )
    model = AutoModelForCausalLM.from_pretrained(
        "elyza/ELYZA-japanese-Llama-2-7b-instruct", 
        torch_dtype=torch.float16,
        device_map="auto"
    )

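    # Prepare the prompt (the Japanese text is kept as-is because it is the model input).
    # System prompt: "You are a sincere and excellent Japanese assistant."
    # Question: "Who is the cutest in Madoka Magica?"
    # Note: the continuation lines are indented with the function body,
    # so their leading spaces become part of the prompt.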
    prompt = """<s>[INST] <<SYS>>
    あなたは誠実で優秀な日本人のアシスタントです。
    <</SYS>>

    まどか☆マギカでは誰が一番かわいい？ [/INST]"""

    # Run inference
    with torch.no_grad():
        token_ids = tokenizer.encode(prompt,
                                     add_special_tokens=False,
                                     return_tensors="pt")
        output_ids = model.generate(
            token_ids.to(model.device),
            max_new_tokens=256,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
        output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :],
                                  skip_special_tokens=True )
    print(output)


if __name__ == '__main__':
    main()

(mycuda) C:\Users\end0t\tmp>python elyza_1.py
Downloading (…)okenizer_config.json: 100%|████████| 725/725 [00:00<00:00, 543kB/s]
C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\end0t\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading tokenizer.model: 100%|█████████████| 500k/500k [00:00<00:00, 7.00MB/s]
Downloading (…)/main/tokenizer.json: 100%|███| 1.84M/1.84M [00:00<00:00, 2.06MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████| 437/437 [00:00<?, ?B/s]
Downloading (…)lve/main/config.json: 100%|███████████████| 641/641 [00:00<?, ?B/s]
Downloading (…)model.bin.index.json: 100%|███████████| 26.8k/26.8k [00:00<?, ?B/s]
Downloading (…)l-00001-of-00002.bin: 100%|███| 9.98G/9.98G [07:11<00:00, 23.1MB/s]
Downloading (…)l-00002-of-00002.bin: 100%|███| 3.50G/3.50G [02:21<00:00, 24.7MB/s]
Downloading shards: 100%|██████████████████████████| 2/2 [09:34<00:00, 287.35s/it]
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:11<00:00,  5.95s/it]
Downloading (…)neration_config.json: 100%|████████| 154/154 [00:00<00:00, 151kB/s]
C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\transformers\generation\utils.py:1411: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
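まどか☆マギカに登場するキャラクターの中で、
誰が一番かわいいか、私の判断で回答いたします。

まどか、アリス、 Homura の3人は、かわいいという感情が伝わるキャラクターです。

しかし、個人的な好みや考え方による差異があるため、
一概に誰が一番かわいいかは断言できません。

あくまで参考にしてください。

English gloss of the output above:

Among the characters who appear in Madoka Magica, I will answer who is the cutest based on my own judgment.

Madoka, Alice, and Homura are three characters who convey a sense of cuteness.

However, because personal tastes and ways of thinking differ, I cannot state categorically who is the cutest.

Please treat this as a reference only.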
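One detail in script 1: the triple-quoted prompt is written inside main(), so every continuation line carries the function's indentation into the prompt string. Whether that changes the model's answer is something I have not measured. As a sketch, the standard library's `inspect.cleandoc` removes that indentation while keeping the line breaks. `textwrap.dedent` would not help here, because the first line of the literal has no indentation and so the common margin is empty. The function name `raw_prompt` is made up for this example.

```python
import inspect


def raw_prompt() -> str:
    # Same layout as script 1: the literal is indented along with the function body.
    prompt = """<s>[INST] <<SYS>>
    あなたは誠実で優秀な日本人のアシスタントです。
    <</SYS>>

    まどか☆マギカでは誰が一番かわいい？ [/INST]"""
    return prompt


raw = raw_prompt()
clean = inspect.cleandoc(raw)  # strips the shared indentation of lines 2 onward

indented_before = sum(line.startswith(" ") for line in raw.split("\n"))
indented_after = sum(line.startswith(" ") for line in clean.split("\n"))
print("indented lines before:", indented_before)
print("indented lines after:", indented_after)
```

Script 2 sidesteps the issue in a different way, by assembling the prompt from constants with `str.format`.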

Copy-typing 2 - Python script and its output

#!python
# -*- coding: utf-8 -*-

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
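# System prompt: "You are a sincere and excellent Japanese assistant."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"
# Request: "Write a short story with the plot of a bear going to the seaside,
# becoming friends with a seal, and finally returning home."
text = "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"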

model_name = "elyza/ELYZA-japanese-Llama-2-7b-instruct"

def main():
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 torch_dtype="auto")
    if torch.cuda.is_available():
        print("CUDA IS AVAILABLE !")
        model = model.to("cuda")

    prompt = "{bos_token}{b_inst} {system}{prompt} {e_inst} ".format(
        bos_token=tokenizer.bos_token,
        b_inst=B_INST,
        system=f"{B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}",
        prompt=text,
        e_inst=E_INST)

    with torch.no_grad():
        token_ids = tokenizer.encode(prompt,
                                     add_special_tokens=False,
                                     return_tensors="pt")

        output_ids = model.generate(
            token_ids.to(model.device),
            max_new_tokens=256,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :],
                              skip_special_tokens=True )
    print(output)

if __name__ == '__main__':
    main()

(mycuda) C:\Users\end0t\tmp>python elyza_2.py
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:06<00:00,  3.09s/it]
CUDA IS AVAILABLE !
C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\transformers\generation\utils.py:1411: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
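承知しました。以下にクマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を記述します。

クマは山の中でゆっくりと眠っていた。
その眠りに落ちたクマは、夢の中で海辺を歩いていた。
そこにはアザラシがいた。
クマはアザラシに話しかける。

「おはよう」とクマが言うと、アザラシは驚いたように顔を上げた。
「あ、おはよう」アザラシは少し緊張した様子だった。
クマはアザラシと友達になりたいと思う。

「私はクマと

English gloss of the output above:

Understood. Below I write a short story with the plot of a bear going to the seaside, becoming friends with a seal, and finally returning home.

The bear was sleeping peacefully in the mountains. Having fallen into that sleep, the bear was walking along the seaside in a dream. There was a seal there. The bear speaks to the seal.

"Good morning," said the bear, and the seal looked up as if startled. "Oh, good morning." The seal seemed a little nervous. The bear wants to become friends with the seal.

The final line, 「私はクマと, is left unfinished.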
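The story stops mid-sentence. One plausible cause is the `max_new_tokens=256` cap in script 2, but that is my assumption; the run itself does not report why generation ended. A minimal way to check is to look at whether the generated ids used up the whole budget without containing the EOS token. The helper name `hit_token_limit` and the toy token ids below are made up for illustration.

```python
def hit_token_limit(generated_ids, eos_token_id, max_new_tokens):
    """Return True when the token budget was used up and no EOS token appeared."""
    used_full_budget = len(generated_ids) >= max_new_tokens
    return used_full_budget and eos_token_id not in generated_ids


# Toy token ids for illustration only (2 stands in for the EOS id).
finished = [11, 12, 13, 2]
cut_off = [11, 12, 13, 14]

print(hit_token_limit(finished, eos_token_id=2, max_new_tokens=4))
print(hit_token_limit(cut_off, eos_token_id=2, max_new_tokens=4))
```

In script 2 the generated part is `output_ids.tolist()[0][token_ids.size(1):]`, so that list could be passed in together with `tokenizer.eos_token_id` and 256. If the check comes back true, raising `max_new_tokens` is the obvious thing to try.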