end0tknr's kipple - web写経開発


Text generation with an LLM ( rinna/japanese-gpt-neox-3.6b-instruction-ppo ) using PEFT (Parameter-Efficient Fine-Tuning, LoRA) for Python

Fine-tuning an LLM ( rinna/japanese-gpt-neox-3.6b-instruction-ppo ) with PEFT (Parameter-Efficient Fine-Tuning, LoRA) for Python - end0tknr's kipple - web写経開発

Continuing from the entry linked above, I once again work through the code from the following URL.

note.com

I won't show the full output here, but the results were roughly as follows:

  • model.generate() takes only a few seconds per call
  • with temperature set between 0 and 1, the playfulness (randomness) of the answers varies
  • the following warning was printed, but I have not investigated the cause (one possible fix is sketched right after the warning)
The attention mask and the pad token id were not set.
As a consequence, you may observe unexpected behavior.
Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:3 for open-end generation.
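
One guess at the cause (not verified): this warning appears when generate() is called without an explicit attention_mask and without a pad_token_id, so transformers falls back to eos_token_id on its own. Below is a minimal sketch of how the tokenizer / generate() calls in the script further down could pass both explicitly; the variable names follow that script, and whether this actually silences the warning here is an assumption on my part.

# hypothetical variant of the call in generate() below:
# pass attention_mask and pad_token_id explicitly instead of letting
# transformers guess them (which is what triggers the warning)
enc = tokenizer(prompt,
                return_tensors="pt",
                add_special_tokens=False)
outputs = model.generate(input_ids=enc.input_ids.cuda(),
                         attention_mask=enc.attention_mask.cuda(), # marks the real (non-pad) tokens
                         pad_token_id=tokenizer.eos_token_id,      # same value the warning falls back to
                         max_new_tokens=256,
                         do_sample=True,
                         temperature=0.1,
                         top_p=0.75,
                         top_k=40,
                         no_repeat_ngram_size=2)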

python script

# -*- coding: utf-8 -*-
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rinna/japanese-gpt-neox-3.6b-instruction-ppo"
peft_name  = "lorappo-rinna-3.6b"
output_dir = "lorappo-rinna-3.6b-results"

def main():

    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 load_in_8bit=True,
                                                 device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
    model     = PeftModel.from_pretrained(model,
                                          peft_name,
                                          # device_map="auto"
                                          )
    model.eval() # switch to evaluation (inference) mode
    
    # generate an answer for each of the following topics
    for ask_part in ["構造・基礎","屋根","バルコニー","玄関","外壁",
                     "開口","内装","収納・階段","キッチン","浴室・トイレ",
                     "太陽光発電","蓄電池",
                     "外観デザイン","内観デザイン","プラン",
                     "耐震・強さ","温熱・住み心地","遮音・採光"]:
        ask_txt = '最近の新築マンションの %s は?' % (ask_part)
        
        # generate the answer text
        ans_txt = generate(tokenizer,model,ask_txt)
        
        print( "Q:", ask_txt.strip() )
        print( "A:", ans_txt.strip().replace("< NL >","") )
        print( "" )


# build a prompt in the instruction / input / answer template used at fine-tuning time
def generate_prompt(data_point):
    if data_point["input"]:
        result = f"""### 指示:
{data_point["instruction"]}

### 入力:
{data_point["input"]}

### 回答:
"""
    else:
        result = f"""### 指示:
{data_point["instruction"]}

### 回答:
"""
    result = result.replace('\n', '<NL>')
    return result


# text-generation (inference) function
def generate(tokenizer,model, instruction, input=None, maxTokens=256) -> str:
    # inference: build the prompt and tokenize it
    prompt = generate_prompt({'instruction': instruction, 'input': input})
    input_ids = tokenizer(prompt,
                          return_tensors="pt",
                          truncation=True, # no max_length is given, so this truncates nothing (see warning in the run log)
                          add_special_tokens=False).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids,
                             max_new_tokens=maxTokens,
                             do_sample=True,
                             temperature=0.1, # 0 to 1; higher = more free-spirited answers
                             top_p=0.75,
                             top_k=40,
                             no_repeat_ngram_size=2 )
    outputs = outputs[0].tolist()

    # decode only up to the EOS token, if one was generated
    if tokenizer.eos_token_id in outputs:
        eos_index = outputs.index(tokenizer.eos_token_id)
        decoded = tokenizer.decode(outputs[:eos_index])

        # extract only the response part (after the "### 回答:" sentinel)
        sentinel = "### 回答:"
        sentinelLoc = decoded.find(sentinel)
        if sentinelLoc >= 0:
            result = decoded[sentinelLoc + len(sentinel):]
            return result.replace("<NL>", "\n")
        return 'Warning: Expected prompt template to be emitted. Ignoring output.'
    return 'Warning: no <eos> detected ignoring output'


if __name__ == '__main__':
    main()
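
A side note on the 8-bit load: depending on the transformers version, passing load_in_8bit=True directly to from_pretrained() may be deprecated in favor of a BitsAndBytesConfig. A minimal, untested sketch of the same load expressed that way (assuming a transformers version that ships BitsAndBytesConfig):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantized load via BitsAndBytesConfig instead of the bare
# load_in_8bit=True keyword used in the script above
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map="auto")

Below is the console log from running the script above.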
(mycuda) C:\Users\end0t\tmp\ENQ>python inference.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\bitsandbytes\libbitsandbytes_cuda112.dll
C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\bitsandbytes\cuda_setup\main.py:156: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C:/Users/end0t/miniconda3/envs/mycuda/bin')}
  warn(msg)
C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\bitsandbytes\cuda_setup\main.py:156: UserWarning: C:\Users\end0t\miniconda3\envs\mycuda did not contain ['cudart64_110.dll', 'cudart64_120.dll', 'cudart64_12.dll'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary C:\Users\end0t\miniconda3\envs\mycuda\lib\site-packages\bitsandbytes\libbitsandbytes_cuda112.dll...
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:3 for open-end generation.
Q: 最近の新築マンションの 構造・基礎 は?
A: ...