Many pretrained large language models are available for us to use. However, they may not perform well on our specific task, so the model needs fine-tuning.
Since the model is large, the idea is to take the existing pretrained model, freeze most of its weights, and retrain only a small percentage of trainable parameters on your own data.
# https://huggingface.co/blog/peft
import torch
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType

model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"

# LoRA configuration: rank-8 update matrices for a seq2seq language model
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)

# Load the base model, then wrap it so only the LoRA parameters are trainable
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
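The printout reports how many parameters are actually trainable; for mt0-large with this config it is only a fraction of a percent of the full model. After training, only the small adapter weights need to be saved. Below is a minimal sketch of saving and reloading the adapter, assuming the standard peft save_pretrained / PeftModel.from_pretrained API and a hypothetical output directory name:

# Save only the small LoRA adapter weights, not the full base model
model.save_pretrained("mt0-large-lora")  # "mt0-large-lora" is a hypothetical directory

# Later, reload the adapter on top of the frozen base model for inference
from peft import PeftModel, PeftConfig
config = PeftConfig.from_pretrained("mt0-large-lora")
base_model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, "mt0-large-lora")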