Fine-Tuning GPT-2 with PEFT-LoRA¶
Fine-tune GPT-2 on the English Quotes dataset using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters.
1. Install Dependencies
Install all required libraries: HuggingFace Datasets, Transformers, PEFT (for LoRA), Accelerate, and BitsAndBytes (not used directly in this notebook, but commonly paired with PEFT for quantized training).
!pip install -q datasets transformers peft accelerate bitsandbytes
2. Basic Imports
Import core libraries: HuggingFace Datasets and Transformers for model/data, PEFT for LoRA, and PyTorch.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model
import torch
3. Load Dataset & Tokenizer
Load the English Quotes dataset and split it 90/10 into train and validation sets. Load the GPT-2 tokenizer and assign eos_token as pad_token so batching works (GPT-2 has no native pad token).
dataset = load_dataset("Abirate/english_quotes")
dataset_split = dataset["train"].train_test_split(test_size=0.1, seed=42)
train_data = dataset_split["train"]
val_data = dataset_split["test"]
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
4. Tokenize
Tokenize each quote with padding and truncation to 64 tokens. Set labels equal to input_ids — for causal language modeling, the model shifts the labels internally by one position so the loss is computed on next-token prediction.
def tokenize(batch):
    tokenized = tokenizer(batch["quote"], padding="max_length", truncation=True, max_length=64)
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized
train_data = train_data.map(tokenize, batched=True)
val_data = val_data.map(tokenize, batched=True)
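To make that label shift concrete, here is a framework-free toy sketch (the token IDs are invented for illustration) of how the copied labels become next-token targets:

```python
# Toy token sequence (IDs are made up for illustration).
input_ids = [464, 3200, 284, 12157, 318, 50256]
labels = input_ids.copy()  # exactly what tokenize() above does

# Inside the model, logits for positions 0..n-2 are scored against
# labels at positions 1..n-1, so each token learns to predict its successor.
contexts = input_ids[:-1]
targets = labels[1:]
pairs = list(zip(contexts, targets))
print(pairs[0])    # → (464, 3200): the first token predicts the second
print(len(pairs))  # → 5: one prediction per position except the last
```

This is why no separate target column is needed: the labels are the inputs, offset by one.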
5. Load Model
Load GPT-2 in FP16 precision to reduce memory usage. device_map="auto" automatically places model layers across the available devices (GPU(s), spilling to CPU if needed).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
6. LoRA Config + Training Arguments + Training
LoRA config: Inject low-rank adapter matrices into GPT-2's attention layers (c_attn). r=8 is the rank (size of the adapters), lora_alpha=16 is the scaling factor. get_peft_model freezes all original weights and only trains the LoRA layers.
Training arguments: 5 epochs, batch size 4 with gradient accumulation of 2 (effective batch = 8), FP16 training, learning rate 2e-4.
Trainer: Wires everything together and runs the full training loop.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
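As a sanity check on how small the trainable set is, the adapter parameter count can be worked out by hand. The numbers below assume GPT-2 small (12 blocks, hidden size 768, c_attn projecting 768 → 2304) and the r=8 config above; at runtime, model.print_trainable_parameters() reports the exact figure.

```python
# LoRA adds two matrices per targeted layer: A (r x d_in) and B (d_out x r).
r = 8
d_in, d_out = 768, 2304   # c_attn: hidden size -> stacked Q, K, V projections
n_layers = 12             # GPT-2 small has 12 transformer blocks

params_per_layer = r * d_in + d_out * r
total_lora_params = params_per_layer * n_layers
print(params_per_layer)   # → 24576
print(total_lora_params)  # → 294912, vs ~124M parameters in the frozen base
```

So well under 1% of the model's weights receive gradients, which is what makes the training loop below fit comfortably in memory.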
training_args = TrainingArguments(
    output_dir="./lora-llm",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    eval_strategy="steps",
    eval_steps=20,
    logging_steps=10,
    save_steps=50,
    learning_rate=2e-4,
    num_train_epochs=5,
    fp16=True,
    report_to="none"
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    tokenizer=tokenizer
)
trainer.train()
7. Save the Model
Save only the LoRA adapter weights (not the full GPT-2 base model) — this produces a tiny checkpoint of just the trained delta weights.
model.save_pretrained("lora-gpt2")
tokenizer.save_pretrained("lora-gpt2")
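A back-of-the-envelope size estimate (same assumptions as the parameter count above: r=8, 12 layers, c_attn only, weights stored in fp16) shows why the adapter checkpoint is tiny compared to a full fine-tuned model:

```python
# Adapter weights only: ~295K params at 2 bytes each (fp16).
lora_params = 294_912
adapter_mb = lora_params * 2 / 1024**2
base_params = 124_000_000          # GPT-2 small, approximate
base_mb = base_params * 2 / 1024**2
print(round(adapter_mb, 2))  # → 0.56 (MB)
print(round(base_mb))        # → 237 (MB) for the full base model in fp16
```

Because only these delta weights are saved, many task-specific adapters can share one copy of the base model on disk.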
8. Inference From Saved Model
Reload the base GPT-2 model and attach the saved LoRA adapter using PeftModel.from_pretrained. Wrap in a text-generation pipeline for simple inference.
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained("lora-gpt2")
tokenizer.pad_token = tokenizer.eos_token
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "lora-gpt2")
text_gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
    # No torch_dtype / device_map here: the model is already loaded
    # in fp16 and placed on devices by from_pretrained above.
)
Run inference on sample prompts using temperature sampling (do_sample=True, temperature=0.7) for creative, varied outputs.
prompt = "The secret to happiness is"
outputs = text_gen(prompt, max_new_tokens=70, num_return_sequences=1, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
prompt = "once upon a time"
outputs = text_gen(prompt, max_new_tokens=70, num_return_sequences=1, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])