- What This Project Does
- Quick Start
- Complete Tutorial
- Project Structure
- Training Configuration
- Evaluation Results
- FAQ
- Citation
- Acknowledgments
This project fine-tunes Qwen2.5-Coder-1.5B-Instruct for Chinese sentiment analysis using the freeze training method:
- 🎯 Task: Binary sentiment classification (positive/negative)
- 📊 Dataset: ChnSentiCorp (Chinese sentiment corpus)
- 🔧 Method: Freeze training (only train the last 6 layers)
- 💾 Model Size: 1.5B parameters
- ⏱️ Training Time: 15-30 minutes on T4 GPU
- 📈 Performance: Accuracy improved from 91.6% → 97.8% (+6.2%)
Freeze training is a parameter-efficient fine-tuning method that:
- ✅ Freezes most model layers
- ✅ Only trains the last few layers + embeddings
- ✅ Reduces training time by 60-70%
- ✅ Uses 40-50% less GPU memory
- ✅ Achieves 85-95% of full fine-tuning quality
Perfect for: Limited compute resources, quick experimentation, domain adaptation
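To make the idea concrete, here is a minimal conceptual sketch of what freeze training does, assuming the standard Qwen2 module layout that transformers exposes (`model.model.layers`, `embed_tokens`, `norm`). LLaMA-Factory does this for you through `freeze_trainable_layers` and `freeze_extra_modules`, so you never need to write this yourself; the snippet is illustration only.

```python
# Conceptual sketch of freeze training (illustration only; LLaMA-Factory
# handles this when finetuning_type is "freeze"). Assumes the Qwen2 layout
# exposed by transformers: model.model.layers, embed_tokens, norm.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-1.5B-Instruct", trust_remote_code=True
)

# 1) Freeze every parameter.
for param in model.parameters():
    param.requires_grad = False

# 2) Unfreeze only the last 6 transformer layers...
for layer in model.model.layers[-6:]:
    for param in layer.parameters():
        param.requires_grad = True

# 3) ...plus the extra modules named in freeze_extra_modules (embed_tokens, norm).
for module in (model.model.embed_tokens, model.model.norm):
    for param in module.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / total:.1%} of {total:,}")
```

Only the unfrozen parameters receive gradients and optimizer state, which is where the time and memory savings come from.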
Perfect for: Beginners, no local GPU required, free T4 GPU
- Click the Colab badge at the top
- Runtime → Change runtime type → GPU (T4)
- Click "Connect" to allocate a T4 GPU runtime
- Run all cells (Runtime → Run all)
- Wait 30-40 minutes for complete workflow
Requirements: Google account (free)
Perfect for: Experienced users, multiple runs, custom modifications
# Clone repository
git clone https://github.com/IIIIQIIII/MSJ-Factory.git
cd MSJ-Factory
# Install dependencies
pip install -e .[torch,bitsandbytes,vllm]
# Start training
llamafactory-cli train examples/train_freeze/qwen2_5_coder_freeze_3k.yaml
# Evaluate model
python scripts/eval_sentiment_compare.py

Requirements:
- Python 3.10+
- CUDA 11.8+ / 12.1+
- GPU: 16GB+ VRAM (T4, V100, A100, etc.)
- Disk: 10GB free space
What it does: Downloads the complete project code to your environment
!git clone --depth 1 https://github.com/IIIIQIIII/MSJ-Factory.git
%cd MSJ-Factory

Expected output:
Cloning into 'MSJ-Factory'...
remote: Enumerating objects: 368, done.
remote: Counting objects: 100% (368/368), done.
Receiving objects: 100% (368/368), 6.08 MiB | 11.88 MiB/s, done.
Verify installation:
!ls -lh
# You should see: data/, examples/, scripts/, src/, etc.

📂 What's in the repository?
- data/: Training and test datasets
- examples/: Training configuration files
- scripts/: Evaluation and utility scripts
- src/: Core library code
- contexts/: Documentation and guides
What it does: Installs PyTorch, Transformers, vLLM, and other required libraries
!pip install -e .[torch,bitsandbytes,vllm]

Installation time: 3-5 minutes
Verify installation:
import torch
import vllm
# Check PyTorch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA: {torch.cuda.is_available()}')
# Check vLLM
print(f'vLLM: {vllm.__version__}')

Expected output:
PyTorch: 2.5.0+cu121
CUDA: True
vLLM: 0.10.0
🔍 Troubleshooting: Installation Issues
Issue 1: CUDA not available
# Install CUDA-enabled PyTorch
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Issue 2: Out of memory during installation
# Use --no-cache-dir
!pip install --no-cache-dir -e .[torch,bitsandbytes,vllm]

Issue 3: vLLM installation fails
# Skip vLLM (optional for training)
!pip install -e .[torch,bitsandbytes]

What it does: Fine-tunes Qwen2.5-Coder on 3000 balanced sentiment samples
Configuration file: examples/train_freeze/qwen2_5_coder_freeze_3k.yaml
### Model
model_name_or_path: Qwen/Qwen2.5-Coder-1.5B-Instruct # Base model
trust_remote_code: true
### Method
stage: sft # Supervised fine-tuning
finetuning_type: freeze # Freeze training method
freeze_trainable_layers: 6 # Train last 6 layers
freeze_extra_modules: embed_tokens,norm
### Dataset
dataset: sentiment_balanced_3k # 3000 samples (1500 pos + 1500 neg)
template: qwen
cutoff_len: 720
max_samples: 10000
### Training
per_device_train_batch_size: 1 # Batch size per GPU
gradient_accumulation_steps: 8 # Effective batch size = 1 × 8 = 8
learning_rate: 2.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true # Use BF16 precision
### Evaluation
val_size: 0.2 # 20% validation split
eval_strategy: steps
eval_steps: 200
compute_accuracy: true

!llamafactory-cli train examples/train_freeze/qwen2_5_coder_freeze_3k.yaml

Training progress:
🚀 Starting training...
📊 Total epochs: 2
⏱️ Estimated time: 15-30 minutes

Epoch 1/2: [████████████████████] 100% | Loss: 0.1234
Epoch 2/2: [████████████████████] 100% | Loss: 0.0567

✅ Training completed!
📁 Model saved to: saves/qwen2_5-coder-1.5b/freeze/sft/
| Metric | Value |
|---|---|
| Total Steps | ~375 steps |
| Training Loss | 0.05 - 0.15 |
| Validation Accuracy | 95%+ |
| GPU Memory | ~8-10 GB |
| Training Time | 15-30 min |
📊 Understanding Training Logs
Key metrics to watch:
- Loss: Should decrease from ~0.5 to ~0.05
- Accuracy: Should increase to 95%+
- GPU Memory: Should stay under 12GB on T4
Normal behavior:
- Loss may fluctuate early in training
- Accuracy improves in the second epoch
- Some TensorFlow warnings are normal (can ignore)
Warning signs:
- Loss increasing or staying high (>0.3)
- Accuracy below 90% after training
- CUDA out of memory errors
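If you want to inspect the loss curve after training, you can read it back from the trainer state. This is a small sketch assuming the standard Hugging Face Trainer `trainer_state.json` is written to the save directory shown above; adjust the path if your setup differs.

```python
# Optional: print the logged training loss after the run.
# Assumes the HF Trainer wrote trainer_state.json into the output directory.
import json

state_path = "saves/qwen2_5-coder-1.5b/freeze/sft/trainer_state.json"
with open(state_path) as f:
    history = json.load(f)["log_history"]

for entry in history:
    if "loss" in entry:  # training-loss entries; eval entries use eval_* keys
        print(f"step {entry['step']:>4}: loss {entry['loss']:.4f}")
```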
🎛️ Advanced: Customize Training
Train for more epochs (better quality):
num_train_epochs: 3.0 # Change from 2.0 to 3.0

Train more layers (more adaptation):

freeze_trainable_layers: 12 # Change from 6 to 12

Use larger batch size (if you have more VRAM):

per_device_train_batch_size: 2 # Change from 1 to 2
gradient_accumulation_steps: 4 # Change from 8 to 4

Train on different dataset:

dataset: your_dataset_name # Must be registered in data/dataset_info.json

What it does: Compares base model vs fine-tuned model performance
!python scripts/eval_sentiment_compare.py \
--csv_path data/ChnSentiCorp_test.csv \
--base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
--finetuned_model saves/qwen2_5-coder-1.5b/freeze/sft \
--output_file data/sentiment_comparison_results.json

Evaluation time: 5-10 minutes
Expected output:
📊 ChnSentiCorp Sentiment Analysis - Pre/Post Fine-tuning Comparison
======================================================================
🔍 Evaluating Model: Base Model (Pre-finetuning)
======================================================================
Total Samples: 179
Accuracy: 91.62%
Precision: 98.57%
Recall: 83.13%
F1-Score: 90.20%
======================================================================
🔍 Evaluating Model: Fine-tuned Model
======================================================================
Total Samples: 179
Accuracy: 97.77%
Precision: 100.00%
Recall: 95.18%
F1-Score: 97.53%
🎯 Performance Comparison
─────────────────────────────────────────────────────────────────
Metric       Pre-FT     Post-FT    Improve     Improve %
─────────────────────────────────────────────────────────────────
Accuracy     91.62%     97.77%     ↑ 6.15%     6.71%
Precision    98.57%     100.00%    ↑ 1.43%     1.45%
Recall       83.13%     95.18%     ↑ 12.05%    14.50%
F1-Score     90.20%     97.53%     ↑ 7.33%     8.13%
─────────────────────────────────────────────────────────────────
💾 Results saved to: data/sentiment_comparison_results.json
| Metric | What it Means | Target |
|---|---|---|
| Accuracy | Overall correctness | 95%+ |
| Precision | How many predicted positives are correct | 95%+ |
| Recall | How many actual positives were found | 90%+ |
| F1-Score | Harmonic mean of precision & recall | 95%+ |
|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | TN (91) | FP (5) |
| Actual Positive | FN (4) | TP (79) |
- True Negatives (TN): 91 - Correctly identified negative samples
- False Positives (FP): 5 - Negative samples wrongly classified as positive
- False Negatives (FN): 4 - Positive samples wrongly classified as negative
- True Positives (TP): 79 - Correctly identified positive samples
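If you want to recompute the summary metrics from a confusion matrix yourself, the relationships are simple. The sketch below is generic and uses placeholder counts rather than the exact numbers reported above; plug in the TN/FP/FN/TP values from your own run.

```python
# How accuracy, precision, recall, and F1 relate to confusion-matrix counts.
def summarize(tn: int, fp: int, fn: int, tp: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # overall correctness
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # actual positives that were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Placeholder counts for illustration only (not the report above):
print(summarize(tn=90, fp=2, fn=4, tp=83))
```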
🧪 Quick Test on Custom Text
Create a test script test_sentiment.py:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "saves/qwen2_5-coder-1.5b/freeze/sft"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
text = "这个酒店的服务态度非常好，房间也很干净！"  # Positive example: "The hotel's service is excellent, and the room is very clean!"

prompt = f"""请对以下中文文本进行情感分析，判断其情感倾向。

任务说明：
- 分析文本表达的整体情感态度
- 判断是正面(1)还是负面(0)

文本内容：
```sentence
{text}

输出格式：
{{
"sentiment": 0 or 1
}}
```"""
messages = [{"role": "user", "content": prompt}]
text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text_input], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=256, temperature=0.1)
response = tokenizer.batch_decode(generated_ids[:, model_inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)  # Output: {"sentiment": 1}

What it does: Share your fine-tuned model with the community
Follow these steps to create your HuggingFace access token:
Step 1: Click on your profile icon in the top-right corner
Step 2: Navigate to Settings โ Access Tokens
Step 3: Verify your identity by entering your password
Step 4: Click "+ Create new token"
Step 5: Name your token, select "Write" role, and click "Create token"
Step 6: Copy your access token (starts with hf_)
from huggingface_hub import HfApi, login
# Login
login(token="hf_YOUR_TOKEN_HERE") # Replace with your token
# Upload
api = HfApi()
api.create_repo(repo_id="YourUsername/Qwen2.5-Coder-Sentiment", private=False)
api.upload_folder(
folder_path="saves/qwen2_5-coder-1.5b/freeze/sft",
repo_id="YourUsername/Qwen2.5-Coder-Sentiment",
commit_message="Upload freeze-trained Qwen2.5-Coder for sentiment analysis"
)
print("โ
Model uploaded!")
print("๐ https://huggingface.co/YourUsername/Qwen2.5-Coder-Sentiment")Others can now use your model:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("YourUsername/Qwen2.5-Coder-Sentiment")
tokenizer = AutoTokenizer.from_pretrained("YourUsername/Qwen2.5-Coder-Sentiment")

MSJ-Factory/
├── data/                                   # Datasets
│   ├── ChnSentiCorp_test.csv               # Test data (179 samples)
│   ├── chnsenticorp_train_cleaned_instruct_balanced_3k.jsonl  # Training data (3000 samples)
│   └── dataset_info.json                   # Dataset registry
│
├── examples/                               # Training configs
│   └── train_freeze/
│       └── qwen2_5_coder_freeze_3k.yaml    # Main training config
│
├── scripts/                                # Utility scripts
│   ├── eval_sentiment_compare.py           # Evaluation script
│   └── convert_chnsenticorp.py             # Data conversion
│
├── contexts/                               # Documentation
│   ├── chnsenticorp-evaluation-guide.md    # Complete evaluation guide
│   ├── chnsenticorp-quick-reference.md     # Quick commands
│   └── EVALUATION_SYSTEM_SUMMARY.md        # System overview
│
├── src/                                    # Core library
│   └── llamafactory/                       # LlamaFactory integration
│
├── saves/                                  # Model outputs (created during training)
│   └── qwen2_5-coder-1.5b/freeze/sft/      # Fine-tuned model
│
└── Qwen2_5_Sentiment_Fine_tuning_Tutorial.ipynb  # Interactive notebook
# Lower-memory GPU (e.g., 16GB T4): the default config
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
freeze_trainable_layers: 6
bf16: true

# Larger GPU with more VRAM
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
freeze_trainable_layers: 12 # Train more layers
bf16: true

# Dual GPU
!CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_freeze/qwen2_5_coder_freeze_3k.yaml
# Quad GPU
!CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/train_freeze/qwen2_5_coder_freeze_3k.yaml

| Parameter | Value | What it Does |
|---|---|---|
| `freeze_trainable_layers` | 6 | Number of layers to train (from the end) |
| `freeze_extra_modules` | embed_tokens,norm | Additional modules to train |
| `per_device_train_batch_size` | 1 | Samples per GPU per step |
| `gradient_accumulation_steps` | 8 | Accumulate gradients for a larger effective batch |
| `learning_rate` | 2.0e-5 | How fast the model learns |
| `num_train_epochs` | 2.0 | Number of passes over the data |
| `bf16` | true | Use BFloat16 for faster training |
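To see how the batch-related settings combine, here is the arithmetic from the table as a tiny snippet; `num_gpus` is 1 for the single-T4 Colab setup and should be changed to match the multi-GPU commands above.

```python
# Effective batch size under gradient accumulation (values from the config above).
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1  # single T4 in the Colab setup

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 8
```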
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Base Model | 91.62% | 98.57% | 83.13% | 90.20% |
| Fine-tuned | 97.77% ⬆️ | 100.00% ⬆️ | 95.18% ⬆️ | 97.53% ⬆️ |
| Improvement | +6.15% | +1.43% | +12.05% | +7.33% |
- ✅ Better domain adaptation: Model learns Chinese sentiment patterns
- ✅ Improved recall: Catches more positive cases (83% → 95%)
- ✅ Perfect precision: No false positives on the test set (98.57% → 100%)
- ✅ Consistent predictions: More reliable on edge cases
| Text | Base Model | Fine-tuned | Correct |
|---|---|---|---|
| 这个酒店非常棒！(This hotel is great!) | ✅ Positive | ✅ Positive | ✅ |
| 服务态度一般般 (The service is mediocre) | ❌ Positive | ✅ Negative | ✅ |
| 房间还算干净 (The room is fairly clean) | ❌ Negative | ✅ Positive | ✅ |
| 价格太贵了不值 (Too expensive, not worth it) | ✅ Negative | ✅ Negative | ✅ |
Q1: How much GPU memory do I need?
Minimum: 16GB (T4, V100)
Recommended: 24GB+ (A100, RTX 3090)
For 16GB GPUs:
- Use `bf16: true`
- Keep `per_device_train_batch_size: 1`
- Increase `gradient_accumulation_steps` if needed
Q2: Can I train without a GPU?
Training on CPU is not recommended due to:
- 50-100x slower than GPU
- Would take 12-24 hours instead of 15-30 minutes
Alternatives:
- Use Google Colab (free T4 GPU)
- Use Kaggle notebooks (free P100 GPU)
- Rent GPU on vast.ai or runpod.io
Q3: How do I use my own dataset?
Step 1: Prepare your data in JSONL format
{"messages": [
{"role": "user", "content": "Your prompt here"},
{"role": "assistant", "content": "Expected response"}
]}

Step 2: Register in data/dataset_info.json
{
"your_dataset": {
"file_name": "your_data.jsonl",
"formatting": "sharegpt",
"columns": {"messages": "messages"}
}
}

Step 3: Update training config

dataset: your_dataset # Change in YAML file

See contexts/dataset-formats-guide.md for details.
Q4: Training failed with CUDA OOM error
Solution 1: Reduce batch size
per_device_train_batch_size: 1 # Already at minimum
gradient_accumulation_steps: 16 # Increase this instead

Solution 2: Use CPU offloading (slower but works)

deepspeed: examples/deepspeed/ds_z3_offload_config.json

Solution 3: Train fewer layers

freeze_trainable_layers: 3 # Reduce from 6 to 3

Q5: How do I improve model performance further?
Option 1: Train for more epochs
num_train_epochs: 3.0 # Or 4.0, 5.0

Option 2: Train more layers

freeze_trainable_layers: 12 # More adaptation

Option 3: Use full fine-tuning (much slower)

finetuning_type: full # Instead of freeze

Option 4: Collect more training data
- Current: 3000 samples
- Recommended: 5000-10000 samples for best results
Q6: Can I use this for English sentiment analysis?
Yes! Just:
- Prepare an English sentiment dataset
- Update the prompt template (remove Chinese-specific instructions)
- Register your dataset
- Train with the same config
The model supports multiple languages.
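For illustration, here is one way to build such a dataset in the same ShareGPT-style "messages" format used above. The file name, prompt wording, and the `english_sentiment` dataset name are assumptions for this sketch, not files that ship with the repo.

```python
# Illustrative only: write a tiny English sentiment dataset in the
# "messages" (sharegpt) format used by the Chinese training data.
import json

samples = [
    ("The hotel staff were friendly and the room was spotless.", 1),
    ("Overpriced and noisy; I would not stay here again.", 0),
]

with open("data/english_sentiment.jsonl", "w", encoding="utf-8") as f:
    for text, label in samples:
        record = {
            "messages": [
                {"role": "user",
                 "content": f"Classify the sentiment of the following text as positive (1) "
                            f"or negative (0).\n\nText:\n{text}\n\n"
                            'Output format:\n{"sentiment": 0 or 1}'},
                {"role": "assistant", "content": json.dumps({"sentiment": label})},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Then register `english_sentiment` in data/dataset_info.json (see Q3) and set `dataset: english_sentiment` in your training YAML.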
Q7: How do I deploy the model for inference?
Option 1: Python script (for testing)
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("saves/qwen2_5-coder-1.5b/freeze/sft")
tokenizer = AutoTokenizer.from_pretrained("saves/qwen2_5-coder-1.5b/freeze/sft")
# Use model.generate() for inference

Option 2: vLLM (for production)

!vllm serve saves/qwen2_5-coder-1.5b/freeze/sft --port 8000

Option 3: LlamaFactory API

!llamafactory-cli api examples/inference/qwen2_5_coder_sft.yaml

See contexts/chnsenticorp-evaluation-guide.md for the full deployment guide.
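Since `vllm serve` exposes an OpenAI-compatible HTTP API, you can also query the deployed model directly. The sketch below assumes the server from Option 2 is running locally on port 8000 and that the served model name defaults to the model path (it changes if you pass --served-model-name); the prompt is abbreviated, so in practice reuse the same prompt template the model was fine-tuned with.

```python
# Query the vLLM OpenAI-compatible endpoint started by "vllm serve" above.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "saves/qwen2_5-coder-1.5b/freeze/sft",  # defaults to the model path
        "messages": [
            # Abbreviated Chinese prompt: "Judge the sentiment of this text..."
            {"role": "user",
             "content": '请判断以下文本的情感倾向，输出 {"sentiment": 0 or 1}：\n房间很干净，服务也很好。'},
        ],
        "temperature": 0.1,
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```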
If you use this project in your research, please cite:
@misc{msj-factory-2025,
title={Qwen2.5-Coder Sentiment Analysis Fine-tuning Tutorial},
author={MASHIJIAN},
year={2025},
howpublished={\url{https://github.com/IIIIQIIII/MSJ-Factory}}
}

This project is built on top of excellent open-source projects:
- LLaMA-Factory - Efficient fine-tuning framework
- Qwen2.5 - Powerful base models
- Transformers - HuggingFace library
- vLLM - Fast inference engine
Special thanks to:
- Alibaba Cloud for releasing Qwen2.5 models
- HuggingFace for model hosting
- Google Colab for free GPU access
If this tutorial helped you, please consider:
- ⭐ Star this repository - Helps others discover this project
- 📢 Share - Tell your friends and colleagues
- 🐛 Report issues - Help the author improve
- 🤝 Contribute - Pull requests are welcome!

🌟 Don't forget to star! It means a lot to the author! ⭐
Built with ❤️ by MASHIJIAN