# AI Trainer
A Python application for training various unsloth models using data from GitHub repositories. Supports both Qwen2.5-Coder and Qwen3 models, optimized for an RTX3070 with 8GB of VRAM.
## Supported Models
### 1. Qwen2.5-Coder-7B-Instruct (Default)

- Model: `unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit`
- Best for: Code generation, code completion, programming tasks
- Memory Usage: Moderate (~6-7GB VRAM)
- Config: `configs/training_config.yaml`
### 2. Qwen3-8B

- Model: `unsloth/Qwen3-8B-bnb-4bit`
- Best for: General instruction following, broader language tasks
- Memory Usage: Higher (~7-8GB VRAM)
- Config: `configs/training_config_qwen3.yaml`
## Features
- Dataset Processing: Automatically processes code from GitHub repositories
- Memory Optimized: Designed for RTX3070 8GB VRAM with no CPU offloading
- Configurable Training: YAML-based configuration system
- Progress Logging: Comprehensive logging and monitoring
- Modular Design: Clean separation of concerns with dataset processing, training, and utilities
- Multi-Model Support: Easy switching between different model architectures
## Requirements

- Python 3.8+
- CUDA-compatible GPU (tested with RTX3070 8GB VRAM)
- Git
- Dependencies listed in `requirements.txt`
## Private Repository Support

The application now supports processing private GitHub repositories by using a GitHub token for authentication. To use this feature:

- Generate a GitHub personal access token with appropriate permissions
- Pass the token using the `--github_token` command line argument
- Use private repository URLs in the same format as public repositories

Supported URL formats for private repositories:

- `https://github.com/user/private-repo.git`
- `github.com/user/private-repo`
- `user/private-repo`
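Under the hood, token-authenticated clones generally work by embedding the token in the HTTPS URL. A minimal sketch of the idea in Python (the `clone_repo` helper below is hypothetical; the actual logic lives in `src/dataset_processor.py` and may differ):

```python
import subprocess
from typing import Optional

def clone_repo(repo_url: str, dest_dir: str, github_token: Optional[str] = None) -> None:
    """Clone a GitHub repository, embedding the token in the HTTPS URL if given."""
    # Accept "user/repo", "github.com/user/repo", or a full HTTPS URL.
    if repo_url.startswith("github.com/"):
        repo_url = "https://" + repo_url
    elif not repo_url.startswith("https://"):
        repo_url = "https://github.com/" + repo_url
    if not repo_url.endswith(".git"):
        repo_url += ".git"
    # GitHub accepts a personal access token embedded in the clone URL.
    if github_token:
        repo_url = repo_url.replace("https://", f"https://{github_token}@", 1)
    subprocess.run(["git", "clone", "--depth", "1", repo_url, dest_dir], check=True)

clone_repo("user/private-repo", "./private-repo", github_token="YOUR_GITHUB_TOKEN")
```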
## Installation

- Clone this repository
- If you have a CUDA GPU, install PyTorch:

  ```bash
  pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
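To confirm that PyTorch can see the GPU before starting a run, a quick sanity check:

```python
import torch

print(torch.cuda.is_available())       # should print True
print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA GeForce RTX 3070"
```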
## Usage

### Training Qwen2.5-Coder-7B (Default)

```bash
# Using the main script
python src/main.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2 \
    --config configs/training_config.yaml \
    --output_dir ./models \
    --log_level INFO

# Or using the runner script
python run_training.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2

# Using private repositories with a GitHub token
python run_training.py \
    --repo1 https://github.com/user/private-repo1 \
    --repo2 https://github.com/user/private-repo2 \
    --github_token YOUR_GITHUB_TOKEN
```
### Training Qwen3-8B

```bash
# Using the main script with Qwen3 config
python src/main.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2 \
    --config configs/training_config_qwen3.yaml \
    --output_dir ./models \
    --log_level INFO

# Or using the dedicated Qwen3 runner
python run_training_qwen3.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2

# Using private repositories with a GitHub token
python run_training_qwen3.py \
    --repo1 https://github.com/user/private-repo1 \
    --repo2 https://github.com/user/private-repo2 \
    --github_token YOUR_GITHUB_TOKEN
```
### Command Line Arguments

- `--repo1`: First GitHub repository URL (required)
- `--repo2`: Second GitHub repository URL (required)
- `--config`: Path to training configuration file (default: `configs/training_config.yaml`)
- `--output_dir`: Directory to save the trained model (default: `./models`)
- `--log_level`: Logging level (DEBUG, INFO, WARNING, ERROR)
- `--github_token`: GitHub token for accessing private repositories (optional)
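For reference, this interface corresponds to a standard `argparse` definition along the following lines; this is a sketch of the expected surface, not necessarily the exact code in `src/main.py`:

```python
import argparse

# Mirror the command line arguments documented above.
parser = argparse.ArgumentParser(description="Train an unsloth model on two GitHub repositories.")
parser.add_argument("--repo1", required=True, help="First GitHub repository URL")
parser.add_argument("--repo2", required=True, help="Second GitHub repository URL")
parser.add_argument("--config", default="configs/training_config.yaml", help="Training configuration file")
parser.add_argument("--output_dir", default="./models", help="Directory to save the trained model")
parser.add_argument("--log_level", default="INFO", choices=["DEBUG", "INFO", "WARNING", "ERROR"])
parser.add_argument("--github_token", default=None, help="GitHub token for private repositories")
args = parser.parse_args()
```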
## Project Structure

```
ai_trainer/
├── src/
│   ├── __init__.py
│   ├── main.py                 # Main entry point
│   ├── trainer.py              # Model training logic
│   ├── dataset_processor.py    # GitHub repository processing
│   ├── config.py               # Configuration management
│   └── utils.py                # Utility functions
├── configs/
│   └── training_config.yaml    # Training configuration
├── data/
│   └── processed/              # Processed datasets
├── models/                     # Trained models
├── logs/                       # Training logs
├── requirements.txt
└── README.md
```
## Memory Optimization
This application is specifically optimized for RTX3070 8GB VRAM:
- Uses 4-bit quantization (bnb-4bit)
- Gradient checkpointing enabled
- No CPU offloading
- Optimized batch sizes for 8GB VRAM
- Memory-efficient data loading
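In Unsloth terms, these optimizations usually boil down to loading the pre-quantized 4-bit checkpoint and enabling gradient checkpointing when attaching LoRA adapters. A minimal sketch, assuming typical Unsloth defaults (the LoRA rank and target modules below are illustrative; `src/trainer.py` may differ):

```python
from unsloth import FastLanguageModel

# Load the pre-quantized 4-bit checkpoint so the weights fit in the 8GB budget.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; gradient checkpointing trades compute for activation memory.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # illustrative LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```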
## Configuration

### Qwen2.5-Coder-7B Configuration

File: `configs/training_config.yaml`

```yaml
model:
  name: "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  learning_rate: 2.0e-4
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.85
```
### Qwen3-8B Configuration

File: `configs/training_config_qwen3.yaml`

```yaml
model:
  name: "unsloth/Qwen3-8B-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 1   # More conservative
  gradient_accumulation_steps: 8   # Higher accumulation
  learning_rate: 1.0e-4            # Lower learning rate
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.95           # More aggressive memory usage
```
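Both files are plain YAML, so loading them is straightforward; a minimal sketch with PyYAML (`src/config.py` may add validation and defaults on top of this):

```python
import yaml

# Read the training configuration into a nested dict.
with open("configs/training_config_qwen3.yaml") as f:
    config = yaml.safe_load(f)

print(config["model"]["name"])                            # unsloth/Qwen3-8B-bnb-4bit
print(config["training"]["per_device_train_batch_size"])  # 1
```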
### Key Differences
| Setting | Qwen2.5-Coder | Qwen3-8B | Reason |
|---|---|---|---|
| Batch Size | 2 | 1 | Larger model needs smaller batches |
| Gradient Accumulation | 4 | 8 | Maintains effective batch size |
| Learning Rate | 2e-4 | 1e-4 | Larger model needs more conservative LR |
| Memory Usage | 85% | 95% | Qwen3 can use more VRAM |
| Effective Batch Size | 8 | 8 | Same training dynamics |
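
In both configurations the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps`: 2 × 4 = 8 for Qwen2.5-Coder and 1 × 8 = 8 for Qwen3-8B, so the optimizer sees the same amount of data per update even though Qwen3 holds fewer sequences in VRAM at once.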
## Model Selection Guide

### Choose Qwen2.5-Coder-7B when:
- You want to fine-tune specifically for code generation tasks
- Working with programming languages and technical content
- Need code completion and code understanding capabilities
- Prefer moderate memory usage (~6-7GB VRAM)
### Choose Qwen3-8B when:
- You need general instruction following capabilities
- Working with mixed content (code + natural language)
- Want broader language understanding and generation
- Have sufficient VRAM (~7-8GB) and prefer newer architecture
## License
MIT License