# AI Trainer

A Python application for training Unsloth models on data from GitHub repositories. It supports both Qwen2.5-Coder and Qwen3 models, optimized for an RTX 3070 with 8 GB of VRAM.

## Supported Models

### 1. Qwen2.5-Coder-7B-Instruct (Default)

- **Model**: `unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit`
- **Best for**: Code generation, code completion, programming tasks
- **Memory Usage**: Moderate (~6-7 GB VRAM)
- **Config**: `configs/training_config.yaml`

### 2. Qwen3-8B

- **Model**: `unsloth/Qwen3-8B-bnb-4bit`
- **Best for**: General instruction following, broader language tasks
- **Memory Usage**: Higher (~7-8 GB VRAM)
- **Config**: `configs/training_config_qwen3.yaml`

## Features

- **Dataset Processing**: Automatically processes code from GitHub repositories
- **Memory Optimized**: Designed for an RTX 3070 (8 GB VRAM) with no CPU offloading
- **Configurable Training**: YAML-based configuration system
- **Progress Logging**: Comprehensive logging and monitoring
- **Modular Design**: Clean separation of concerns across dataset processing, training, and utilities
- **Multi-Model Support**: Easy switching between model architectures

## Requirements

- Python 3.8+
- CUDA-compatible GPU (tested on an RTX 3070 with 8 GB VRAM)
- Git
- Dependencies listed in `requirements.txt`

## Private Repository Support

The application supports processing private GitHub repositories by authenticating with a GitHub token. To use this feature:

1. Generate a GitHub personal access token with the appropriate repository permissions
2. Pass the token via the `--github_token` command-line argument
3. Use private repository URLs in the same format as public ones

Supported URL formats for private repositories:

- `https://github.com/user/private-repo.git`
- `github.com/user/private-repo`
- `user/private-repo`

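All three formats reduce to the same `user/repo` core, which can be normalized into a single authenticated clone URL. A minimal sketch of that idea (the `build_clone_url` helper is hypothetical, not a function from this codebase; embedding the token in the HTTPS URL is one common way to authenticate a `git clone`):

```python
def build_clone_url(repo, github_token=None):
    """Normalize any accepted repo format to a full HTTPS clone URL (sketch)."""
    # Strip the scheme and host to recover the bare "user/repo" core
    for prefix in ("https://", "github.com/"):
        if repo.startswith(prefix):
            repo = repo[len(prefix):]
    if repo.endswith(".git"):
        repo = repo[:-len(".git")]
    # Embed the token for private repositories; omit it for public ones
    auth = f"{github_token}@" if github_token else ""
    return f"https://{auth}github.com/{repo}.git"
```
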
## Installation

1. Clone this repository
2. If you have a CUDA-capable GPU, install a CUDA build of PyTorch:

   ```bash
   pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129
   ```

3. Install the remaining dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Training Qwen2.5-Coder-7B (Default)

```bash
# Using the main script
python src/main.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2 \
    --config configs/training_config.yaml \
    --output_dir ./models \
    --log_level INFO

# Or using the runner script
python run_training.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2

# Using private repositories with a GitHub token
python run_training.py \
    --repo1 https://github.com/user/private-repo1 \
    --repo2 https://github.com/user/private-repo2 \
    --github_token YOUR_GITHUB_TOKEN
```


### Training Qwen3-8B

```bash
# Using the main script with the Qwen3 config
python src/main.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2 \
    --config configs/training_config_qwen3.yaml \
    --output_dir ./models \
    --log_level INFO

# Or using the dedicated Qwen3 runner
python run_training_qwen3.py \
    --repo1 https://github.com/user/repo1 \
    --repo2 https://github.com/user/repo2

# Using private repositories with a GitHub token
python run_training_qwen3.py \
    --repo1 https://github.com/user/private-repo1 \
    --repo2 https://github.com/user/private-repo2 \
    --github_token YOUR_GITHUB_TOKEN
```


### Command Line Arguments

- `--repo1`: First GitHub repository URL (required)
- `--repo2`: Second GitHub repository URL (required)
- `--config`: Path to the training configuration file (default: `configs/training_config.yaml`)
- `--output_dir`: Directory to save the trained model (default: `./models`)
- `--log_level`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
- `--github_token`: GitHub token for accessing private repositories (optional)

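The documented flags map directly onto Python's `argparse`. The following is a sketch of what the parser plausibly looks like — argument names and defaults are taken from the list above, but the actual code in `src/main.py` may differ:

```python
import argparse

def build_parser():
    # Mirrors the documented command-line interface (a sketch, not the real src/main.py)
    parser = argparse.ArgumentParser(
        description="Train an Unsloth model on data from two GitHub repositories")
    parser.add_argument("--repo1", required=True,
                        help="First GitHub repository URL")
    parser.add_argument("--repo2", required=True,
                        help="Second GitHub repository URL")
    parser.add_argument("--config", default="configs/training_config.yaml",
                        help="Path to the training configuration file")
    parser.add_argument("--output_dir", default="./models",
                        help="Directory to save the trained model")
    parser.add_argument("--log_level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="Logging level")
    parser.add_argument("--github_token", default=None,
                        help="GitHub token for accessing private repositories")
    return parser
```
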
## Project Structure

```
ai_trainer/
├── src/
│   ├── __init__.py
│   ├── main.py                       # Main entry point
│   ├── trainer.py                    # Model training logic
│   ├── dataset_processor.py          # GitHub repository processing
│   ├── config.py                     # Configuration management
│   └── utils.py                      # Utility functions
├── configs/
│   ├── training_config.yaml          # Qwen2.5-Coder training configuration
│   └── training_config_qwen3.yaml    # Qwen3 training configuration
├── data/
│   └── processed/                    # Processed datasets
├── models/                           # Trained models
├── logs/                             # Training logs
├── requirements.txt
└── README.md
```

## Memory Optimization

This application is specifically optimized for an RTX 3070 with 8 GB of VRAM:

- 4-bit quantization (bnb-4bit)
- Gradient checkpointing enabled
- No CPU offloading
- Batch sizes tuned for 8 GB VRAM
- Memory-efficient data loading

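As a rough sanity check on why a 7B-8B model fits in this budget: 4-bit quantization stores about half a byte per weight, which lands well under the configured VRAM cap. These are back-of-envelope figures only — quantization constants, LoRA adapters, activations, and optimizer state all add on top:

```python
def approx_4bit_weights_gib(params_billion):
    # ~0.5 bytes per parameter under 4-bit quantization (rough estimate)
    return params_billion * 0.5

def vram_cap_gib(total_gib, max_memory_usage):
    # Fraction of total VRAM the trainer is allowed to claim
    return total_gib * max_memory_usage

print(approx_4bit_weights_gib(7))   # ~3.5 GiB of weights for a 7B model
print(vram_cap_gib(8, 0.85))        # 6.8 GiB cap with the default config
```
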
## Configuration

### Qwen2.5-Coder-7B Configuration

**File**: `configs/training_config.yaml`

```yaml
model:
  name: "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  learning_rate: 2.0e-4
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.85
```


### Qwen3-8B Configuration

**File**: `configs/training_config_qwen3.yaml`

```yaml
model:
  name: "unsloth/Qwen3-8B-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 1  # More conservative
  gradient_accumulation_steps: 8  # Higher accumulation
  learning_rate: 1.0e-4           # Lower learning rate
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.95          # More aggressive memory usage
```


### Key Differences

| Setting | Qwen2.5-Coder | Qwen3-8B | Reason |
|---------|---------------|----------|--------|
| Batch Size | 2 | 1 | Larger model needs smaller batches |
| Gradient Accumulation | 4 | 8 | Maintains the effective batch size |
| Learning Rate | 2e-4 | 1e-4 | Larger model needs a more conservative LR |
| Memory Usage | 85% | 95% | Qwen3 can use more VRAM |
| Effective Batch Size | 8 | 8 | Same training dynamics |

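The "Effective Batch Size" row follows directly from the two rows above it: per-device batch size × gradient accumulation steps (× number of GPUs, here 1). A quick check:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    # Gradients from several small forward/backward passes are accumulated
    # before each optimizer step, so the optimizer sees this many samples per step
    return per_device_batch * grad_accum_steps * num_devices

print(effective_batch_size(2, 4))  # Qwen2.5-Coder config -> 8
print(effective_batch_size(1, 8))  # Qwen3-8B config -> 8
```

This is why both configs train with the same dynamics despite different per-device settings.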
## Model Selection Guide

### Choose Qwen2.5-Coder-7B when:

- You want to fine-tune specifically for **code generation** tasks
- You work with **programming languages** and technical content
- You need **code completion** and **code understanding** capabilities
- You prefer **moderate memory usage** (~6-7 GB VRAM)

### Choose Qwen3-8B when:

- You need **general instruction following** capabilities
- You work with **mixed content** (code + natural language)
- You want **broader language understanding** and generation
- You have **sufficient VRAM** (~7-8 GB) and prefer the newer architecture

## License

MIT License