first commit
commit c73b0d247a
README.md (Normal file)
@@ -0,0 +1,183 @@
# AI Trainer

A Python application for fine-tuning unsloth models on code drawn from GitHub repositories. Supports both Qwen2.5-Coder and Qwen3 models, optimized for RTX3070 8GB VRAM.

## Supported Models

### 1. Qwen2.5-Coder-7B-Instruct (Default)
- **Model**: `unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit`
- **Best for**: Code generation, code completion, programming tasks
- **Memory Usage**: Moderate (~6-7GB VRAM)
- **Config**: `configs/training_config.yaml`

### 2. Qwen3-8B
- **Model**: `unsloth/Qwen3-8B-bnb-4bit`
- **Best for**: General instruction following, broader language tasks
- **Memory Usage**: Higher (~7-8GB VRAM)
- **Config**: `configs/training_config_qwen3.yaml`

## Features

- **Dataset Processing**: Automatically processes code from GitHub repositories
- **Memory Optimized**: Designed for RTX3070 8GB VRAM with no CPU offloading
- **Configurable Training**: YAML-based configuration system
- **Progress Logging**: Comprehensive logging and monitoring
- **Modular Design**: Clean separation of concerns across dataset processing, training, and utilities
- **Multi-Model Support**: Easy switching between model architectures

## Requirements

- Python 3.8+
- CUDA-compatible GPU (tested with RTX3070 8GB VRAM)
- Git
- Dependencies listed in `requirements.txt`

## Installation

1. Clone this repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

### Training Qwen2.5-Coder-7B (Default)
```bash
# Using the main script
python src/main.py \
  --repo1 https://github.com/user/repo1 \
  --repo2 https://github.com/user/repo2 \
  --config configs/training_config.yaml \
  --output_dir ./models \
  --log_level INFO

# Or using the runner script
python run_training.py \
  --repo1 https://github.com/user/repo1 \
  --repo2 https://github.com/user/repo2
```

### Training Qwen3-8B
```bash
# Using the main script with the Qwen3 config
python src/main.py \
  --repo1 https://github.com/user/repo1 \
  --repo2 https://github.com/user/repo2 \
  --config configs/training_config_qwen3.yaml \
  --output_dir ./models \
  --log_level INFO

# Or using the dedicated Qwen3 runner
python run_training_qwen3.py \
  --repo1 https://github.com/user/repo1 \
  --repo2 https://github.com/user/repo2
```

### Command Line Arguments

- `--repo1`: First GitHub repository URL (required)
- `--repo2`: Second GitHub repository URL (required)
- `--config`: Path to the training configuration file (default: `configs/training_config.yaml`)
- `--output_dir`: Directory to save the trained model (default: `./models`)
- `--log_level`: Logging level (DEBUG, INFO, WARNING, ERROR)

## Project Structure

```
ai_trainer/
├── src/
│   ├── __init__.py
│   ├── main.py                  # Main entry point
│   ├── trainer.py               # Model training logic
│   ├── dataset_processor.py     # GitHub repository processing
│   ├── config.py                # Configuration management
│   └── utils.py                 # Utility functions
├── configs/
│   ├── training_config.yaml        # Qwen2.5-Coder training configuration
│   └── training_config_qwen3.yaml  # Qwen3-8B training configuration
├── data/
│   └── processed/               # Processed datasets
├── models/                      # Trained models
├── logs/                        # Training logs
├── compare_configs.py           # Side-by-side config comparison
├── run_training.py              # Runner for Qwen2.5-Coder
├── run_training_qwen3.py        # Runner for Qwen3-8B
├── requirements.txt
└── README.md
```

## Memory Optimization

This application is specifically optimized for RTX3070 8GB VRAM (see the sketch below):
- Uses 4-bit quantization (bnb-4bit)
- Gradient checkpointing enabled
- No CPU offloading
- Batch sizes tuned for 8GB VRAM
- Memory-efficient data loading
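
The snippet below is a minimal sketch of how these settings are applied in this repository (mirroring `optimize_memory_settings` in `src/utils.py` and the environment variables set by the runner scripts):

```python
# Sketch of the memory setup applied by src/utils.py and run_training.py.
import os
import torch

# Reduce CUDA allocator fragmentation on 8GB cards.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'

if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.85)  # cap at 85% of VRAM
    torch.backends.cuda.matmul.allow_tf32 = True      # TF32 matmuls on Ampere
    torch.backends.cudnn.allow_tf32 = True
```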

## Configuration

### Qwen2.5-Coder-7B Configuration
**File**: `configs/training_config.yaml`

```yaml
model:
  name: "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  learning_rate: 2.0e-4
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.85
```

### Qwen3-8B Configuration
**File**: `configs/training_config_qwen3.yaml`

```yaml
model:
  name: "unsloth/Qwen3-8B-bnb-4bit"
  max_seq_length: 2048

training:
  per_device_train_batch_size: 1   # More conservative
  gradient_accumulation_steps: 8   # Higher accumulation
  learning_rate: 1.0e-4            # Lower learning rate
  num_train_epochs: 3

memory:
  use_gradient_checkpointing: true
  offload_to_cpu: false
  max_memory_usage: 0.95           # More aggressive memory usage
```

### Key Differences

| Setting | Qwen2.5-Coder | Qwen3-8B | Reason |
|---------|---------------|----------|--------|
| Batch Size | 2 | 1 | Larger model needs smaller batches |
| Gradient Accumulation | 4 | 8 | Maintains effective batch size |
| Learning Rate | 2e-4 | 1e-4 | Larger model needs more conservative LR |
| Memory Usage | 85% | 95% | Qwen3 can use more VRAM |
| Effective Batch Size | 8 | 8 | Same training dynamics |
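
Both configurations arrive at the same effective batch size, so each optimizer step sees a comparable number of samples:

```python
# Effective batch size = per-device batch size * gradient accumulation steps
assert 2 * 4 == 1 * 8 == 8  # Qwen2.5-Coder vs. Qwen3-8B
```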

## Model Selection Guide

### Choose Qwen2.5-Coder-7B when:
- You want to fine-tune specifically for **code generation** tasks
- You work with **programming languages** and technical content
- You need **code completion** and **code understanding** capabilities
- You prefer **moderate memory usage** (~6-7GB VRAM)

### Choose Qwen3-8B when:
- You need **general instruction following** capabilities
- You work with **mixed content** (code + natural language)
- You want **broader language understanding** and generation
- You have **sufficient VRAM** (~7-8GB) and prefer the newer architecture

## License

MIT License

compare_configs.py (Normal file)
@@ -0,0 +1,99 @@
#!/usr/bin/env python3
"""
Compare training configurations for different models
"""

import yaml
from colorama import init, Fore, Style

init(autoreset=True)


def load_config(config_path):
    """Load a YAML configuration"""
    with open(config_path, 'r') as f:
        return yaml.safe_load(f)


def compare_configs():
    """Compare the two training configurations"""
    print(f"\n{Fore.CYAN}{'='*80}{Style.RESET_ALL}")
    print(f"{Fore.CYAN}AI TRAINER - MODEL CONFIGURATION COMPARISON{Style.RESET_ALL}")
    print(f"{Fore.CYAN}{'='*80}{Style.RESET_ALL}")

    # Load configurations
    qwen25_config = load_config('configs/training_config.yaml')
    qwen3_config = load_config('configs/training_config_qwen3.yaml')

    # Model comparison
    print(f"\n{Fore.GREEN}📊 MODEL COMPARISON{Style.RESET_ALL}")
    print(f"{'Setting':<25} {'Qwen2.5-Coder-7B':<20} {'Qwen3-8B':<15}")
    print(f"{'-'*60}")

    print(f"{'Model Name':<25} {qwen25_config['model']['name']:<20} {qwen3_config['model']['name']:<15}")
    print(f"{'Max Seq Length':<25} {qwen25_config['model']['max_seq_length']:<20} {qwen3_config['model']['max_seq_length']:<15}")

    # Training comparison
    print(f"\n{Fore.GREEN}⚙️ TRAINING PARAMETERS{Style.RESET_ALL}")
    print(f"{'Parameter':<25} {'Qwen2.5-Coder-7B':<20} {'Qwen3-8B':<15} {'Difference':<15}")
    print(f"{'-'*75}")

    training_params = [
        ('Batch Size', 'per_device_train_batch_size'),
        ('Gradient Accumulation', 'gradient_accumulation_steps'),
        ('Learning Rate', 'learning_rate'),
        ('Warmup Steps', 'warmup_steps'),
        ('Epochs', 'num_train_epochs')
    ]

    for param_name, param_key in training_params:
        qwen25_val = qwen25_config['training'][param_key]
        qwen3_val = qwen3_config['training'][param_key]
        diff = "🔻" if qwen3_val < qwen25_val else "🔺" if qwen3_val > qwen25_val else "➡️"

        print(f"{param_name:<25} {qwen25_val:<20} {qwen3_val:<15} {diff}")

    # Memory comparison
    print(f"\n{Fore.GREEN}🧠 MEMORY SETTINGS{Style.RESET_ALL}")
    print(f"{'Setting':<25} {'Qwen2.5-Coder-7B':<20} {'Qwen3-8B':<15}")
    print(f"{'-'*60}")

    memory_params = [
        ('Max Memory Usage', 'max_memory_usage'),
        ('Gradient Checkpointing', 'use_gradient_checkpointing'),
        ('CPU Offloading', 'offload_to_cpu')
    ]

    for param_name, param_key in memory_params:
        qwen25_val = qwen25_config['memory'][param_key]
        qwen3_val = qwen3_config['memory'][param_key]
        # str() so booleans print as True/False rather than 1/0 under a format spec
        print(f"{param_name:<25} {str(qwen25_val):<20} {str(qwen3_val):<15}")

    # Usage guide
    print(f"\n{Fore.YELLOW}💡 RECOMMENDATION GUIDE{Style.RESET_ALL}")
    print(f"{'='*80}")

    print(f"\n{Fore.BLUE}Use Qwen2.5-Coder-7B when:{Style.RESET_ALL}")
    print("  • You want to fine-tune for code generation tasks")
    print("  • You work primarily with programming languages")
    print("  • You need code completion and understanding")
    print("  • You prefer moderate memory usage (~6-7GB VRAM)")

    print(f"\n{Fore.BLUE}Use Qwen3-8B when:{Style.RESET_ALL}")
    print("  • You need general instruction following")
    print("  • You work with mixed code and natural language")
    print("  • You want broader language understanding")
    print("  • You have sufficient VRAM (~7-8GB)")

    print(f"\n{Fore.GREEN}🚀 QUICK START COMMANDS{Style.RESET_ALL}")
    print(f"{'='*80}")

    print(f"\n{Fore.CYAN}For Qwen2.5-Coder-7B:{Style.RESET_ALL}")
    print("python run_training.py --repo1 <repo1> --repo2 <repo2>")

    print(f"\n{Fore.CYAN}For Qwen3-8B:{Style.RESET_ALL}")
    print("python run_training_qwen3.py --repo1 <repo1> --repo2 <repo2>")

    print(f"\n{Fore.CYAN}{'='*80}{Style.RESET_ALL}")


if __name__ == "__main__":
    compare_configs()
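
To see the comparison, run `python compare_configs.py` from the repository root; it expects both `configs/training_config.yaml` and `configs/training_config_qwen3.yaml` to exist.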

configs/training_config.yaml (Normal file)
@@ -0,0 +1,132 @@
# Training configuration optimized for RTX3070 8GB VRAM
# AI Trainer for unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit

model:
  name: "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"
  max_seq_length: 2048
  trust_remote_code: true
  use_fast_tokenizer: true
  padding_side: "left"
  truncation_side: "left"

training:
  # Memory-optimized batch size for RTX3070 8GB
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4

  # Training parameters
  num_train_epochs: 3
  learning_rate: 2.0e-4
  warmup_steps: 10
  warmup_ratio: 0.1

  # Logging and saving
  logging_steps: 1
  save_steps: 100
  save_total_limit: 3

  # Evaluation
  evaluation_strategy: "steps"
  eval_steps: 100
  load_best_model_at_end: true
  metric_for_best_model: "loss"
  greater_is_better: false

  # Data loading
  dataloader_num_workers: 2
  dataloader_pin_memory: true
  remove_unused_columns: false

  # Memory optimization - CRITICAL for RTX3070 8GB
  use_gradient_checkpointing: true
  offload_to_cpu: false  # Explicitly no CPU offloading

  # Optimizer settings
  optim: "adamw_torch"
  weight_decay: 0.01
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_epsilon: 1.0e-8
  max_grad_norm: 1.0

  # Learning rate scheduler
  lr_scheduler_type: "cosine"

  # Precision - BF16 for better stability on modern GPUs
  bf16: true
  fp16: false
  tf32: true

  # Dataset settings
  dataset_shuffle: true
  dataset_seed: 42

  # Output settings
  output_dir: "./models"
  logging_dir: "./logs"
  report_to: ["tensorboard"]

dataset:
  # File filtering
  min_file_size: 10
  max_file_size: 10000

  # Supported programming languages
  supported_languages:
    - python
    - javascript
    - typescript
    - java
    - cpp
    - c
    - csharp
    - php
    - ruby
    - go
    - rust
    - swift
    - kotlin
    - scala
    - sql
    - bash
    - yaml
    - json
    - xml
    - html
    - css
    - markdown

  # Files and directories to exclude
  exclude_patterns:
    - "\\.git/"
    - "__pycache__/"
    - "\\.pytest_cache/"
    - "node_modules/"
    - "\\.venv/"
    - "venv/"
    - "package-lock\\.json$"
    - "yarn\\.lock$"
    - "\\.log$"
    - "\\.tmp$"
    - "\\.bak$"
    - "~\\$.*"
    - "\\.swp$"
    - "\\.swo$"
    - "\\.DS_Store"
    - "\\.pyc$"
    - "\\.pyo$"
    - "\\.pyd$"
    - "\\.so$"
    - "\\.dll$"
    - "\\.exe$"

memory:
  # Memory management for RTX3070 8GB
  max_memory_usage: 0.85  # Use up to 85% of GPU memory
  enable_memory_tracking: true
  clear_cache_between_epochs: true

  # Attention optimization
  use_memory_efficient_attention: true
  attention_slicing: true
  slice_size: 1
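
As a quick check, this file can be loaded through `AppConfig.from_yaml` from `src/config.py`. A minimal sketch, run from the repository root:

```python
import sys
sys.path.append("src")  # mirror the path setup used by the runner scripts

from config import AppConfig

config = AppConfig.from_yaml("configs/training_config.yaml")
print(config.model.name)                            # unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
print(config.training.per_device_train_batch_size)  # 2
print(config.memory.max_memory_usage)               # 0.85
```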

configs/training_config_qwen3.yaml (Normal file)
@@ -0,0 +1,132 @@
# Training configuration optimized for RTX3070 8GB VRAM - Qwen3-8B Model
# AI Trainer for unsloth/Qwen3-8B-bnb-4bit

model:
  name: "unsloth/Qwen3-8B-bnb-4bit"
  max_seq_length: 2048
  trust_remote_code: true
  use_fast_tokenizer: true
  padding_side: "left"
  truncation_side: "left"

training:
  # Memory-optimized batch size for RTX3070 8GB with Qwen3-8B
  per_device_train_batch_size: 1  # More conservative for larger model
  gradient_accumulation_steps: 8  # Higher accumulation to maintain effective batch size

  # Training parameters
  num_train_epochs: 3
  learning_rate: 1.0e-4  # Slightly lower for larger model
  warmup_steps: 15
  warmup_ratio: 0.1

  # Logging and saving
  logging_steps: 1
  save_steps: 100
  save_total_limit: 3

  # Evaluation
  evaluation_strategy: "steps"
  eval_steps: 100
  load_best_model_at_end: true
  metric_for_best_model: "loss"
  greater_is_better: false

  # Data loading
  dataloader_num_workers: 2
  dataloader_pin_memory: true
  remove_unused_columns: false

  # Memory optimization - CRITICAL for RTX3070 8GB with 8B model
  use_gradient_checkpointing: true
  offload_to_cpu: false  # Explicitly no CPU offloading

  # Optimizer settings
  optim: "adamw_torch"
  weight_decay: 0.01
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_epsilon: 1.0e-8
  max_grad_norm: 1.0

  # Learning rate scheduler
  lr_scheduler_type: "cosine"

  # Precision - BF16 for better stability on modern GPUs
  bf16: true
  fp16: false
  tf32: true

  # Dataset settings
  dataset_shuffle: true
  dataset_seed: 42

  # Output settings
  output_dir: "./models"
  logging_dir: "./logs"
  report_to: ["tensorboard"]

dataset:
  # File filtering
  min_file_size: 10
  max_file_size: 10000

  # Supported programming languages
  supported_languages:
    - python
    - javascript
    - typescript
    - java
    - cpp
    - c
    - csharp
    - php
    - ruby
    - go
    - rust
    - swift
    - kotlin
    - scala
    - sql
    - bash
    - yaml
    - json
    - xml
    - html
    - css
    - markdown

  # Files and directories to exclude
  exclude_patterns:
    - "\\.git/"
    - "__pycache__/"
    - "\\.pytest_cache/"
    - "node_modules/"
    - "\\.venv/"
    - "venv/"
    - "package-lock\\.json$"
    - "yarn\\.lock$"
    - "\\.log$"
    - "\\.tmp$"
    - "\\.bak$"
    - "~\\$.*"
    - "\\.swp$"
    - "\\.swo$"
    - "\\.DS_Store"
    - "\\.pyc$"
    - "\\.pyo$"
    - "\\.pyd$"
    - "\\.so$"
    - "\\.dll$"
    - "\\.exe$"

memory:
  # Memory management for RTX3070 8GB with Qwen3-8B
  max_memory_usage: 0.95  # Use up to 95% for more aggressive memory usage
  enable_memory_tracking: true
  clear_cache_between_epochs: true

  # Attention optimization
  use_memory_efficient_attention: true
  attention_slicing: true
  slice_size: 1

requirements.txt (Normal file)
@@ -0,0 +1,55 @@
# Core ML libraries
torch>=2.1.0
torchvision>=0.16.0
torchaudio>=2.1.0

# Unsloth for efficient model training
unsloth[cu121]>=2024.5
unsloth_zoo>=2024.5

# Transformers and tokenizers
transformers>=4.38.0
tokenizers>=0.15.0
sentencepiece>=0.1.99

# Datasets and data processing
datasets>=2.18.0
pandas>=2.0.0
numpy>=1.24.0

# Git and repository handling
GitPython>=3.1.0
requests>=2.31.0

# Configuration and utilities
PyYAML>=6.0.0
tqdm>=4.65.0
colorama>=0.4.6
python-dotenv>=1.0.0

# Memory optimization
bitsandbytes>=0.43.0
accelerate>=0.27.0

# Logging and monitoring
tensorboard>=2.14.0
wandb>=0.16.0

# Code processing
tree-sitter>=0.20.0
tree-sitter-python>=0.20.0
tree-sitter-javascript>=0.20.0
tree-sitter-typescript>=0.20.0
tree-sitter-java>=0.20.0
tree-sitter-go>=0.20.0
tree-sitter-rust>=0.20.0

# Optional: for model quantization and optimization
optimum>=1.17.0
auto-gptq>=0.6.0

# Development and testing
pytest>=7.4.0
black>=23.0.0
isort>=5.12.0
flake8>=6.0.0

run_training.py (Normal file)
@@ -0,0 +1,22 @@
#!/usr/bin/env python3
"""
Simple training runner script for AI Trainer
"""

import os
import sys
from pathlib import Path

# Add src to path
sys.path.append(str(Path(__file__).parent / "src"))

from main import main

if __name__ == "__main__":
    # Set environment variables for better CUDA performance
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'
    os.environ['TOKENIZERS_PARALLELISM'] = 'false'

    # Run the main training application
    main()

run_training_qwen3.py (Normal file)
@@ -0,0 +1,31 @@
#!/usr/bin/env python3
"""
Training runner script for unsloth/Qwen3-8B-bnb-4bit model
Optimized for RTX3070 8GB VRAM
"""

import os
import sys
from pathlib import Path

# Add src to path
sys.path.append(str(Path(__file__).parent / "src"))

from main import main

if __name__ == "__main__":
    # Set environment variables for better CUDA performance with Qwen3
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'
    os.environ['TOKENIZERS_PARALLELISM'] = 'false'

    # Use Qwen3 configuration by default
    if '--config' not in sys.argv:
        sys.argv.extend(['--config', 'configs/training_config_qwen3.yaml'])

    print("🚀 Starting training with unsloth/Qwen3-8B-bnb-4bit model")
    print("📊 Configuration: configs/training_config_qwen3.yaml")
    print("🧠 Memory optimization: RTX3070 8GB mode")

    # Run the main training application
    main()

src/__init__.py (Normal file)
@@ -0,0 +1,6 @@
|
||||
"""
|
||||
AI Trainer - Training framework for unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit model
|
||||
"""
|
||||
|
||||
__version__ = "1.0.0"
|
||||
__author__ = "AI Trainer"
|
||||

src/config.py (Normal file)
@@ -0,0 +1,227 @@
"""
Configuration management for AI Trainer
Handles training parameters and model settings
"""

from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Union

import yaml


@dataclass
class ModelConfig:
    """Model-specific configuration"""
    name: str = "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"
    max_seq_length: int = 2048
    trust_remote_code: bool = True
    use_fast_tokenizer: bool = True
    padding_side: str = "left"
    truncation_side: str = "left"


@dataclass
class TrainingConfig:
    """Training configuration"""
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    num_train_epochs: int = 3
    learning_rate: float = 2e-4
    warmup_steps: int = 10
    logging_steps: int = 1
    save_steps: int = 100
    save_total_limit: int = 3
    evaluation_strategy: str = "steps"
    eval_steps: int = 100
    load_best_model_at_end: bool = True
    metric_for_best_model: str = "loss"
    greater_is_better: bool = False
    dataloader_num_workers: int = 2
    dataloader_pin_memory: bool = True
    remove_unused_columns: bool = False
    label_names: List[str] = None

    # Memory optimization for RTX3070 8GB
    use_gradient_checkpointing: bool = True
    offload_to_cpu: bool = False  # Explicitly no CPU offloading
    use_reentrant: bool = True
    gradient_checkpointing_kwargs: Dict = None

    # Optimizer settings
    optim: str = "adamw_torch"
    weight_decay: float = 0.01
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-8
    max_grad_norm: float = 1.0

    # Learning rate scheduler
    lr_scheduler_type: str = "cosine"
    warmup_ratio: float = 0.1

    # Precision settings
    bf16: bool = True
    fp16: bool = False
    tf32: bool = True

    # Dataset processing
    dataset_shuffle: bool = True
    dataset_seed: int = 42

    # Output settings
    output_dir: str = "./models"
    logging_dir: str = "./logs"
    report_to: List[str] = None

    def __post_init__(self):
        if self.label_names is None:
            self.label_names = ["labels"]

        if self.gradient_checkpointing_kwargs is None:
            self.gradient_checkpointing_kwargs = {"use_reentrant": self.use_reentrant}

        if self.report_to is None:
            self.report_to = ["tensorboard"]


@dataclass
class DatasetConfig:
    """Dataset processing configuration"""
    min_file_size: int = 10
    max_file_size: int = 10000  # Characters
    supported_languages: List[str] = None
    exclude_patterns: List[str] = None

    def __post_init__(self):
        if self.supported_languages is None:
            self.supported_languages = [
                'python', 'javascript', 'typescript', 'java', 'cpp', 'c',
                'csharp', 'php', 'ruby', 'go', 'rust', 'swift', 'kotlin',
                'scala', 'sql', 'bash', 'yaml', 'json', 'xml', 'html', 'css'
            ]

        if self.exclude_patterns is None:
            self.exclude_patterns = [
                r'\.git/',
                r'__pycache__/',
                r'node_modules/',
                r'\.venv/',
                r'package-lock\.json$',
                r'\.log$'
            ]


@dataclass
class MemoryConfig:
    """Memory optimization settings for RTX3070 8GB"""
    max_memory_usage: float = 0.85  # Use up to 85% of GPU memory
    enable_memory_tracking: bool = True
    clear_cache_between_epochs: bool = True
    use_memory_efficient_attention: bool = True
    attention_slicing: bool = True
    slice_size: int = 1


@dataclass
class AppConfig:
    """Main application configuration"""
    model: ModelConfig
    training: TrainingConfig
    dataset: DatasetConfig
    memory: MemoryConfig

    @classmethod
    def from_yaml(cls, config_path: Union[str, Path]) -> "AppConfig":
        """Load configuration from a YAML file"""
        config_path = Path(config_path)

        if not config_path.exists():
            # Create default configuration
            config = cls(
                model=ModelConfig(),
                training=TrainingConfig(),
                dataset=DatasetConfig(),
                memory=MemoryConfig()
            )
            config.save_yaml(config_path)
            return config

        with open(config_path, 'r', encoding='utf-8') as f:
            config_dict = yaml.safe_load(f)

        # Parse nested configurations
        model_config = ModelConfig(**config_dict.get('model', {}))
        training_config = TrainingConfig(**config_dict.get('training', {}))
        dataset_config = DatasetConfig(**config_dict.get('dataset', {}))
        memory_config = MemoryConfig(**config_dict.get('memory', {}))

        return cls(
            model=model_config,
            training=training_config,
            dataset=dataset_config,
            memory=memory_config
        )

    def save_yaml(self, config_path: Union[str, Path]):
        """Save configuration to a YAML file"""
        config_path = Path(config_path)
        config_path.parent.mkdir(parents=True, exist_ok=True)

        config_dict = {
            'model': {
                'name': self.model.name,
                'max_seq_length': self.model.max_seq_length,
                'trust_remote_code': self.model.trust_remote_code,
                'use_fast_tokenizer': self.model.use_fast_tokenizer,
                'padding_side': self.model.padding_side,
                'truncation_side': self.model.truncation_side
            },
            'training': {
                'per_device_train_batch_size': self.training.per_device_train_batch_size,
                'gradient_accumulation_steps': self.training.gradient_accumulation_steps,
                'num_train_epochs': self.training.num_train_epochs,
                'learning_rate': self.training.learning_rate,
                'warmup_steps': self.training.warmup_steps,
                'logging_steps': self.training.logging_steps,
                'save_steps': self.training.save_steps,
                'save_total_limit': self.training.save_total_limit,
                'evaluation_strategy': self.training.evaluation_strategy,
                'eval_steps': self.training.eval_steps,
                'load_best_model_at_end': self.training.load_best_model_at_end,
                'metric_for_best_model': self.training.metric_for_best_model,
                'greater_is_better': self.training.greater_is_better,
                'dataloader_num_workers': self.training.dataloader_num_workers,
                'dataloader_pin_memory': self.training.dataloader_pin_memory,
                'remove_unused_columns': self.training.remove_unused_columns,
                'use_gradient_checkpointing': self.training.use_gradient_checkpointing,
                'offload_to_cpu': self.training.offload_to_cpu,
                'optim': self.training.optim,
                'weight_decay': self.training.weight_decay,
                'lr_scheduler_type': self.training.lr_scheduler_type,
                'warmup_ratio': self.training.warmup_ratio,
                'bf16': self.training.bf16,
                'fp16': self.training.fp16,
                'tf32': self.training.tf32,
                'dataset_shuffle': self.training.dataset_shuffle,
                'dataset_seed': self.training.dataset_seed
            },
            'dataset': {
                'min_file_size': self.dataset.min_file_size,
                'max_file_size': self.dataset.max_file_size,
                'supported_languages': self.dataset.supported_languages,
                'exclude_patterns': self.dataset.exclude_patterns
            },
            'memory': {
                'max_memory_usage': self.memory.max_memory_usage,
                'enable_memory_tracking': self.memory.enable_memory_tracking,
                'clear_cache_between_epochs': self.memory.clear_cache_between_epochs,
                'use_memory_efficient_attention': self.memory.use_memory_efficient_attention,
                'attention_slicing': self.memory.attention_slicing,
                'slice_size': self.memory.slice_size
            }
        }

        with open(config_path, 'w', encoding='utf-8') as f:
            yaml.dump(config_dict, f, default_flow_style=False, indent=2)

src/dataset_processor.py (Normal file)
@@ -0,0 +1,250 @@
"""
Dataset processor for GitHub repositories
Processes code from GitHub repositories into training datasets
"""

import logging
import os
import re
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

import git
from datasets import Dataset
from tqdm import tqdm

from config import AppConfig


class DatasetProcessor:
    """Processes GitHub repositories into training datasets"""

    # Supported file extensions for code training
    CODE_EXTENSIONS = {
        '.py': 'python',
        '.js': 'javascript',
        '.ts': 'typescript',
        '.java': 'java',
        '.cpp': 'cpp',
        '.c': 'c',
        '.h': 'c',
        '.hpp': 'cpp',
        '.cs': 'csharp',
        '.php': 'php',
        '.rb': 'ruby',
        '.go': 'go',
        '.rs': 'rust',
        '.swift': 'swift',
        '.kt': 'kotlin',
        '.scala': 'scala',
        '.sql': 'sql',
        '.sh': 'bash',
        '.yaml': 'yaml',
        '.yml': 'yaml',
        '.json': 'json',
        '.xml': 'xml',
        '.html': 'html',
        '.css': 'css',
        '.md': 'markdown'
    }

    # Files and directories to exclude
    EXCLUDE_PATTERNS = [
        r'\.git/',
        r'__pycache__/',
        r'\.pytest_cache/',
        r'node_modules/',
        r'\.venv/',
        r'venv/',
        r'\.DS_Store',
        r'\.pyc$',
        r'\.pyo$',
        r'\.pyd$',
        r'\.so$',
        r'\.dll$',
        r'\.exe$',
        r'\.bin$',
        r'package-lock\.json$',
        r'yarn\.lock$',
        r'\.log$',
        r'\.tmp$',
        r'\.bak$',
        r'~\$.*',
        r'\.swp$',
        r'\.swo$'
    ]

    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.temp_dirs = []

    def process_github_repos(self, repo_urls: List[str], config: AppConfig) -> Dataset:
        """
        Process multiple GitHub repositories into a training dataset

        Args:
            repo_urls: List of GitHub repository URLs
            config: Application configuration

        Returns:
            Dataset ready for training
        """
        all_code_samples = []

        for repo_url in repo_urls:
            try:
                self.logger.info(f"Processing repository: {repo_url}")
                repo_samples = self._process_single_repo(repo_url, config)
                all_code_samples.extend(repo_samples)
                self.logger.info(f"Extracted {len(repo_samples)} samples from {repo_url}")
            except Exception as e:
                self.logger.error(f"Failed to process repository {repo_url}: {str(e)}")
                continue

        if not all_code_samples:
            raise ValueError("No code samples extracted from any repository")

        self.logger.info(f"Total samples collected: {len(all_code_samples)}")

        # Create HuggingFace dataset
        dataset = Dataset.from_list(all_code_samples)

        # Filter by sequence length (whitespace-split word count as a cheap proxy)
        dataset = dataset.filter(
            lambda x: len(x['text'].split()) <= config.model.max_seq_length
        )

        self.logger.info(f"Dataset size after filtering: {len(dataset)}")
        return dataset

    def _process_single_repo(self, repo_url: str, config: AppConfig) -> List[Dict]:
        """
        Process a single GitHub repository

        Args:
            repo_url: GitHub repository URL
            config: Application configuration

        Returns:
            List of code samples with metadata
        """
        temp_dir = tempfile.mkdtemp()
        self.temp_dirs.append(temp_dir)

        try:
            # Clone repository
            repo_name = repo_url.split('/')[-1].replace('.git', '')
            repo_path = os.path.join(temp_dir, repo_name)

            self.logger.info(f"Cloning {repo_url} to {repo_path}")
            git.Repo.clone_from(repo_url, repo_path)

            # Extract code samples
            code_samples = self._extract_code_samples(repo_path, config)

            return code_samples

        finally:
            # Cleanup
            shutil.rmtree(temp_dir, ignore_errors=True)

    def _extract_code_samples(self, repo_path: str, config: AppConfig) -> List[Dict]:
        """
        Extract code samples from a repository

        Args:
            repo_path: Path to cloned repository
            config: Application configuration

        Returns:
            List of code samples
        """
        code_samples = []
        repo_path_obj = Path(repo_path)

        # Find all code files
        code_files = []
        for ext in self.CODE_EXTENSIONS:
            code_files.extend(repo_path_obj.rglob(f'*{ext}'))

        self.logger.info(f"Found {len(code_files)} code files")

        for code_file in tqdm(code_files, desc="Processing code files"):
            try:
                if self._should_exclude_file(str(code_file.relative_to(repo_path))):
                    continue

                sample = self._process_code_file(code_file, repo_path_obj, config)
                if sample:
                    code_samples.append(sample)

            except Exception as e:
                self.logger.warning(f"Failed to process {code_file}: {str(e)}")
                continue

        return code_samples

    def _should_exclude_file(self, relative_path: str) -> bool:
        """Check if a file should be excluded based on patterns"""
        for pattern in self.EXCLUDE_PATTERNS:
            if re.search(pattern, relative_path):
                return True
        return False

    def _process_code_file(self, file_path: Path, repo_path: Path, config: AppConfig) -> Optional[Dict]:
        """
        Process a single code file into a training sample

        Args:
            file_path: Path to the code file
            repo_path: Path to the repository root
            config: Application configuration

        Returns:
            Dictionary containing the processed sample or None if invalid
        """
        try:
            # Read file content
            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()

            # Skip if file is too small or too large
            if len(content.strip()) < 10:
                return None
            if len(content) > config.model.max_seq_length * 4:  # Rough character limit
                return None

            # Get relative path for context
            relative_path = file_path.relative_to(repo_path)

            # Determine language
            extension = file_path.suffix.lower()
            language = self.CODE_EXTENSIONS.get(extension, 'unknown')

            # Create training sample
            sample = {
                'text': content,
                'language': language,
                'file_path': str(relative_path),
                'repo_name': repo_path.name,
                'file_size': len(content),
                'line_count': len(content.splitlines())
            }

            return sample

        except Exception as e:
            self.logger.warning(f"Error processing {file_path}: {str(e)}")
            return None

    def cleanup(self):
        """Clean up temporary directories"""
        for temp_dir in self.temp_dirs:
            try:
                shutil.rmtree(temp_dir, ignore_errors=True)
            except Exception as e:
                self.logger.warning(f"Failed to cleanup {temp_dir}: {str(e)}")
        self.temp_dirs.clear()
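
A minimal usage sketch (assuming an `AppConfig` loaded as shown earlier and the `src` directory on `sys.path`):

```python
from dataset_processor import DatasetProcessor

processor = DatasetProcessor()
try:
    # Clones both repositories, extracts code files, and builds a HF Dataset.
    dataset = processor.process_github_repos(
        repo_urls=["https://github.com/user/repo1", "https://github.com/user/repo2"],
        config=config,  # AppConfig from AppConfig.from_yaml(...)
    )
    print(len(dataset), dataset[0]["language"], dataset[0]["file_path"])
finally:
    processor.cleanup()  # remove any leftover temporary clones
```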

src/main.py (Normal file)
@@ -0,0 +1,110 @@
#!/usr/bin/env python3
"""
Main entry point for the AI Trainer application
Training framework for unsloth Qwen models (Qwen2.5-Coder-7B-Instruct and Qwen3-8B)
"""

import argparse
import logging
import sys
from pathlib import Path

# Add src to path for imports
sys.path.append(str(Path(__file__).parent))

from trainer import ModelTrainer
from dataset_processor import DatasetProcessor
from config import AppConfig
from utils import setup_logging, check_gpu_memory


def parse_arguments():
    """Parse command line arguments"""
    parser = argparse.ArgumentParser(description="AI Trainer for unsloth Qwen models")

    parser.add_argument(
        "--config",
        type=str,
        default="configs/training_config.yaml",
        help="Path to training configuration file"
    )

    parser.add_argument(
        "--repo1",
        type=str,
        required=True,
        help="First GitHub repository URL"
    )

    parser.add_argument(
        "--repo2",
        type=str,
        required=True,
        help="Second GitHub repository URL"
    )

    parser.add_argument(
        "--output_dir",
        type=str,
        default="./models",
        help="Directory to save trained model"
    )

    parser.add_argument(
        "--log_level",
        type=str,
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="Logging level"
    )

    return parser.parse_args()


def main():
    """Main application entry point"""
    args = parse_arguments()

    # Setup logging
    setup_logging(args.log_level)
    logger = logging.getLogger(__name__)

    logger.info("Starting AI Trainer")
    logger.info(f"Repository 1: {args.repo1}")
    logger.info(f"Repository 2: {args.repo2}")

    try:
        # Check GPU memory
        gpu_info = check_gpu_memory()
        logger.info(f"GPU Memory Info: {gpu_info}")

        # Load configuration (AppConfig bundles the model, training, dataset, and memory sections)
        config = AppConfig.from_yaml(args.config)
        logger.info(f"Configuration loaded successfully (model: {config.model.name})")

        # Process datasets from GitHub repositories
        dataset_processor = DatasetProcessor()
        logger.info("Processing datasets from GitHub repositories...")

        train_dataset = dataset_processor.process_github_repos(
            repo_urls=[args.repo1, args.repo2],
            config=config
        )

        logger.info(f"Dataset processed successfully. Size: {len(train_dataset)}")

        # Initialize and run trainer
        trainer = ModelTrainer(config=config, output_dir=args.output_dir)
        logger.info("Starting model training...")

        trained_model_path = trainer.train(train_dataset)
        logger.info(f"Training completed! Model saved to: {trained_model_path}")

    except Exception as e:
        logger.error(f"Training failed with error: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()

src/trainer.py (Normal file)
@@ -0,0 +1,284 @@
"""
Model trainer for unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
Optimized for RTX3070 8GB VRAM with no CPU offloading
"""

import gc
import logging
from pathlib import Path

import torch
from transformers import (
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling
)
from datasets import Dataset
from unsloth import FastLanguageModel, is_bfloat16_supported

from config import AppConfig
from utils import check_gpu_memory, clear_gpu_cache, get_memory_usage


class ModelTrainer:
    """Trainer class for fine-tuning the Qwen2.5-Coder model"""

    def __init__(self, config: AppConfig, output_dir: str = "./models"):
        """
        Initialize the model trainer

        Args:
            config: Application configuration
            output_dir: Directory to save the trained model
        """
        self.config = config
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

        self.logger = logging.getLogger(__name__)

        # Model and tokenizer
        self.model = None
        self.tokenizer = None

        # Training components
        self.trainer = None

        # Memory tracking
        self.initial_memory = None

    def train(self, train_dataset: Dataset) -> str:
        """
        Train the model on the provided dataset

        Args:
            train_dataset: Dataset for training

        Returns:
            Path to the saved model
        """
        try:
            self.logger.info("Starting model training...")

            # Check initial GPU memory
            self._check_initial_setup()

            # Load model and tokenizer
            self._load_model_and_tokenizer()

            # Prepare dataset
            tokenized_dataset = self._prepare_dataset(train_dataset)

            # Setup trainer
            self._setup_trainer(tokenized_dataset)

            # Start training
            self.logger.info("Beginning training loop...")
            self.trainer.train()

            # Save final model
            final_model_path = self._save_model()

            self.logger.info(f"Training completed successfully! Model saved to: {final_model_path}")
            return str(final_model_path)

        except Exception as e:
            self.logger.error(f"Training failed: {str(e)}")
            raise
        finally:
            self._cleanup()

    def _check_initial_setup(self):
        """Check initial GPU memory and setup"""
        gpu_info = check_gpu_memory()
        self.logger.info(f"GPU Memory Info: {gpu_info}")

        # Store initial memory usage
        self.initial_memory = get_memory_usage()
        self.logger.info(
            f"Initial memory: GPU {self.initial_memory.get('gpu_allocated_gb', 0):.2f} GB allocated, "
            f"system {self.initial_memory.get('system_used_gb', 0):.2f} GB used"
        )

        # Verify CUDA availability
        if not torch.cuda.is_available():
            raise RuntimeError("CUDA is not available. This trainer requires a CUDA-compatible GPU.")

        self.logger.info(f"CUDA device: {torch.cuda.get_device_name()}")
        self.logger.info(f"CUDA version: {torch.version.cuda}")

    def _load_model_and_tokenizer(self):
        """Load the model and tokenizer with memory optimization"""
        self.logger.info(f"Loading model: {self.config.model.name}")

        # Clear cache before loading
        clear_gpu_cache()

        try:
            # Load model with unsloth for memory efficiency
            self.model, self.tokenizer = FastLanguageModel.from_pretrained(
                model_name=self.config.model.name,
                max_seq_length=self.config.model.max_seq_length,
                dtype=None,  # Auto-detect
                load_in_4bit=True,  # Use 4-bit quantization
                token=None,  # Use default token
            )

            # Configure model for training
            self.model = FastLanguageModel.get_peft_model(
                self.model,
                r=16,  # LoRA rank
                target_modules=[
                    "q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"
                ],
                lora_alpha=16,
                lora_dropout=0,  # Supports any, but = 0 is optimized
                bias="none",  # Supports any, but = "none" is optimized
                use_gradient_checkpointing=self.config.training.use_gradient_checkpointing,
                random_state=3407,
                use_rslora=False,  # Rank-stabilized LoRA is also supported
                loftq_config=None,  # And LoftQ
            )

            self.logger.info("Model and tokenizer loaded successfully")

        except Exception as e:
            self.logger.error(f"Failed to load model: {str(e)}")
            raise

    def _prepare_dataset(self, train_dataset: Dataset) -> Dataset:
        """Prepare and tokenize the dataset"""
        self.logger.info("Preparing dataset...")

        def tokenize_function(examples):
            return self.tokenizer(
                examples["text"],
                padding="max_length",
                truncation=True,
                max_length=self.config.model.max_seq_length,
                return_tensors="pt"
            )

        # Tokenize dataset
        tokenized_dataset = train_dataset.map(
            tokenize_function,
            batched=True,
            remove_columns=["text", "language", "file_path", "repo_name", "file_size", "line_count"],
            desc="Tokenizing dataset"
        )

        self.logger.info(f"Dataset tokenized. Size: {len(tokenized_dataset)}")
        return tokenized_dataset

    def _setup_trainer(self, tokenized_dataset: Dataset):
        """Setup the HuggingFace trainer with memory optimizations"""
        self.logger.info("Setting up trainer...")

        # Training arguments optimized for RTX3070 8GB
        training_args = TrainingArguments(
            output_dir=str(self.output_dir / "checkpoints"),
            num_train_epochs=self.config.training.num_train_epochs,
            per_device_train_batch_size=self.config.training.per_device_train_batch_size,
            gradient_accumulation_steps=self.config.training.gradient_accumulation_steps,
            learning_rate=self.config.training.learning_rate,
            warmup_steps=self.config.training.warmup_steps,
            warmup_ratio=self.config.training.warmup_ratio,
            logging_steps=self.config.training.logging_steps,
            save_steps=self.config.training.save_steps,
            save_total_limit=self.config.training.save_total_limit,
            evaluation_strategy=self.config.training.evaluation_strategy,
            eval_steps=self.config.training.eval_steps,
            load_best_model_at_end=self.config.training.load_best_model_at_end,
            metric_for_best_model=self.config.training.metric_for_best_model,
            greater_is_better=self.config.training.greater_is_better,
            optim=self.config.training.optim,
            weight_decay=self.config.training.weight_decay,
            lr_scheduler_type=self.config.training.lr_scheduler_type,
            adam_beta1=self.config.training.adam_beta1,
            adam_beta2=self.config.training.adam_beta2,
            adam_epsilon=self.config.training.adam_epsilon,
            max_grad_norm=self.config.training.max_grad_norm,
            dataloader_num_workers=self.config.training.dataloader_num_workers,
            dataloader_pin_memory=self.config.training.dataloader_pin_memory,
            remove_unused_columns=self.config.training.remove_unused_columns,
            bf16=self.config.training.bf16 if is_bfloat16_supported() else False,
            fp16=self.config.training.fp16,
            tf32=self.config.training.tf32,
            report_to=self.config.training.report_to,
            logging_dir=self.config.training.logging_dir,
            seed=self.config.training.dataset_seed,
            data_seed=self.config.training.dataset_seed,
            dataloader_drop_last=True,  # Better memory management
            gradient_checkpointing=self.config.training.use_gradient_checkpointing,
            # Memory optimization settings
            ddp_find_unused_parameters=False,
            per_device_eval_batch_size=self.config.training.per_device_train_batch_size,
        )

        # Data collator
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False  # Causal language modeling
        )

        # Initialize trainer
        self.trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=tokenized_dataset,
            eval_dataset=tokenized_dataset,  # Using the same dataset for eval (for demo)
            data_collator=data_collator,
            tokenizer=self.tokenizer,
        )

        self.logger.info("Trainer setup completed")

    def _save_model(self) -> Path:
        """Save the trained model"""
        self.logger.info("Saving model...")

        # Create final model directory
        final_model_dir = self.output_dir / "final_model"
        final_model_dir.mkdir(parents=True, exist_ok=True)

        try:
            # Save the model
            self.model.save_pretrained(str(final_model_dir))
            self.tokenizer.save_pretrained(str(final_model_dir))

            # Save configuration
            self.config.save_yaml(final_model_dir / "training_config.yaml")

            self.logger.info(f"Model saved to: {final_model_dir}")
            return final_model_dir

        except Exception as e:
            self.logger.error(f"Failed to save model: {str(e)}")
            raise

    def _cleanup(self):
        """Clean up resources"""
        try:
            # Clear GPU cache
            clear_gpu_cache()

            # Force garbage collection
            gc.collect()

            # Delete model and tokenizer to free memory
            if self.model is not None:
                del self.model
            if self.tokenizer is not None:
                del self.tokenizer
            if self.trainer is not None:
                del self.trainer

            # Final memory cleanup
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        except Exception as e:
            self.logger.warning(f"Error during cleanup: {str(e)}")
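
Once training finishes, the adapter written by `_save_model` can be reloaded for inference. A minimal sketch (the path matches the default `./models/final_model`; the prompt is illustrative):

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model saved by ModelTrainer._save_model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./models/final_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to fast inference mode

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```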

src/utils.py (Normal file)
@@ -0,0 +1,319 @@
"""
Utility functions for AI Trainer
Memory management, logging, and helper functions optimized for RTX3070 8GB VRAM
"""

import gc
import logging
import os
import sys
from pathlib import Path
from typing import Any, Dict, Optional

import torch
import psutil
from colorama import init, Fore, Back, Style

# Initialize colorama for cross-platform colored output
init(autoreset=True)


def setup_logging(log_level: str = "INFO", log_file: Optional[str] = None) -> logging.Logger:
    """
    Setup logging configuration with colored console output

    Args:
        log_level: Logging level (DEBUG, INFO, WARNING, ERROR)
        log_file: Optional log file path

    Returns:
        Configured logger
    """
    # Create formatter with colors
    class ColoredFormatter(logging.Formatter):
        COLORS = {
            'DEBUG': Fore.CYAN,
            'INFO': Fore.GREEN,
            'WARNING': Fore.YELLOW,
            'ERROR': Fore.RED,
            'CRITICAL': Fore.RED + Back.WHITE
        }

        def format(self, record):
            # Add color to the level name
            if record.levelname in self.COLORS:
                colored_levelname = f"{self.COLORS[record.levelname]}{record.levelname}{Style.RESET_ALL}"
                record.levelname = colored_levelname
            return super().format(record)

    # Create logger
    logger = logging.getLogger()
    logger.setLevel(getattr(logging, log_level.upper()))

    # Console handler with colors
    console_handler = logging.StreamHandler(sys.stdout)
    console_formatter = ColoredFormatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    console_handler.setFormatter(console_formatter)
    logger.addHandler(console_handler)

    # File handler if specified
    if log_file:
        log_path = Path(log_file)
        log_path.parent.mkdir(parents=True, exist_ok=True)

        file_handler = logging.FileHandler(log_path)
        file_formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )
        file_handler.setFormatter(file_formatter)
        logger.addHandler(file_handler)

    return logger


def check_gpu_memory() -> Dict[str, Any]:
    """
    Check GPU memory status and availability

    Returns:
        Dictionary with GPU memory information
    """
    if not torch.cuda.is_available():
        return {"error": "CUDA not available"}

    try:
        device = torch.cuda.current_device()
        total_memory = torch.cuda.get_device_properties(device).total_memory
        allocated_memory = torch.cuda.memory_allocated(device)
        reserved_memory = torch.cuda.memory_reserved(device)
        free_memory = total_memory - allocated_memory

        return {
            "device": torch.cuda.get_device_name(device),
            "device_id": device,
            "total_memory_gb": round(total_memory / (1024**3), 2),
            "allocated_memory_gb": round(allocated_memory / (1024**3), 2),
            "reserved_memory_gb": round(reserved_memory / (1024**3), 2),
            "free_memory_gb": round(free_memory / (1024**3), 2),
            "memory_utilization": round((allocated_memory / total_memory) * 100, 2),
            "cuda_version": torch.version.cuda,
            "cudnn_version": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else "N/A"
        }
    except Exception as e:
        return {"error": f"Failed to get GPU info: {str(e)}"}


def get_memory_usage() -> Dict[str, float]:
    """
    Get system memory usage

    Returns:
        Dictionary with memory usage information
    """
    try:
        # GPU memory
        gpu_memory = check_gpu_memory()

        # System memory
        system_memory = psutil.virtual_memory()

        return {
            "gpu_total_gb": gpu_memory.get("total_memory_gb", 0),
            "gpu_allocated_gb": gpu_memory.get("allocated_memory_gb", 0),
            "gpu_free_gb": gpu_memory.get("free_memory_gb", 0),
            "system_total_gb": round(system_memory.total / (1024**3), 2),
            "system_available_gb": round(system_memory.available / (1024**3), 2),
            "system_used_gb": round(system_memory.used / (1024**3), 2),
            "system_memory_percent": system_memory.percent
        }
    except Exception as e:
        return {"error": f"Failed to get memory usage: {str(e)}"}


def clear_gpu_cache():
    """Clear GPU cache and perform garbage collection"""
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.synchronize()

        # Force garbage collection
        gc.collect()

    except Exception as e:
        print(f"Warning: Failed to clear GPU cache: {str(e)}")


def optimize_memory_settings():
    """Apply memory optimization settings for RTX3070"""
    try:
        if torch.cuda.is_available():
            # Set memory fraction to prevent out-of-memory
            torch.cuda.set_per_process_memory_fraction(0.85)  # Use 85% of GPU memory

            # Enable TF32 for better performance
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True

            # Optimize CUDA memory allocator
            os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'

    except Exception as e:
        print(f"Warning: Failed to optimize memory settings: {str(e)}")


def format_bytes(bytes_value: int) -> str:
    """
    Format bytes into a human-readable string

    Args:
        bytes_value: Number of bytes

    Returns:
        Formatted string (e.g., "1.5 GB")
    """
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if bytes_value < 1024.0:
            return f"{bytes_value:.1f} {unit}"
        bytes_value /= 1024.0
    return f"{bytes_value:.1f} PB"


def print_system_info():
    """Print comprehensive system information"""
    print(f"\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
    print(f"{Fore.CYAN}SYSTEM INFORMATION{Style.RESET_ALL}")
    print(f"{Fore.CYAN}{'='*60}{Style.RESET_ALL}")

    # GPU Information
    gpu_info = check_gpu_memory()
    if "error" not in gpu_info:
        print(f"\n{Fore.GREEN}GPU Information:{Style.RESET_ALL}")
        print(f"  Device: {gpu_info['device']}")
        print(f"  CUDA Version: {gpu_info['cuda_version']}")
        print(f"  Total Memory: {gpu_info['total_memory_gb']} GB")
        print(f"  Allocated Memory: {gpu_info['allocated_memory_gb']} GB")
        print(f"  Free Memory: {gpu_info['free_memory_gb']} GB")
        print(f"  Memory Utilization: {gpu_info['memory_utilization']}%")
    else:
        print(f"\n{Fore.RED}GPU Information: {gpu_info['error']}{Style.RESET_ALL}")

    # System Memory
    system_memory = psutil.virtual_memory()
    print(f"\n{Fore.GREEN}System Memory:{Style.RESET_ALL}")
    print(f"  Total: {format_bytes(system_memory.total)}")
    print(f"  Available: {format_bytes(system_memory.available)}")
    print(f"  Used: {format_bytes(system_memory.used)}")
    print(f"  Usage: {system_memory.percent}%")

    # CPU Information
    print(f"\n{Fore.GREEN}CPU Information:{Style.RESET_ALL}")
    print(f"  Cores: {psutil.cpu_count(logical=False)} physical, {psutil.cpu_count(logical=True)} logical")
    print(f"  CPU Usage: {psutil.cpu_percent()}%")

    print(f"\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")


def validate_environment():
    """Validate that the environment is suitable for training"""
    issues = []

    # Check CUDA availability
    if not torch.cuda.is_available():
        issues.append("CUDA is not available. A CUDA-compatible GPU is required.")

    # Check GPU memory
    if torch.cuda.is_available():
        gpu_info = check_gpu_memory()
        if "total_memory_gb" in gpu_info:
            total_memory = gpu_info["total_memory_gb"]
            if total_memory < 8:
                issues.append(f"GPU memory ({total_memory} GB) may be insufficient. Recommended: 8GB+")

    # Check required Python modules
    required_modules = ['torch', 'transformers', 'datasets', 'git']
    for module in required_modules:
        try:
            __import__(module)
        except ImportError:
            issues.append(f"Required module '{module}' is not installed.")

    if issues:
        print(f"\n{Fore.YELLOW}Environment Validation Issues:{Style.RESET_ALL}")
        for issue in issues:
            print(f"  - {issue}")
        return False

    print(f"\n{Fore.GREEN}Environment validation passed!{Style.RESET_ALL}")
    return True


def create_training_summary(config, training_time: float, final_model_path: str) -> str:
    """
    Create a summary of the training session

    Args:
        config: Training configuration
        training_time: Training time in seconds
        final_model_path: Path to the saved model

    Returns:
        Formatted summary string
    """
    summary = f"""
{Fore.CYAN}{'='*60}{Style.RESET_ALL}
TRAINING SUMMARY
{Fore.CYAN}{'='*60}{Style.RESET_ALL}

Configuration:
  Model: {config.model.name}
  Epochs: {config.training.num_train_epochs}
  Batch Size: {config.training.per_device_train_batch_size}
  Gradient Accumulation: {config.training.gradient_accumulation_steps}
  Learning Rate: {config.training.learning_rate}
  Max Sequence Length: {config.model.max_seq_length}

Performance:
  Training Time: {training_time:.2f} seconds ({training_time/3600:.2f} hours)
  Effective Batch Size: {config.training.per_device_train_batch_size * config.training.gradient_accumulation_steps}

Output:
  Model Saved To: {final_model_path}

Memory Settings:
  Gradient Checkpointing: {config.training.use_gradient_checkpointing}
  CPU Offloading: {config.training.offload_to_cpu}
  BF16 Enabled: {config.training.bf16}

{Fore.CYAN}{'='*60}{Style.RESET_ALL}
"""

    return summary


def safe_import(module_name: str, fallback: Any = None):
    """
    Safely import a module with fallback

    Args:
        module_name: Name of the module to import
        fallback: Fallback value if import fails

    Returns:
        Imported module or fallback
    """
    try:
        return __import__(module_name)
    except ImportError:
        return fallback


# Initialize memory optimization settings on import
try:
    optimize_memory_settings()
except Exception:
    pass  # Ignore errors during initialization