Odoo AI Model Trainer
A Python project for fine-tuning a language model on Odoo documentation with Unsloth, tuned for an RTX3070 with 8GB of VRAM. The project scrapes the English and Indonesian Odoo documentation and fine-tunes the unsloth/Qwen3-8B-bnb-4bit model on it.
Features
- 🌐 Bilingual Support: Scrapes both English and Indonesian Odoo documentation
- 🚀 Optimized Training: Uses Unsloth, which advertises up to 2x faster training and up to 70% less memory use
- 🎯 RTX3070 Optimized: Configured for 8GB VRAM with memory-efficient settings
- 📊 Data Pipeline: Complete pipeline from data collection to model training
- 🔧 Modular Design: Separate scripts for scraping, preprocessing, and training
- 📈 Progress Tracking: Built-in statistics and progress monitoring
Requirements
Hardware
- NVIDIA RTX3070 (8GB VRAM) or better
- 16GB+ RAM recommended
- 50GB+ free disk space
Software
- Python 3.8+
- CUDA 11.8+
- PyTorch with CUDA support
Installation
- Clone or download this project
- Install dependencies:
  pip install -r requirements.txt
- Verify CUDA installation:
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
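The one-line check only reports whether CUDA is visible. A slightly fuller check, sketched here, also reports the GPU name and total VRAM, which matters on an 8GB card. The function name `describe_cuda` is illustrative and not part of this project; the sketch only assumes that PyTorch may or may not be installed.

```python
def describe_cuda():
    """Return a small report about the local PyTorch/CUDA setup."""
    report = {"torch_installed": False, "cuda_available": False}
    try:
        import torch  # imported lazily so a missing install is reported, not raised
    except ImportError:
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)  # first visible GPU
        report["gpu_name"] = props.name
        report["vram_gib"] = round(props.total_memory / 1024**3, 1)
        report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in describe_cuda().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False even though a GPU is present, the installed PyTorch build is most likely a CPU-only one.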
Usage
Full Pipeline (Recommended)
Run the complete training pipeline:
python main.py
Step-by-Step Execution
- Data Collection Only:
  python main.py --only-collection
- Data Preprocessing Only:
  python main.py --only-preprocessing
- Model Training Only:
  python main.py --only-training
Skip Specific Steps
# Skip data collection if you already have data
python main.py --skip-collection
# Skip preprocessing if you already have training data
python main.py --skip-preprocessing
# Skip training for testing other components
python main.py --skip-training
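To make the flag semantics concrete, here is a minimal sketch of how the `--only-*` and `--skip-*` flags could be mapped to pipeline steps with `argparse`. It is not the actual code in main.py; the helper names `build_parser` and `select_steps`, and the rule that at most one `--only-*` flag is allowed, are assumptions made for illustration.

```python
import argparse

# Pipeline steps in execution order, named after the documented flags.
STEPS = ("collection", "preprocessing", "training")


def build_parser():
    """Create a parser with one --only-* and one --skip-* flag per step."""
    parser = argparse.ArgumentParser(description="Pipeline flag sketch")
    for step in STEPS:
        parser.add_argument(f"--only-{step}", action="store_true",
                            help=f"run only the {step} step")
        parser.add_argument(f"--skip-{step}", action="store_true",
                            help=f"run everything except the {step} step")
    return parser


def select_steps(args):
    """Return the list of steps to run for the parsed arguments."""
    only = [s for s in STEPS if getattr(args, f"only_{s}")]
    if len(only) > 1:
        # Assumption of this sketch: --only-* flags are mutually exclusive.
        raise ValueError("use at most one --only-* flag")
    if only:
        return only
    return [s for s in STEPS if not getattr(args, f"skip_{s}")]


if __name__ == "__main__":
    print(select_steps(build_parser().parse_args()))
```

With no flags, all three steps are selected in order; `--skip-collection` leaves preprocessing and training.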
Individual Scripts
You can also run individual scripts directly:
# Scrape Odoo documentation
python data_scraper.py
# Preprocess the scraped data
python data_preprocessor.py
# Train the model
python train_model.py
Project Structure
.
├── main.py # Main orchestrator script
├── data_scraper.py # Web scraping for Odoo docs
├── data_preprocessor.py # Data cleaning and formatting
├── train_model.py # Model training with Unsloth
├── requirements.txt # Python dependencies
├── README.md # This file
├── odoo_docs_data.csv # Scraped raw data (generated)
├── training_data.json # Processed training data (generated)
└── odoo_model_output/ # Trained model (generated)
Output Files
- odoo_docs_data.csv: Raw scraped documentation
- training_data.json: Processed training data in instruction format
- odoo_model_output/: Directory containing the fine-tuned model
- odoo_model_output_gguf/: GGUF quantized model for deployment
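As an illustration of what "instruction format" can look like, the sketch below builds one record and checks a list of records for missing or empty fields. The Alpaca-style keys (`instruction`, `input`, `output`) are an assumption; this README does not specify the schema that data_preprocessor.py actually writes, and `validate_records` is a hypothetical helper.

```python
import json

# Hypothetical record layout: the keys below are an assumed Alpaca-style
# schema, not necessarily what training_data.json contains.
example_record = {
    "instruction": "How do I install Odoo?",
    "input": "",
    "output": "<answer text drawn from the scraped documentation>",
}

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_records(records):
    """Return indices of records with a missing key or an empty instruction/output."""
    bad = []
    for i, rec in enumerate(records):
        if not REQUIRED_KEYS.issubset(rec):
            bad.append(i)
        elif not rec["instruction"].strip() or not rec["output"].strip():
            bad.append(i)
    return bad


# Round-trip through JSON the way a file on disk would be read back.
serialized = json.dumps([example_record], ensure_ascii=False, indent=2)
print(validate_records(json.loads(serialized)))  # prints []
```

Running a check like this before training catches empty scraped pages early, when they are cheap to fix.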
Configuration
Memory Optimization for RTX3070
The training is configured with:
- Batch size: 1 (per device)
- Gradient accumulation: 4 (effective batch size: 4)
- Max sequence length: 2048 tokens
- 4-bit quantization to save VRAM
- Gradient checkpointing enabled
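The relationship between these settings can be written out as a small calculation. The `MemorySettings` class below is a hypothetical container, not something defined in train_model.py, and it assumes a single GPU. It shows why gradient accumulation raises the effective batch size without raising the number of tokens held in memory per forward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySettings:
    """Illustrative container for the values listed in this section."""
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    max_seq_length: int = 2048

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to one optimizer update (single GPU assumed).
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def max_tokens_per_forward_pass(self) -> int:
        # Upper bound on tokens processed at once; this drives activation memory.
        return self.per_device_batch_size * self.max_seq_length

    @property
    def max_tokens_per_optimizer_step(self) -> int:
        # Upper bound on tokens seen before each weight update.
        return self.effective_batch_size * self.max_seq_length


settings = MemorySettings()
print(settings.effective_batch_size)           # 4
print(settings.max_tokens_per_forward_pass)    # 2048
print(settings.max_tokens_per_optimizer_step)  # 8192
```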
Training Parameters
- Learning rate: 2e-4
- Max steps: 100 (increase for production)
- Warmup steps: 5
- LoRA rank: 16
- LoRA alpha: 16
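A short sketch of what these numbers imply. In standard LoRA the low-rank update is scaled by alpha / rank, so rank 16 with alpha 16 gives a scale of 1.0. The warmup function assumes a linear ramp over the warmup steps; what the learning rate does afterwards depends on the scheduler configured in train_model.py, which this README does not state, so the sketch simply holds it constant. The effective batch size of 4 is taken from the memory settings listed earlier.

```python
LEARNING_RATE = 2e-4
MAX_STEPS = 100
WARMUP_STEPS = 5
LORA_RANK = 16
LORA_ALPHA = 16
EFFECTIVE_BATCH_SIZE = 4  # batch size 1 x gradient accumulation 4


def lora_scaling(alpha, rank):
    """Scale applied to the low-rank update in standard LoRA."""
    return alpha / rank


def warmup_lr(step, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to base_lr.

    Assumption: the rate is held at base_lr after warmup; a real
    scheduler would usually decay it instead.
    """
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


print(lora_scaling(LORA_ALPHA, LORA_RANK))  # 1.0
print(MAX_STEPS * EFFECTIVE_BATCH_SIZE)     # 400 training sequences seen in total
```

At 100 steps the model sees only 400 sequences, which is why the step count is flagged for increase in production.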
Troubleshooting
CUDA Out of Memory
If you encounter CUDA OOM errors:
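- Reduce the batch size in train_model.py (the default of 1 is already the minimum)
- Increase gradient accumulation steps to keep the effective batch size when you lower the batch size
- Reduce max sequence length
- Restart your Python session to release GPU memory held by earlier runs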
Data Collection Issues
- Check your internet connection
- The Odoo website may block rapid requests; the script includes delays between requests
- If the Indonesian docs fail to download, they may have moved to a different URL
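For reference, a rate-limited fetch loop with retries might look like the sketch below. It is not the code in data_scraper.py; `fetch_url`, `polite_fetch`, the User-Agent string, and the default delay and retry counts are all assumptions chosen for illustration. It uses only the standard library.

```python
import time
import urllib.request


def fetch_url(url, timeout=30):
    """Download one page and return its text, decoded as UTF-8."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "odoo-docs-trainer-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def polite_fetch(urls, fetch=fetch_url, delay=1.0, retries=3, sleep=time.sleep):
    """Fetch URLs one at a time with a pause between pages and backoff on errors.

    Returns (pages, failed): a dict of url -> text, and a list of URLs
    that still failed after all retries.
    """
    pages, failed = {}, []
    for url in urls:
        for attempt in range(retries):
            try:
                pages[url] = fetch(url)
                break
            except OSError:  # covers URLError, HTTPError, and socket timeouts
                sleep(delay * 2 ** attempt)  # backoff: delay, 2*delay, 4*delay, ...
        else:
            failed.append(url)  # every attempt raised
        sleep(delay)  # fixed pause before the next page
    return pages, failed
```

The returned `failed` list makes it easy to see whether only one language's URLs are failing, which would point at a changed URL rather than a network problem.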
Training Issues
- Ensure CUDA is properly installed
- Check that your GPU drivers are up to date
- Verify PyTorch CUDA compatibility
Model Usage
After training, you can use the model for Odoo-related questions:
from train_model import OdooModelTrainer
trainer = OdooModelTrainer()
trainer.load_model()
# Load your trained model
# trainer.model = ... (load from odoo_model_output)
response = trainer.generate_response("How do I install Odoo?")
print(response)
Performance Notes
- Training Time: ~30-60 minutes for 100 steps on RTX3070
- Memory Usage: ~6-7GB VRAM during training
- Data Size: ~20-50MB of documentation data
- Model Size: ~4-5GB for the fine-tuned model
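The training-time figure can be turned into a rough per-step cost for planning longer runs. The sketch assumes time scales linearly with the number of steps, which ignores model loading, checkpoint saving, and variation in sequence length; both function names are illustrative.

```python
def seconds_per_step(total_minutes, steps):
    """Average wall-clock seconds per training step."""
    return total_minutes * 60 / steps


def estimated_minutes(steps, sec_per_step):
    """Projected run time in minutes, assuming linear scaling with steps."""
    return steps * sec_per_step / 60


low = seconds_per_step(30, 100)
high = seconds_per_step(60, 100)
print(low, high)  # 18.0 36.0

# Projection for a hypothetical 1000-step run.
print(estimated_minutes(1000, low), estimated_minutes(1000, high))  # 300.0 600.0
```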
Contributing
Feel free to submit issues and enhancement requests!
License
This project is open source. Please check individual component licenses for details.
Disclaimer
This project is for educational and research purposes. Ensure compliance with Odoo's terms of service when scraping documentation.