# Odoo AI Model Trainer

A comprehensive Python project for training AI models on Odoo documentation using Unsloth, optimized for RTX3070 8GB VRAM. The project scrapes both English and Indonesian Odoo documentation and fine-tunes the unsloth/Qwen3-8B-bnb-4bit model.

## Features

- 🌐 **Bilingual Support**: Scrapes both English and Indonesian Odoo documentation
- 🚀 **Optimized Training**: Uses Unsloth for 2x faster training and 70% less memory
- 🎯 **RTX3070 Optimized**: Configured for 8GB VRAM with memory-efficient settings
- 📊 **Data Pipeline**: Complete pipeline from data collection to model training
- 🔧 **Modular Design**: Separate scripts for scraping, preprocessing, and training
- 📈 **Progress Tracking**: Built-in statistics and progress monitoring

## Requirements

### Hardware

- NVIDIA RTX3070 (8GB VRAM) or better
- 16GB+ RAM recommended
- 50GB+ free disk space

### Software

- Python 3.8+
- CUDA 11.8+
- PyTorch with CUDA support

## Installation

1. **Clone or download this project**

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Verify CUDA installation**:
   ```bash
   python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
   ```

## Usage

### Full Pipeline (Recommended)

Run the complete training pipeline:

```bash
python main.py
```

### Step-by-Step Execution

1. **Data Collection Only**:
   ```bash
   python main.py --only-collection
   ```

2. **Data Preprocessing Only**:
   ```bash
   python main.py --only-preprocessing
   ```

3. **Model Training Only**:
   ```bash
   python main.py --only-training
   ```

### Skip Specific Steps

```bash
# Skip data collection if you already have data
python main.py --skip-collection

# Skip preprocessing if you already have training data
python main.py --skip-preprocessing

# Skip training for testing other components
python main.py --skip-training
```

### Individual Scripts

You can also run individual scripts directly:

```bash
# Scrape Odoo documentation
python data_scraper.py

# Preprocess the scraped data
python data_preprocessor.py

# Train the model
python train_model.py
```

## Project Structure

```
.
├── main.py                 # Main orchestrator script
├── data_scraper.py         # Web scraping for Odoo docs
├── data_preprocessor.py    # Data cleaning and formatting
├── train_model.py          # Model training with Unsloth
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── odoo_docs_data.csv      # Scraped raw data (generated)
├── training_data.json      # Processed training data (generated)
└── odoo_model_output/      # Trained model (generated)
```

## Output Files

- **odoo_docs_data.csv**: Raw scraped documentation
- **training_data.json**: Processed training data in instruction format
- **odoo_model_output/**: Directory containing the fine-tuned model
- **odoo_model_output_gguf/**: GGUF quantized model for deployment

## Configuration

### Memory Optimization for RTX3070

The training is configured with:

- Batch size: 1 (per device)
- Gradient accumulation: 4 (effective batch size: 4)
- Max sequence length: 2048 tokens
- 4-bit quantization to save VRAM
- Gradient checkpointing enabled

### Training Parameters

- Learning rate: 2e-4
- Max steps: 100 (increase for production)
- Warmup steps: 5
- LoRA rank: 16
- LoRA alpha: 16

## Troubleshooting

### CUDA Out of Memory

If you encounter CUDA OOM errors:

1. Reduce batch size in `train_model.py`
2. Increase gradient accumulation steps
3. Reduce max sequence length
4. Restart your Python session

### Data Collection Issues

- Check internet connection
- Odoo website may block rapid requests - the script includes delays
- If Indonesian docs fail, they may be at a different URL

### Training Issues

- Ensure CUDA is properly installed
- Check that your GPU drivers are up to date
- Verify PyTorch CUDA compatibility

## Model Usage

After training, you can use the model for Odoo-related questions:

```python
from train_model import OdooModelTrainer

trainer = OdooModelTrainer()
trainer.load_model()

# Load your trained model
# trainer.model = ... (load from odoo_model_output)

response = trainer.generate_response("How do I install Odoo?")
print(response)
```

## Performance Notes

- **Training Time**: ~30-60 minutes for 100 steps on RTX3070
- **Memory Usage**: ~6-7GB VRAM during training
- **Data Size**: ~20-50MB of documentation data
- **Model Size**: ~4-5GB for the fine-tuned model

## Contributing

Feel free to submit issues and enhancement requests!

## License

This project is open source. Please check individual component licenses for details.

## Disclaimer

This project is for educational and research purposes. Ensure compliance with Odoo's terms of service when scraping documentation.
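
One way to keep a scraper polite, in line with the request delays mentioned under Data Collection Issues, is to check `robots.txt` rules and pause between requests. The sketch below is illustrative only and is not the code in `data_scraper.py`: the names `build_robots_parser` and `PoliteScheduler`, the `odoo-doc-trainer` user agent, and the 1-second default delay are all assumptions. It uses only the Python standard library, and a `robots.txt` check does not replace reading the site's terms of service.

```python
import time
from urllib import robotparser


def build_robots_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteScheduler:
    """Hypothetical helper: checks URLs against robots.txt and spaces out requests."""

    def __init__(self, parser, user_agent="odoo-doc-trainer", min_delay=1.0):
        self.parser = parser
        self.user_agent = user_agent  # assumed name, not from the project
        # Use the site's Crawl-delay when it is declared and longer than ours.
        declared = parser.crawl_delay(user_agent)
        self.delay = max(min_delay, float(declared)) if declared else min_delay
        self._last_request = None

    def allowed(self, url: str) -> bool:
        """Return True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait(self) -> None:
        """Sleep until at least `delay` seconds have passed since the last call."""
        if self._last_request is not None:
            remaining = self.delay - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

In a scraping loop, one would call `allowed(url)` before each request and `wait()` immediately before sending it, skipping any URL that is not allowed.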