Odoo AI Model Trainer

A Python project for fine-tuning the unsloth/Qwen3-8B-bnb-4bit model on Odoo documentation using Unsloth, configured for an RTX3070 with 8GB of VRAM. The project scrapes both the English and Indonesian Odoo documentation, preprocesses it into training data, and runs the fine-tuning.

Features

  • 🌐 Bilingual Support: Scrapes both English and Indonesian Odoo documentation
  • 🚀 Optimized Training: Uses Unsloth, which advertises roughly 2x faster training and 70% lower memory use
  • 🎯 RTX3070 Optimized: Configured for 8GB VRAM with memory-efficient settings
  • 📊 Data Pipeline: Complete pipeline from data collection to model training
  • 🔧 Modular Design: Separate scripts for scraping, preprocessing, and training
  • 📈 Progress Tracking: Built-in statistics and progress monitoring

Requirements

Hardware

  • NVIDIA RTX3070 (8GB VRAM) or better
  • 16GB+ RAM recommended
  • 50GB+ free disk space

Software

  • Python 3.8+
  • CUDA 11.8+
  • PyTorch with CUDA support

Installation

  1. Clone or download this project

  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Verify CUDA installation:

    python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
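The one-liner above only reports whether CUDA is visible. A slightly fuller check, sketched below, also reports the GPU name and total VRAM so it can be compared against the 8GB target. The function name `cuda_report` is hypothetical and not part of this project; the sketch returns a reduced report when PyTorch is not installed.

```python
def cuda_report():
    """Return a small dict describing the local PyTorch/CUDA setup."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Device 0 is assumed to be the training GPU.
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```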
    

Usage

Run the complete training pipeline:

python main.py

Step-by-Step Execution

  1. Data Collection Only:

    python main.py --only-collection
    
  2. Data Preprocessing Only:

    python main.py --only-preprocessing
    
  3. Model Training Only:

    python main.py --only-training
    

Skip Specific Steps

# Skip data collection if you already have data
python main.py --skip-collection

# Skip preprocessing if you already have training data
python main.py --skip-preprocessing

# Skip training for testing other components
python main.py --skip-training
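How main.py resolves these flags internally is not shown here. The sketch below is one plausible way to combine the --only-* and --skip-* flags with argparse; the function name `select_steps` and the precedence rule (an --only-* flag narrows the run first, then --skip-* flags remove steps) are assumptions for illustration.

```python
import argparse

STEPS = ["collection", "preprocessing", "training"]


def select_steps(argv=None):
    """Return the pipeline steps to run for the given command-line arguments."""
    parser = argparse.ArgumentParser(description="Odoo AI Model Trainer pipeline")
    for step in STEPS:
        parser.add_argument(f"--only-{step}", action="store_true")
        parser.add_argument(f"--skip-{step}", action="store_true")
    args = parser.parse_args(argv)

    # An --only-* flag narrows the run to the named step(s).
    only = [s for s in STEPS if getattr(args, f"only_{s}")]
    selected = only if only else list(STEPS)

    # --skip-* flags then remove steps from whatever is left.
    return [s for s in selected if not getattr(args, f"skip_{s}")]


print(select_steps(["--skip-collection"]))  # ['preprocessing', 'training']
```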

Individual Scripts

You can also run individual scripts directly:

# Scrape Odoo documentation
python data_scraper.py

# Preprocess the scraped data
python data_preprocessor.py

# Train the model
python train_model.py

Project Structure

.
├── main.py                    # Main orchestrator script
├── data_scraper.py            # Web scraping for Odoo docs
├── data_preprocessor.py       # Data cleaning and formatting
├── train_model.py             # Model training with Unsloth
├── test_setup.py
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── odoo_docs_data.csv         # Scraped raw data (generated)
├── training_data.json         # Processed training data (generated)
├── odoo_model_output/         # Trained model (generated)
└── odoo_model_output_gguf/    # GGUF quantized model (generated)

Output Files

  • odoo_docs_data.csv: Raw scraped documentation
  • training_data.json: Processed training data in instruction format
  • odoo_model_output/: Directory containing the fine-tuned model
  • odoo_model_output_gguf/: GGUF quantized model for deployment
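The exact schema of training_data.json is not documented here. As a rough illustration of what "instruction format" commonly means, the sketch below builds an Alpaca-style record from one scraped page. The field names (`instruction`, `input`, `output`, `language`), the prompt wording, and the helper `to_instruction_record` are all assumptions, and the page text is a placeholder.

```python
import json


def to_instruction_record(title, content, language="en"):
    """Build one instruction-style training example from a scraped page."""
    return {
        "instruction": f"Explain the Odoo documentation topic: {title}",
        "input": "",
        "output": content.strip(),
        "language": language,  # hypothetical extra field: "en" or "id"
    }


# Placeholder values standing in for one row of the scraped data.
records = [to_instruction_record("Example topic", "  Placeholder page text.  ")]
print(json.dumps(records, ensure_ascii=False, indent=2))
```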

Configuration

Memory Optimization for RTX3070

The training is configured with:

  • Batch size: 1 (per device)
  • Gradient accumulation: 4 (effective batch size: 4)
  • Max sequence length: 2048 tokens
  • 4-bit quantization to save VRAM
  • Gradient checkpointing enabled
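The relationship between these settings can be checked with a little arithmetic. The sketch below is not taken from train_model.py; the class and attribute names are hypothetical, and it assumes a single GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    max_seq_length: int = 2048

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer update (single GPU).
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def max_tokens_per_update(self) -> int:
        # Upper bound: every sequence filled to the full length.
        return self.effective_batch_size * self.max_seq_length


cfg = MemoryConfig()
print(cfg.effective_batch_size)   # 4
print(cfg.max_tokens_per_update)  # 8192
```

Only one sequence is held in memory per forward pass, which is what keeps peak VRAM low; gradient accumulation recovers the larger effective batch at the cost of more passes per update.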

Training Parameters

  • Learning rate: 2e-4
  • Max steps: 100 (increase for production)
  • Warmup steps: 5
  • LoRA rank: 16
  • LoRA alpha: 16
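To make the consequences of these values concrete, the sketch below derives two quantities from them. The keyword names follow Hugging Face `TrainingArguments` and PEFT `LoraConfig`; whether train_model.py uses these exact names is an assumption. The batch size and accumulation values come from the memory settings listed earlier.

```python
# Values from this README; key names are assumed, not read from train_model.py.
training_kwargs = {
    "learning_rate": 2e-4,
    "max_steps": 100,
    "warmup_steps": 5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
}
lora_kwargs = {"r": 16, "lora_alpha": 16}

# Standard LoRA scales the adapter update by lora_alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Each optimizer step consumes batch_size * accumulation examples.
examples_seen = (
    training_kwargs["max_steps"]
    * training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

print(lora_scale)     # 1.0
print(examples_seen)  # 400
```

With 100 steps the model sees at most 400 training examples, which is why the step count should be raised for production runs.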

Troubleshooting

CUDA Out of Memory

If you encounter CUDA OOM errors:

  1. Reduce batch size in train_model.py
  2. Increase gradient accumulation steps
  3. Reduce max sequence length
  4. Restart your Python session to release GPU memory held by earlier runs
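The first three steps can be applied in a fixed order. The sketch below is a hypothetical helper, not part of train_model.py: it first trades batch size for gradient accumulation, then halves the sequence length. The 512-token floor is an arbitrary assumption.

```python
def oom_fallbacks(batch_size, grad_accum, max_seq_length, min_seq_length=512):
    """Yield progressively lighter (batch_size, grad_accum, max_seq_length) settings."""
    # Halve the batch size while doubling accumulation. For power-of-two
    # batch sizes this keeps the effective batch size unchanged.
    while batch_size > 1:
        batch_size //= 2
        grad_accum *= 2
        yield batch_size, grad_accum, max_seq_length

    # Once the batch size is 1, shorten sequences instead.
    while max_seq_length // 2 >= min_seq_length:
        max_seq_length //= 2
        yield batch_size, grad_accum, max_seq_length


for setting in oom_fallbacks(batch_size=1, grad_accum=4, max_seq_length=2048):
    print(setting)
# (1, 4, 1024)
# (1, 4, 512)
```

Starting from the default settings, the batch size is already 1, so the only remaining lever in this sketch is the sequence length.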

Data Collection Issues

  • Check internet connection
  • The Odoo website may block rapid requests; the script includes delays between requests to reduce this risk
  • If scraping the Indonesian docs fails, they may have moved to a different URL
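How data_scraper.py spaces out its requests is not described in detail. As a general illustration of polite fetching, the sketch below retries a failed request with an increasing delay, using only the standard library. The function `fetch_with_retry`, its parameters, and the default values are hypothetical.

```python
import time
import urllib.request


def fetch_with_retry(url, fetch=None, retries=3, delay=1.0, sleep=time.sleep):
    """Fetch a URL, backing off with a growing delay after each failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    if fetch is None:
        # Default fetcher: plain urllib GET returning the response body as bytes.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as response:
                return response.read()

    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(delay * 2 ** attempt)  # wait 1x, 2x, 4x ... the base delay
```

The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without network access or real waiting.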

Training Issues

  • Ensure CUDA is properly installed
  • Check that your GPU drivers are up to date
  • Verify PyTorch CUDA compatibility

Model Usage

After training, you can use the model to answer Odoo-related questions:

from train_model import OdooModelTrainer

trainer = OdooModelTrainer()
trainer.load_model()

# Load your trained model
# trainer.model = ... (load from odoo_model_output)

response = trainer.generate_response("How do I install Odoo?")
print(response)

Performance Notes

  • Training Time: ~30-60 minutes for 100 steps on RTX3070
  • Memory Usage: ~6-7GB VRAM during training
  • Data Size: ~20-50MB of documentation data
  • Model Size: ~4-5GB for the fine-tuned model
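When raising the step count for a production run, the timing above gives a rough basis for planning. The sketch below scales the quoted 100-step range linearly, which is an assumption: real throughput depends on sequence lengths, data loading, and thermal behavior. The function name is hypothetical.

```python
def estimate_training_minutes(steps, reference_steps=100, reference_minutes=(30, 60)):
    """Linearly scale the quoted 100-step timing range to another step count."""
    low, high = reference_minutes
    factor = steps / reference_steps
    return low * factor, high * factor


print(estimate_training_minutes(100))   # (30.0, 60.0)
print(estimate_training_minutes(1000))  # (300.0, 600.0)
```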

Contributing

Feel free to submit issues and enhancement requests!

License

This project is open source. Please check individual component licenses for details.

Disclaimer

This project is for educational and research purposes. Ensure compliance with Odoo's terms of service when scraping documentation.