# Google Maps Review Crawler

A Python-based automation tool that fetches customer reviews from the Google Business Profile API and synchronizes them to a PostgreSQL database.

## Features

- **OAuth 2.0 Authentication**: Connects directly to your Google Business Profile to fetch *all* reviews (bypassing the standard Places API 5-review limit).
- **Multi-Outlet Support**: Automatically queries the `master_outlet` table for active `google_business_id`s and iterates through all your locations.
- **Original Language Extraction**: Strips automated "Translated by Google" annotations so only the authentic, original review text is saved.
- **Rolling Window Filter**: Only processes reviews published within the last 3 months (90 days) to reduce API calls.
- **UPSERT Logic**: Safely updates existing database records (e.g., if a customer changes their rating or text) without creating duplicates.
- **Automated Scheduler**: Includes a background daemon script (`schedule_crawler.py`) that runs the crawler every hour.

---

## Prerequisites

- Python 3.9+
- PostgreSQL database
- A Google Cloud project with the **Google My Business API (v4)** enabled
- OAuth 2.0 client credentials downloaded as `client_secret.json`

---

## 1. Installation & Environment Setup

1. **Clone the repository** (or navigate to the folder).
2. **Create a virtual environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
4. **Configure database credentials**: Create a `.env` file in the root directory:
   ```env
   DB_HOST=192.169.0.10
   DB_PORT=5432
   DB_NAME=your_db_name
   DB_USER=your_db_user
   DB_PASSWORD=your_db_password
   ```

---

## 2. Authentication (One-Time Setup)

The Google Business Profile API requires explicit permission to read your reviews.

1. Ensure the `client_secret.json` downloaded from the Google Cloud Console is in the project folder.
2. Run the authorization script:
   ```bash
   python authorize.py
   ```
3. A browser window will open.
Log in with the Google Account that manages your Business Profiles.
4. Once authorized, a `token.json` file will be created. The crawler will automatically use and refresh this token from then on.

---

## 3. Database Setup & Mapping

The crawler maps Google locations to your database using `google_business_id`.

1. **Find your location IDs**: Run the helper script to list all active stores managed by your Google Account:
   ```bash
   python list_locations.py
   ```
   *Note: This outputs the store code and the long numeric location ID.*
2. **Update your database**: Insert the location ID into the `google_business_id` column of your `master_outlet` table.

---

## 4. Running the Crawler

### Manual Execution

To run the crawler once, immediately:
```bash
python crawler.py
```

### Automated (Hourly Scheduler)

To run the crawler continuously in the background (once every hour):
```bash
chmod +x run_hourly.sh
./run_hourly.sh
```

- **Scheduler logs**: `tail -f scheduler.log` (monitors the hourly heartbeat)
- **Crawler logs**: `tail -f crawler.log` (monitors the specific reviews being upserted)

To stop the background scheduler:
```bash
pkill -f schedule_crawler.py
```

---

## Database Schema (`google_review`)

Built automatically by `database.py`:

- `id` (SERIAL PRIMARY KEY)
- `review_id` (TEXT UNIQUE)
- `place_id` (TEXT) - *Legacy column, nullable*
- `original_text` (TEXT) - *The clean, untranslated review text*
- `author_display_name` (TEXT)
- `publish_time` (TIMESTAMP)
- `rating` (INTEGER)
- `outlet_code` (VARCHAR) - *Foreign key linked to `master_outlet.popcorn_code`*
- `language` (VARCHAR)
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
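The Original Language Extraction and Rolling Window Filter described under Features could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the `(Original)` marker handling assumes the v4 API's bilingual `(Translated by Google) ... (Original) ...` comment format.

```python
from datetime import datetime, timedelta, timezone

# Assumed annotation text; the real marker may vary by locale.
TRANSLATION_MARKER = "(Translated by Google)"


def strip_translation(comment: str) -> str:
    """Keep only the original-language portion of a review comment.

    Translated reviews are assumed to arrive as
    '(Translated by Google) ... (Original) ...'; keep the text after
    the '(Original)' marker when present, otherwise just drop the
    translation annotation.
    """
    if "(Original)" in comment:
        return comment.split("(Original)", 1)[1].strip()
    return comment.replace(TRANSLATION_MARKER, "").strip()


def within_window(publish_time: str, days: int = 90) -> bool:
    """Return True if an RFC 3339 timestamp falls inside the rolling window."""
    published = datetime.fromisoformat(publish_time.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) - published <= timedelta(days=days)
```

A review would be saved only when `within_window(review["createTime"])` holds, with `strip_translation` applied to the comment first.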
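The UPSERT logic against the `google_review` schema above might look like the following sketch. It relies on the `UNIQUE` constraint on `review_id`; the `upsert_review` helper and the subset of columns refreshed on conflict are illustrative assumptions, not the actual `database.py` implementation.

```python
# Parameterized statement for a psycopg2-style connection; `review` is
# assumed to be a dict whose keys match the named placeholders below.
UPSERT_SQL = """
INSERT INTO google_review
    (review_id, original_text, author_display_name,
     publish_time, rating, outlet_code, language, updated_at)
VALUES
    (%(review_id)s, %(original_text)s, %(author_display_name)s,
     %(publish_time)s, %(rating)s, %(outlet_code)s, %(language)s, NOW())
ON CONFLICT (review_id) DO UPDATE SET
    original_text = EXCLUDED.original_text,
    rating        = EXCLUDED.rating,
    updated_at    = NOW();
"""


def upsert_review(conn, review: dict) -> None:
    """Insert a review, or refresh its text/rating if it already exists."""
    with conn.cursor() as cur:
        cur.execute(UPSERT_SQL, review)
    conn.commit()
```

Because the conflict target is `review_id`, a customer editing their rating or text produces an update to the existing row rather than a duplicate.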