Google Maps Review Crawler

A Python-based automation tool that fetches customer reviews from the Google Business Profile API and synchronizes them to a PostgreSQL database.

Features

  • OAuth 2.0 Authentication: Connects directly to your Google Business Profile to fetch all reviews (bypassing the standard Places API 5-review limit).
  • Multi-Outlet Support: Automatically queries the master_outlet table for active google_business_ids and iterates through all your locations.
  • Original Language Extraction: Strips automated "Translated by Google" annotations to ensure only authentic, original review text is saved.
  • Rolling Window Filter: Only processes reviews published within the last 3 months (90 days) to optimize API calls.
  • UPSERT Logic: Safely updates existing database records (e.g., if a customer changes their rating or text) without creating duplicates.
  • Automated Scheduler: Includes a background daemon script (schedule_crawler.py) to run automatically every hour.
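The review-filtering features above can be sketched roughly as follows. This is a minimal illustration, not the actual crawler.py code: it assumes the common "(Translated by Google) … (Original) …" annotation layout the API uses for translated reviews, and the function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TRANSLATED_MARKER = "(Translated by Google)"
ORIGINAL_MARKER = "(Original)"


def extract_original_text(comment: str) -> str:
    """Return the untranslated review text.

    Assumes translated reviews arrive as:
    "(Translated by Google) <translation>\\n\\n(Original)\\n<original>"
    """
    if ORIGINAL_MARKER in comment:
        # Keep only the part after the "(Original)" marker.
        return comment.split(ORIGINAL_MARKER, 1)[1].strip()
    if comment.startswith(TRANSLATED_MARKER):
        # Translation present but no original section: drop the marker.
        return comment[len(TRANSLATED_MARKER):].strip()
    return comment.strip()


def is_recent(publish_time: datetime, days: int = 90,
              now: Optional[datetime] = None) -> bool:
    """True if the review falls inside the rolling window."""
    now = now or datetime.now(timezone.utc)
    return publish_time >= now - timedelta(days=days)
```

Reviews failing `is_recent` are skipped entirely, which keeps each hourly run cheap regardless of how much history a location has.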

Prerequisites

  • Python 3.9+
  • PostgreSQL database
  • A Google Cloud Project with the Google My Business API (v4) enabled.
  • OAuth 2.0 Client credentials downloaded as client_secret.json.

1. Installation & Environment Setup

  1. Clone the repository (or navigate to the folder).
  2. Create a virtual environment:
    python -m venv venv
    source venv/bin/activate
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Configure Database Credentials: Create a .env file in the root directory:
    DB_HOST=192.169.0.10
    DB_PORT=5432
    DB_NAME=your_db_name
    DB_USER=your_db_user
    DB_PASSWORD=your_db_password
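The project presumably loads these values with a helper such as python-dotenv, but the file format is simple enough to sketch with the standard library alone. `parse_env` and `load_env` are illustrative names, not part of this codebase:

```python
import os


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def load_env(path: str = ".env") -> None:
    """Load .env values into os.environ without overriding existing ones."""
    with open(path) as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)
```

Because `setdefault` is used, variables already exported in the shell take precedence over the .env file.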
    

2. Authentication (One-Time Setup)

The Google Business Profile API requires explicit permission to read your reviews.

  1. Ensure your client_secret.json from Google Cloud Console is in the project folder.
  2. Run the authorization script:
    python authorize.py
    
  3. A browser window will open. Log in with the Google Account that manages your Business Profiles.
  4. Once authorized, a token.json file will be created. The crawler will automatically use and refresh this token moving forward.

3. Database Setup & Mapping

The crawler maps Google locations to your database using google_business_id.

  1. Find your Location IDs: Run the helper script to list all active stores managed by your Google Account:

    python list_locations.py
    

    Note: This will output the Store Code and the long numeric Location ID.

  2. Update your Database: Insert the Location ID into the google_business_id column of your master_outlet table.
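The mapping update in step 2 can be expressed as a parameterized statement, assuming outlets are keyed by popcorn_code as noted in the schema section. This is a sketch for psycopg2-style execution; `build_mapping_update` is an illustrative helper, not part of the project:

```python
def build_mapping_update(location_id: str, outlet_code: str):
    """Build the parameterized UPDATE that links a Google location
    to an outlet row. Execute it with a DB cursor, e.g.:
        cur.execute(*build_mapping_update("1234567890", "PC001"))
    """
    sql = (
        "UPDATE master_outlet "
        "SET google_business_id = %s "
        "WHERE popcorn_code = %s"
    )
    return sql, (location_id, outlet_code)
```

Using bound parameters (`%s`) rather than string interpolation avoids SQL-injection issues even in one-off maintenance scripts.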


4. Running the Crawler

Manual Execution

To run the crawler once immediately:

python crawler.py
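Under the hood, the crawler talks to the legacy Google My Business v4 reviews endpoint for each mapped location. A helper that builds the request URL might look like the sketch below; the account and location IDs are the numeric values reported by list_locations.py, and `review_endpoint` is an illustrative name:

```python
API_ROOT = "https://mybusiness.googleapis.com/v4"


def review_endpoint(account_id: str, location_id: str) -> str:
    """URL for listing a location's reviews (the API paginates
    results via a pageToken query parameter)."""
    return f"{API_ROOT}/accounts/{account_id}/locations/{location_id}/reviews"
```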

Automated (Hourly Scheduler)

To run the crawler continuously in the background (runs once every hour):

chmod +x run_hourly.sh
./run_hourly.sh
  • Scheduler Logs: tail -f scheduler.log (monitors the hourly heartbeat)
  • Crawler Logs: tail -f crawler.log (monitors the specific reviews being upserted)

To stop the background scheduler:

pkill -f schedule_crawler.py

Database Schema (google_review)

Built automatically by database.py:

  • id (SERIAL PRIMARY KEY)
  • review_id (TEXT UNIQUE)
  • place_id (TEXT) - Legacy column, nullable
  • original_text (TEXT) - The clean, untranslated review text
  • author_display_name (TEXT)
  • publish_time (TIMESTAMP)
  • rating (INTEGER)
  • outlet_code (VARCHAR) - Foreign Key linked to master_outlet.popcorn_code
  • language (VARCHAR)
  • created_at (TIMESTAMP)
  • updated_at (TIMESTAMP)
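The UPSERT behaviour described in the features section maps naturally onto PostgreSQL's ON CONFLICT clause keyed on the unique review_id. A sketch of the statement the crawler might issue, using the column names above (illustrative, not the exact database.py code):

```python
# Executed per review via cur.execute(UPSERT_REVIEW, row_tuple).
UPSERT_REVIEW = """
INSERT INTO google_review
    (review_id, original_text, author_display_name,
     publish_time, rating, outlet_code, language)
VALUES (%s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (review_id) DO UPDATE SET
    original_text = EXCLUDED.original_text,
    rating        = EXCLUDED.rating,
    updated_at    = NOW()
"""
```

If a customer edits their review, the conflict branch overwrites the stored text and rating and bumps updated_at, so re-crawling the same 90-day window never creates duplicate rows.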