# Google Maps Review Crawler

A Python-based automation tool that fetches customer reviews from the Google Business Profile API and synchronizes them to a PostgreSQL database.
## Features

- **OAuth 2.0 Authentication**: Connects directly to your Google Business Profile to fetch all reviews (bypassing the standard Places API 5-review limit).
- **Multi-Outlet Support**: Automatically queries the `master_outlet` table for active `google_business_id`s and iterates through all of your locations.
- **Original Language Extraction**: Strips automated "Translated by Google" annotations to ensure only authentic, original review text is saved.
- **Rolling Window Filter**: Only processes reviews published within the last 3 months (90 days) to optimize API calls.
- **UPSERT Logic**: Safely updates existing database records (e.g., if a customer changes their rating or text) without creating duplicates.
- **Automated Scheduler**: Includes a background daemon script (`schedule_crawler.py`) to run automatically every hour.
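The rolling-window filter can be sketched as a small predicate. This is a standalone illustration, not the crawler's actual code; the function name and the assumption that Google returns RFC 3339 timestamps are mine:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: keep only reviews published in the last 90 days.
# Assumes publish_time is an RFC 3339 string, e.g. "2024-05-01T10:30:00Z".
def is_recent(publish_time: str, days: int = 90) -> bool:
    published = datetime.fromisoformat(publish_time.replace("Z", "+00:00"))
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return published >= cutoff

# A review from yesterday passes; one from last year is skipped.
yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()
last_year = (datetime.now(timezone.utc) - timedelta(days=400)).isoformat()
print(is_recent(yesterday), is_recent(last_year))  # True False
```

Filtering client-side like this keeps the database writes small even when the API returns a location's full review history.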
## Prerequisites

- Python 3.9+
- PostgreSQL database
- A Google Cloud project with the Google My Business API (v4) enabled
- OAuth 2.0 client credentials downloaded as `client_secret.json`
## 1. Installation & Environment Setup

1. Clone the repository (or navigate to the folder).
2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure database credentials by creating a `.env` file in the root directory:

   ```ini
   DB_HOST=192.169.0.10
   DB_PORT=5432
   DB_NAME=your_db_name
   DB_USER=your_db_user
   DB_PASSWORD=your_db_password
   ```
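At runtime these variables have to be read back into a connection config. The sketch below uses only the standard library; the `get_db_config` helper is hypothetical, and the real `database.py` presumably calls `load_dotenv()` from python-dotenv first to populate the environment from `.env`:

```python
import os

# Hypothetical helper: collect PostgreSQL connection settings from the
# environment. In the real project, python-dotenv's load_dotenv() would
# already have copied the .env file into os.environ.
def get_db_config() -> dict:
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": int(os.environ.get("DB_PORT", "5432")),
        "dbname": os.environ.get("DB_NAME", ""),
        "user": os.environ.get("DB_USER", ""),
        "password": os.environ.get("DB_PASSWORD", ""),
    }

os.environ["DB_HOST"] = "192.169.0.10"
print(get_db_config()["host"])  # 192.169.0.10
```

A dict in this shape can be passed straight to `psycopg2.connect(**get_db_config())`.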
## 2. Authentication (One-Time Setup)

The Google Business Profile API requires explicit permission to read your reviews.

1. Ensure your `client_secret.json` from the Google Cloud Console is in the project folder.
2. Run the authorization script:

   ```bash
   python authorize.py
   ```

3. A browser window will open. Log in with the Google account that manages your Business Profiles.
4. Once authorized, a `token.json` file will be created. The crawler will automatically use and refresh this token moving forward.
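A quick sanity check on the saved token is to confirm it carries a refresh token; without one, the short-lived access token expires and the crawler would need interactive re-authorization. The snippet below is a standalone sketch, not part of `authorize.py`:

```python
import json

# Hypothetical check: token.json written by the OAuth flow should contain
# a "refresh_token" field so the crawler can renew access unattended.
def has_refresh_token(token_json: str) -> bool:
    data = json.loads(token_json)
    return bool(data.get("refresh_token"))

sample = '{"token": "ya29.abc", "refresh_token": "1//xyz"}'
print(has_refresh_token(sample))  # True
```

If the check fails, deleting `token.json` and re-running `python authorize.py` is the usual remedy.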
## 3. Database Setup & Mapping

The crawler maps Google locations to your database using `google_business_id`.

1. **Find your Location IDs**: Run the helper script to list all active stores managed by your Google account:

   ```bash
   python list_locations.py
   ```

   Note: this outputs each store's Store Code and its long numeric Location ID.

2. **Update your database**: Insert the Location ID into the `google_business_id` column of your `master_outlet` table.
## 4. Running the Crawler

### Manual Execution

To run the crawler once immediately:

```bash
python crawler.py
```

### Automated (Hourly Scheduler)

To run the crawler continuously in the background (runs once every hour):

```bash
chmod +x run_hourly.sh
./run_hourly.sh
```

- **Scheduler logs**: `tail -f scheduler.log` (monitors the hourly heartbeat)
- **Crawler logs**: `tail -f crawler.log` (monitors the specific reviews being upserted)

To stop the background scheduler:

```bash
pkill -f schedule_crawler.py
```
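The core of an hourly daemon like `schedule_crawler.py` is typically just a sleep loop. This sketch is an assumption about its structure, not its actual code; the `max_runs` cap exists only so the example can be exercised without running forever:

```python
import time
from typing import Callable, Optional

# Hypothetical scheduler loop: run the job, then sleep for the interval.
# With max_runs=None it loops indefinitely, as a real daemon would.
def run_every(interval_seconds: float, job: Callable[[], None],
              max_runs: Optional[int] = None) -> int:
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs

calls = []
run_every(0.01, lambda: calls.append(1), max_runs=3)
print(len(calls))  # 3
```

In the real setup, `interval_seconds` would be 3600 and `job` would invoke the crawler, with output redirected to `scheduler.log`.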
## Database Schema (`google_review`)

Built automatically by `database.py`:

- `id` (SERIAL PRIMARY KEY)
- `review_id` (TEXT UNIQUE)
- `place_id` (TEXT) - legacy column, nullable
- `original_text` (TEXT) - the clean, untranslated review text
- `author_display_name` (TEXT)
- `publish_time` (TIMESTAMP)
- `rating` (INTEGER)
- `outlet_code` (VARCHAR) - foreign key linked to `master_outlet.popcorn_code`
- `language` (VARCHAR)
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
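The UPSERT behavior described under Features maps naturally onto PostgreSQL's `INSERT ... ON CONFLICT`, keyed on the `UNIQUE` `review_id` column. The statement below is a sketch of that pattern; the actual SQL in `crawler.py` may name columns or update different fields:

```python
# Sketch of an UPSERT keyed on the UNIQUE review_id column.
# psycopg2-style named placeholders (%(name)s) are assumed.
UPSERT_SQL = """
INSERT INTO google_review
    (review_id, place_id, original_text, author_display_name,
     publish_time, rating, outlet_code, language, created_at, updated_at)
VALUES
    (%(review_id)s, %(place_id)s, %(original_text)s, %(author_display_name)s,
     %(publish_time)s, %(rating)s, %(outlet_code)s, %(language)s, NOW(), NOW())
ON CONFLICT (review_id) DO UPDATE SET
    original_text = EXCLUDED.original_text,
    rating        = EXCLUDED.rating,
    updated_at    = NOW();
"""

# Executed with psycopg2 roughly as: cursor.execute(UPSERT_SQL, review_dict)
print("ON CONFLICT" in UPSERT_SQL)  # True
```

Because the conflict target is `review_id`, a customer editing their rating or text produces an update to the existing row rather than a duplicate.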