# Google Maps Review Crawler
A Python-based automation tool that fetches customer reviews from the Google Business Profile API and synchronizes them to a PostgreSQL database.
## Features
- **OAuth 2.0 Authentication**: Connects directly to your Google Business Profile to fetch *all* reviews (bypassing the standard Places API 5-review limit).
- **Multi-Outlet Support**: Automatically queries the `master_outlet` table for active `google_business_id`s and iterates through all your locations.
- **Original Language Extraction**: Strips automated "Translated by Google" annotations to ensure only authentic, original review text is saved.
- **Rolling Window Filter**: Only processes reviews published within the last 3 months (90 days) to optimize API calls.
- **UPSERT Logic**: Safely updates existing database records (e.g., if a customer changes their rating or text) without creating duplicates.
- **Automated Scheduler**: Includes a background daemon script (`schedule_crawler.py`) to run automatically every hour.
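The rolling-window filter and original-language extraction can be sketched as small helpers. This is an illustration, not the repo's actual code; the `(Translated by Google) … (Original) …` comment layout and the function names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REVIEW_WINDOW_DAYS = 90  # the 3-month rolling window described above


def within_window(publish_time: datetime, now: Optional[datetime] = None) -> bool:
    """True if a review's publish time falls inside the rolling window."""
    now = now or datetime.now(timezone.utc)
    return now - publish_time <= timedelta(days=REVIEW_WINDOW_DAYS)


def original_text(comment: str) -> str:
    """Keep only the authentic review text when Google auto-translates.

    Assumes the API's '(Translated by Google) ... (Original) ...' layout;
    untranslated reviews pass through unchanged."""
    marker = "(Original)"
    if marker in comment:
        return comment.split(marker, 1)[1].strip()
    return comment.strip()
```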
---
## Prerequisites
- Python 3.9+
- PostgreSQL database
- A Google Cloud Project with the **Google My Business API (v4)** enabled.
- OAuth 2.0 Client credentials downloaded as `client_secret.json`.
---
## 1. Installation & Environment Setup
1. **Clone the repository** (or navigate to the folder).
2. **Create a virtual environment**:
```bash
python -m venv venv
source venv/bin/activate
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
4. **Configure Database Credentials**:
Create a `.env` file in the root directory:
```env
DB_HOST=192.169.0.10
DB_PORT=5432
DB_NAME=your_db_name
DB_USER=your_db_user
DB_PASSWORD=your_db_password
```
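These variables map onto a connection config roughly like the sketch below. The function name, the `localhost` fallback, and the key names are illustrative assumptions, not the repo's actual loader.

```python
import os


def load_db_config(env) -> dict:
    """Build a psycopg2-style connection dict from environment variables.

    Keys mirror the .env file above; host/port fall back to hypothetical
    defaults, while name/user/password are required."""
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }
```

In practice you would call `load_db_config(os.environ)` after loading the `.env` file (e.g. with `python-dotenv`).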
---
## 2. Authentication (One-Time Setup)
The Google Business Profile API requires explicit permission to read your reviews.
1. Ensure your `client_secret.json` from Google Cloud Console is in the project folder.
2. Run the authorization script:
```bash
python authorize.py
```
3. A browser window will open. Log in with the Google Account that manages your Business Profiles.
4. Once authorized, a `token.json` file will be created. The crawler will automatically use and refresh this token moving forward.
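In the live crawler the Google client library handles refreshing; as a rough illustration, a stored token's validity could be checked like this. The presence of an ISO-8601 `expiry` field in `token.json` is an assumption about the token format, and `token_is_fresh` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def token_is_fresh(token_json: str, now=None) -> bool:
    """Rough check that a stored OAuth token has not expired.

    Assumes token.json contains an ISO-8601 'expiry' field (as written by
    google-auth's Credentials.to_json()); missing expiry counts as stale."""
    data = json.loads(token_json)
    expiry = data.get("expiry")
    if not expiry:
        return False
    now = now or datetime.now(timezone.utc)
    # Python 3.9's fromisoformat() does not accept a trailing 'Z'.
    return datetime.fromisoformat(expiry.replace("Z", "+00:00")) > now
```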
---
## 3. Database Setup & Mapping
The crawler maps Google locations to your database using `google_business_id`.
1. **Find your Location IDs**:
Run the helper script to list all active stores managed by your Google Account:
```bash
python list_locations.py
```
*Note: This will output the Store Code and the long numeric Location ID.*
2. **Update your Database**:
Insert the `Location ID` into the `google_business_id` column of your `master_outlet` table.
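The mapping step amounts to a parameterized `UPDATE` against `master_outlet`. A minimal sketch, assuming `popcorn_code` identifies the outlet (as in the schema notes below); the function name and the sample codes are hypothetical.

```python
def map_outlet_sql(outlet_code: str, location_id: str):
    """Return (sql, params) wiring a Google Location ID to an outlet row.

    Table and column names follow this README; execute the pair with any
    DB-API cursor, e.g. cur.execute(*map_outlet_sql("OUT01", "1234567890"))."""
    sql = (
        "UPDATE master_outlet SET google_business_id = %s "
        "WHERE popcorn_code = %s"
    )
    return sql, (location_id, outlet_code)
```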
---
## 4. Running the Crawler
### Manual Execution
To run the crawler once immediately:
```bash
python crawler.py
```
### Automated (Hourly Scheduler)
To run the crawler continuously in the background (runs once every hour):
```bash
chmod +x run_hourly.sh
./run_hourly.sh
```
- **Scheduler Logs**: `tail -f scheduler.log` (monitors the hourly heartbeat)
- **Crawler Logs**: `tail -f crawler.log` (monitors the specific reviews being upserted)
To stop the background scheduler:
```bash
pkill -f schedule_crawler.py
```
---
## Database Schema (`google_review`)
Built automatically by `database.py`:
- `id` (SERIAL PRIMARY KEY)
- `review_id` (TEXT UNIQUE)
- `place_id` (TEXT) - *Legacy column, nullable*
- `original_text` (TEXT) - *The clean, untranslated review text*
- `author_display_name` (TEXT)
- `publish_time` (TIMESTAMP)
- `rating` (INTEGER)
- `outlet_code` (VARCHAR) - *Foreign Key linked to master_outlet.popcorn_code*
- `language` (VARCHAR)
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
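Given this schema, the UPSERT described in the Features section can be sketched as an `INSERT ... ON CONFLICT` keyed on the `review_id` UNIQUE constraint. The statement below is an illustration using the columns listed above, not the repo's actual query; the star-rating mapping reflects the My Business API's enum strings (`ONE`..`FIVE`), which must be converted for the integer `rating` column.

```python
# Hypothetical enum-to-integer mapping for the API's starRating field.
STAR_RATING = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5}

UPSERT_SQL = """
INSERT INTO google_review
    (review_id, original_text, author_display_name, publish_time,
     rating, outlet_code, language, created_at, updated_at)
VALUES (%(review_id)s, %(original_text)s, %(author)s, %(publish_time)s,
        %(rating)s, %(outlet_code)s, %(language)s, NOW(), NOW())
ON CONFLICT (review_id) DO UPDATE SET
    original_text = EXCLUDED.original_text,
    rating        = EXCLUDED.rating,
    updated_at    = NOW();
"""
```

An edited review (changed text or rating) then overwrites the existing row instead of creating a duplicate, which is the behavior the UPSERT feature describes.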