A complete system to scrape, store, query, and report on publicly listed education agents from Australian university websites.
cd agent_scraper
pip install -r requirements.txt
(scrape.py)
python scrape.py
python scrape.py --list
python scrape.py --uni "Monash"
python scrape.py --uni "Melbourne"
python scrape.py --uni "Queensland"
python scrape.py --refresh
python scrape.py --load-only
How the scraper works: The scraper tries multiple strategies per page, in order:
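The individual strategies are not spelled out here, so the following is only a rough sketch of what an ordered fallback chain could look like. The strategy names (`from_table`, `from_list`) and the helper `extract_agents` are hypothetical and are not taken from scrape.py; only the `raw_text_fallback` status name comes from this README.

```python
import re
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collect the whitespace-normalised text of every element with a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.items = []
        self._buf = None  # None while we are outside the target tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buf is not None:
            text = " ".join(" ".join(self._buf).split())
            if text:
                self.items.append(text)
            self._buf = None


def texts(html, tag):
    parser = TagText(tag)
    parser.feed(html)
    parser.close()
    return parser.items


def from_table(html):
    # Hypothetical strategy 1: one record per table row.
    return texts(html, "tr")


def from_list(html):
    # Hypothetical strategy 2: one record per list item.
    return texts(html, "li")


# Strategies are tried in this order; the first one that yields records wins.
STRATEGIES = [("table", from_table), ("list", from_list)]


def extract_agents(html):
    """Return (status, records) for a page."""
    for name, strategy in STRATEGIES:
        records = strategy(html)
        if records:
            return name, records
    # Nothing structured was found: crudely strip tags and keep the page text.
    raw = " ".join(re.sub(r"<[^>]+>", " ", html).split())
    return "raw_text_fallback", [raw] if raw else []
```

The point of the ordering is that structured strategies run first and the raw-text fallback is only reached when every one of them comes back empty, which is why a `raw_text_fallback` status signals a page worth checking by hand.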
(query.py)
python query.py agents
python query.py agents --country "China"
python query.py agents --country "India"
python query.py agents --country "Indonesia"
python query.py agents --university "Monash"
python query.py agents --has-email
python query.py agents --email "@gmail.com"
python query.py agents --email "education"
python query.py search "IDP"
python query.py search "education group"
python query.py search "Beijing"
python query.py stats # By country (default)
python query.py stats --by university # By university
python query.py stats --by country_university # Cross-tab
python query.py coverage
python query.py agents --country "Vietnam" --export vietnam_agents.xlsx
python query.py stats --by country --export country_stats.xlsx
python query.py coverage --export coverage.xlsx
(social_report.py)
python social_report.py
Output is written to the ./reports/ directory.
python social_report.py --country "China"
python social_report.py --country "India"
python social_report.py --university "Monash"
python social_report.py --format html # HTML only
python social_report.py --format excel # Excel only
python social_report.py --format both # Both (default)
python social_report.py --output ./my_reports/
Report contents:
agent_scraper/
├── scrape.py                # Main scraper
├── query.py                 # Database query CLI
├── social_report.py         # Report generator
├── requirements.txt         # Python dependencies
├── data/
│   ├── australian_university_agent_pages.xlsx   # Source URL list
│   └── agents.db            # SQLite database (created on first run)
└── reports/                 # Generated reports (created automatically)
universities: 42 Australian universities with their agent page URLs
agents: Individual agent records (company, contact, country, email, phone, website)
scrape_log: History of all scrape attempts
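To make the table descriptions concrete, here is a minimal sketch of how two of these tables might be laid out and queried with Python's built-in `sqlite3` module. The agent columns follow the field list given for the agents table; the `id`, `university_id`, `name`, and `agent_page_url` columns, the sample rows, and both helper functions are assumptions for illustration, and the real schema in data/agents.db may differ.

```python
import sqlite3

# In-memory database for the sketch; the real file would be data/agents.db.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE universities (
        id INTEGER PRIMARY KEY,
        name TEXT,
        agent_page_url TEXT          -- assumed column name
    );
    CREATE TABLE agents (
        id INTEGER PRIMARY KEY,
        university_id INTEGER REFERENCES universities(id),  -- assumed link
        company TEXT, contact TEXT, country TEXT,
        email TEXT, phone TEXT, website TEXT
    );
    """
)

# Made-up sample data.
conn.execute(
    "INSERT INTO universities (id, name, agent_page_url) VALUES (?, ?, ?)",
    (1, "Example University", "https://example.edu/agents"),
)
conn.executemany(
    "INSERT INTO agents (university_id, company, country, email) VALUES (?, ?, ?, ?)",
    [
        (1, "Agent A", "China", "a@example.com"),
        (1, "Agent B", "India", None),
        (1, "Agent C", "China", None),
    ],
)


def agents_by_country(conn, country):
    """Roughly what a country filter could run: (company, university) pairs."""
    return conn.execute(
        "SELECT a.company, u.name FROM agents a "
        "JOIN universities u ON u.id = a.university_id "
        "WHERE a.country = ? ORDER BY a.company",
        (country,),
    ).fetchall()


def count_by_country(conn):
    """Roughly what a per-country summary could run: (country, count) pairs."""
    return conn.execute(
        "SELECT country, COUNT(*) FROM agents "
        "GROUP BY country ORDER BY COUNT(*) DESC, country"
    ).fetchall()
```

Passing the country as a bound parameter (`?`) rather than formatting it into the SQL string keeps user-supplied filter values from being interpreted as SQL.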
Some university pages use JavaScript-rendered finders (e.g. Macquarie, some QLD universities).
This tool cannot scrape them because they only render their agent lists in a browser. Pages like these will show
scrape_status = 'raw_text_fallback'. You will need to visit those pages manually and
copy/paste the agent lists, or use a Selenium-based scraper for those specific URLs.
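One way to find the pages that need this manual follow-up is to query the database for the fallback status. The sketch below assumes the status is recorded in scrape_log with `university`, `url`, and `scrape_status` columns; those column names, the `ok` status, and the function name are guesses, so adjust them to the actual schema.

```python
import sqlite3


def pages_needing_manual_review(conn):
    """Return distinct (university, url) pairs whose scrape fell back to raw text."""
    return conn.execute(
        "SELECT DISTINCT university, url FROM scrape_log "
        "WHERE scrape_status = 'raw_text_fallback' "
        "ORDER BY university"
    ).fetchall()


# Demo against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scrape_log (university TEXT, url TEXT, scrape_status TEXT)")
conn.executemany(
    "INSERT INTO scrape_log VALUES (?, ?, ?)",
    [
        ("Uni A", "https://a.example/agents", "ok"),  # 'ok' is a placeholder status
        ("Uni B", "https://b.example/agents", "raw_text_fallback"),
        ("Uni B", "https://b.example/agents", "raw_text_fallback"),  # repeat attempt
    ],
)

for university, url in pages_needing_manual_review(conn):
    print(f"{university}: {url}")
```

Against the real database the connection would instead be opened with `sqlite3.connect("data/agents.db")`.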
Pages confirmed to have static HTML agent lists (best scrape results expected):