Commit Graph

3 Commits

Author SHA1 Message Date
Soldier
405f9ca173 Add flexible CSS selector extraction
Replace hardcoded title extraction with user-defined CSS selectors using goquery. Users specify a selector in the job JSON to extract any HTML elements. The worker extracts text content plus src/href attributes. The webhook payload includes the extracted content and the URL.
2025-11-16 08:33:19 +00:00
Soldier
1ce45cfe97 Add URL scraping with ethical web crawling
Replace sleep with actual URL fetching. The worker scrapes the HTML title from each URL, respects robots.txt, and sends a proper User-Agent header. Scraped titles are stored in SQLite and sent via webhook callback.
2025-11-16 08:18:31 +00:00
Soldier
40d194beb1 Add SQLite persistence and worker
Add jobs table with ID, status, and created_at fields. The POST /jobs endpoint creates pending jobs in SQLite. The worker polls every 5s for pending jobs, processes each one with a 2s delay, and marks it as done.
2025-11-16 07:50:59 +00:00
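
The polling worker described in the earliest commit (40d194beb1) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the Store type is a hypothetical in-memory stand-in for the SQLite jobs table (so the example needs only the standard library), and the names Job, Store, ProcessPending, and RunWorker are assumptions introduced here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Job mirrors the fields named in the commit message: ID, status, created_at.
type Job struct {
	ID        int
	Status    string // "pending" or "done"
	CreatedAt time.Time
}

// Store is a hypothetical in-memory stand-in for the SQLite jobs table.
type Store struct {
	mu     sync.Mutex
	nextID int
	jobs   map[int]*Job
}

func NewStore() *Store {
	return &Store{nextID: 1, jobs: make(map[int]*Job)}
}

// Create inserts a pending job, as POST /jobs is described as doing.
func (s *Store) Create() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := s.nextID
	s.nextID++
	s.jobs[id] = &Job{ID: id, Status: "pending", CreatedAt: time.Now()}
	return id
}

// Pending returns the IDs of all pending jobs in ascending order.
func (s *Store) Pending() []int {
	s.mu.Lock()
	defer s.mu.Unlock()
	var ids []int
	for id, j := range s.jobs {
		if j.Status == "pending" {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

// MarkDone sets a job's status to "done" if the job exists.
func (s *Store) MarkDone(id int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if j, ok := s.jobs[id]; ok {
		j.Status = "done"
	}
}

// Status returns a job's status, or "" if the ID is unknown.
func (s *Store) Status(id int) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if j, ok := s.jobs[id]; ok {
		return j.Status
	}
	return ""
}

// ProcessPending runs one polling pass and returns how many jobs it handled.
// The sleep is a placeholder for real work (the commit describes a 2s delay).
func ProcessPending(s *Store, delay time.Duration) int {
	ids := s.Pending()
	for _, id := range ids {
		time.Sleep(delay)
		s.MarkDone(id)
	}
	return len(ids)
}

// RunWorker polls at the given interval (must be > 0) until stop is closed.
func RunWorker(s *Store, interval, delay time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			ProcessPending(s, delay)
		}
	}
}

func main() {
	s := NewStore()
	id := s.Create()
	fmt.Println("before:", s.Status(id))
	ProcessPending(s, 10*time.Millisecond) // one pass, shortened delay for the demo
	fmt.Println("after:", s.Status(id))
}
```

To approximate the behaviour the commit describes, RunWorker would be started in a goroutine with a 5s interval and a 2s delay, for example `go RunWorker(s, 5*time.Second, 2*time.Second, stop)`. Separating the single pass (ProcessPending) from the ticker loop keeps the pass easy to exercise on its own.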