9 Commits

Author SHA1 Message Date
Soldier
e4ebe2da6b Add comprehensive README documentation
Complete documentation including features, quick start guide, API reference, usage examples, architecture diagrams, and configuration options. Fixed typo in title (lightweigt → lightweight).
2025-11-16 09:22:42 +00:00
Soldier
c395a57b38 Add recurring job scheduling with frequency
Add frequency_minutes field to schedule recurring jobs. Jobs with frequency > 0 run repeatedly at specified intervals, automatically rescheduling after each execution. One-time jobs (frequency = 0) remain unchanged. Status transitions from pending to active for recurring jobs.
2025-11-16 09:17:30 +00:00
Soldier
985d340855 Add raw HTML archiving for historical re-parsing
Store complete HTML response in raw_html column before extraction. Enables re-running selectors on historical scrapes when sites change their DOM structure or CSS classes.
2025-11-16 08:43:46 +00:00
Soldier
405f9ca173 Add flexible CSS selector extraction
Replace hardcoded title extraction with user-defined CSS selectors using goquery. Users specify selector in job JSON to extract any HTML elements. Worker extracts text content plus src/href attributes. Webhook payload includes extracted content and URL.
2025-11-16 08:33:19 +00:00
Soldier
1ce45cfe97 Add URL scraping with ethical web crawling
Replace sleep with actual URL fetching. Worker scrapes HTML title from URLs, respects robots.txt, and includes proper User-Agent headers. Scraped titles stored in SQLite and sent via webhook callback.
2025-11-16 08:18:31 +00:00
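One way the robots.txt check mentioned in this commit could look is sketched below. It is a deliberately simplified assumption, not the repository's implementation: the `Allowed` function is hypothetical, it reads only the `User-agent: *` group, and it honors only `Disallow` prefix rules (no `Allow` lines, wildcards, or crawl delays).

```go
package main

import (
	"bufio"
	"strings"
)

// Allowed reports whether path may be fetched according to the
// "User-agent: *" group of robotsTxt. Only Disallow prefixes are checked.
func Allowed(robotsTxt, path string) bool {
	sc := bufio.NewScanner(strings.NewReader(robotsTxt))
	inStar := false // true while reading a group that applies to "*"
	prevUA := false // true if the previous directive was a User-agent line
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		if key == "user-agent" {
			if prevUA {
				inStar = inStar || val == "*" // several agents share one group
			} else {
				inStar = val == "*" // a new group starts here
			}
			prevUA = true
			continue
		}
		prevUA = false
		if key == "disallow" && inStar && val != "" && strings.HasPrefix(path, val) {
			return false
		}
	}
	return true
}
```

A worker would fetch `/robots.txt` from the target host first, pass its body and the job URL's path to a check like this, and skip the scrape when it returns false.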
Soldier
018d699e31 Add webhook callback support
Add webhook_url column to jobs table. POST /jobs endpoint accepts JSON payload with optional webhook_url. After job completion, worker POSTs to webhook with status and duration.
2025-11-16 08:01:53 +00:00
Soldier
40d194beb1 Add SQLite persistence and worker
Add jobs table with ID, status, and created_at fields. POST /jobs endpoint creates pending jobs in SQLite. Worker polls every 5s for pending jobs, processes them with 2s delay, and marks as done.
2025-11-16 07:50:59 +00:00
Soldier
c45b61ae0c Add minimal HTTP server skeleton
Initialize Go module and create basic HTTP server structure with cmd/pkg layout. Server responds on :8080 with health check endpoint.
2025-11-16 07:40:59 +00:00
Soldier
4dc07e0329 Initial commit
2025-11-16 07:30:00 +00:00