What it is
Website Evaluator AI Tool is a private website research and diagnostics pipeline.
It takes website URLs from CSV files, visits the websites with Playwright, extracts SEO and technology signals, captures screenshots, and prepares the data for AI-based visual evaluation.
The repo also includes optional registry scrapers for sources like Bizi and WKO, so contact and company metadata can be collected before the website analysis phase.
Why it exists
Website research becomes messy once the list grows past a few dozen companies.
A useful evaluation goes beyond asking “does the website look good?” It often needs:
- company contact data
- website URLs
- homepage title and meta description
- visible body text
- JSON-LD data
- CMS and ecommerce platform detection
- analytics and tag manager detection
- third-party script inventory
- sitemap coverage
- desktop and mobile screenshots
- AI-based visual scoring
- exportable tables for outreach and prioritisation
Collecting all of this by hand is slow. Running a separate one-off script for each step is not much better: every script visits the same site again in its own browser session, and the results end up scattered across unrelated output files.
This project tries to turn that process into a repeatable pipeline.
How it works
The repo is organised around phases.
Phase 1 is registry scraping. It can scrape profile data from sources like bizi.si and firmen.wko.at.
Phase 2 is website diagnostics. This phase only needs a CSV with a website URL column. The pipeline can run SEO scraping, technology analysis, script inventory, and screenshot capture from the same Playwright visit.
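As a rough illustration of the Phase 2 input contract, the sketch below reads a URL column out of CSV text. The column name "website", the helper names, and the choice to reduce each value to its origin and deduplicate are all assumptions made for this example, not the repo's actual implementation.

```typescript
// Hypothetical column name; the header the repo really expects is not stated here.
const URL_COLUMN = "website";

// Split one CSV line into fields, honouring double-quoted fields and "" escapes.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Return the unique site origins found in the URL column (assumed behaviour).
function readWebsiteUrls(csvText: string, column: string = URL_COLUMN): string[] {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== "");
  const [headerLine, ...rows] = lines;
  if (headerLine === undefined) return [];

  const header = splitCsvLine(headerLine).map((h) => h.trim().toLowerCase());
  const index = header.indexOf(column.toLowerCase());
  if (index === -1) throw new Error(`CSV has no "${column}" column`);

  const seen = new Set<string>();
  for (const row of rows) {
    const raw = (splitCsvLine(row)[index] ?? "").trim();
    if (raw === "") continue;
    // Assume https when the scheme is missing.
    const candidate = /^https?:\/\//i.test(raw) ? raw : `https://${raw}`;
    try {
      seen.add(new URL(candidate).origin);
    } catch {
      // Skip rows whose value cannot be parsed as a URL.
    }
  }
  return Array.from(seen);
}
```

A quoted company name containing a comma does not shift the URL column, and rows with an empty URL cell are skipped rather than treated as errors.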
Phase 3 is AI analysis. This reads screenshots from a run folder and performs visual sophistication scoring.
The project intentionally uses CSV files and runs/{runId}/ folders as handoff points between phases. Because each phase writes its output to disk before the next one starts, the pipeline is easier to inspect and debug than a single opaque command.
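To make the handoff idea concrete, here is a minimal sketch of how paths under runs/{runId}/ could be derived. Only the runs/{runId}/ layout comes from the description above; the run ID format, the file names (seo.csv, tech.csv, scripts.csv), and the screenshots folder are hypothetical.

```typescript
// All file and folder names below are hypothetical placeholders.
interface RunPaths {
  root: string;
  seoCsv: string;
  techCsv: string;
  scriptsCsv: string;
  screenshotsDir: string;
}

// Build a filesystem-safe run ID from a timestamp (assumed format).
function buildRunId(now: Date): string {
  return now
    .toISOString()               // e.g. 2024-05-01T12:30:00.000Z
    .replace(/\.\d{3}Z$/, "Z")   // drop milliseconds
    .replace(/:/g, "-");         // colons are not safe on every filesystem
}

// Derive every handoff path for one run from its ID.
function runPaths(runId: string, baseDir: string = "runs"): RunPaths {
  // Reject IDs that could escape the base directory.
  if (!/^[A-Za-z0-9._-]+$/.test(runId) || runId.includes("..")) {
    throw new Error(`Unsafe run ID: ${runId}`);
  }
  const root = `${baseDir}/${runId}`;
  return {
    root,
    seoCsv: `${root}/seo.csv`,
    techCsv: `${root}/tech.csv`,
    scriptsCsv: `${root}/scripts.csv`,
    screenshotsDir: `${root}/screenshots`,
  };
}
```

With a layout like this, a later phase only needs the run ID to locate everything an earlier phase produced.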
Features
- Bizi profile scraping
- WKO profile scraping
- Sitemap scraping from robots.txt and sitemap XML
- Homepage SEO metadata extraction
- Visible body text extraction
- JSON-LD extraction
- Wappalyzer-based technology detection
- WordPress detection and partial version enrichment
- Third-party script inventory
- Desktop and mobile screenshot capture
- Optional Gemini-based screenshot analysis
- CSV-based pipeline input
- Run folder outputs
- Master export joining contacts, SEO, tech, and script data
- Express API server
- BullMQ worker
- React/Vite dashboard
- Dashboard filtering, presets, lazy pagination, and filtered CSV export
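The master export in the list above joins several per-site tables. One way such a join could work is sketched below: each table is indexed by a normalised hostname and merged onto the contact rows as a left join. The join key, the "website" column name, and the prefixing of merged columns are assumptions for illustration; the repo may join on a different key.

```typescript
type Row = Record<string, string>;

// Normalise a URL to a join key: hostname without a leading "www.".
function siteKey(url: string): string {
  const withScheme = /^https?:\/\//i.test(url) ? url : `https://${url}`;
  return new URL(withScheme).hostname.replace(/^www\./, "");
}

// Left join: keep every contact row and attach matching columns from each
// other table, prefixed with that table's name (e.g. "seo_title").
function joinBySite(
  contacts: Row[],
  others: Record<string, Row[]>,
  urlColumn: string = "website", // hypothetical column name
): Row[] {
  // Index each secondary table by site key once, up front.
  const indexes = Object.keys(others).map((prefix) => {
    const byKey = new Map<string, Row>();
    for (const row of others[prefix] ?? []) {
      const url = row[urlColumn];
      if (url) byKey.set(siteKey(url), row);
    }
    return { prefix, byKey };
  });

  return contacts.map((contact) => {
    const merged: Row = { ...contact };
    const url = contact[urlColumn];
    if (!url) return merged; // no URL, nothing to join on
    const key = siteKey(url);
    for (const { prefix, byKey } of indexes) {
      const match = byKey.get(key);
      if (!match) continue;
      for (const col of Object.keys(match)) {
        const value = match[col];
        if (col !== urlColumn && value !== undefined) {
          merged[`${prefix}_${col}`] = value;
        }
      }
    }
    return merged;
  });
}
```

Contacts without a match in a given table simply keep their original columns, so partial runs still produce a usable export.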
Current status
This is still in progress.
The repo docs mark several areas as active:
- Bizi profile scraping
- WKO profile scraping
- sitemap scraping
- SEO scraping
- tech analysis
- screenshot capture
- AI analysis
- API server
- worker
- dashboard
The docs also explicitly mark some work as deferred:
- AI analysis at scale
- screenshots for visual scoring at scale
- WKO Austria scale-up
- larger Bizi retail list processing
- sitemap URL classification
- multi-page crawling
- dashboard polish and run comparison
So the project is best described as an active private internal tool, not a finished public SaaS.
What exists today
The repo contains:
- TypeScript pipeline scripts
- Playwright-based scraping
- Bizi and WKO scrapers
- SEO scraper
- sitemap scraper
- Wappalyzer technology analysis
- screenshot capture
- Gemini-based analysis service
- Express API
- BullMQ worker
- Docker Compose setup
- React/Vite dashboard
- progress documentation
- canonical architecture documentation
- generated historical run/output artifacts
The docs also include a project snapshot with completed and partial data runs, including Bizi contacts, standalone SEO scrape output, partial pipeline output, and partial sitemap output.
What does not exist yet
There is no public demo.
There is no documented hosted product URL.
There is no pricing page.
There is no documented customer usage.
There is no public launch.
The Google Custom Search website discovery path has been retired. The current pipeline therefore expects website URLs to be provided directly in the CSV input or collected first through the source-specific scrapers.
The AI scoring path exists, but the docs say AI analysis at scale is still deferred.