Website Evaluator AI Tool | Builds

What it is

Website Evaluator AI Tool is a private website research and diagnostics pipeline.

It takes website URLs from CSV files, visits the websites with Playwright, extracts SEO and technology signals, captures screenshots, and prepares the data for AI-based visual evaluation.

The repo also includes optional registry scrapers for sources like Bizi and WKO, so contact and company metadata can be collected before the website analysis phase.

Why it exists

Website research becomes messy once the list grows past a few dozen companies.

A useful evaluation is not just “does the website look good?” It often needs:

company contact data
website URLs
homepage title and meta description
visible body text
JSON-LD data
CMS and ecommerce platform detection
analytics and tag manager detection
third-party script inventory
sitemap coverage
desktop and mobile screenshots
AI-based visual scoring
exportable tables for outreach and prioritisation

Doing this manually would be slow. Running separate one-off scripts for each step also creates duplicated browser visits and fragmented output.

This project tries to turn that process into a repeatable pipeline.

How it works

The repo is organised around phases.

Phase 1 is optional registry scraping. It collects company profile data from sources like bizi.si and firmen.wko.at.

Phase 2 is website diagnostics. This phase only needs a CSV with a website URL column. The pipeline can run SEO scraping, technology analysis, script inventory, and screenshot capture from the same Playwright visit.
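A minimal sketch of the "one visit, several analyses" idea, assuming the page HTML has already been captured once (for example from a Playwright page). The PageSnapshot type and the function names are hypothetical and not taken from the repo. The regex-based parsing is only for illustration.

```typescript
// Hypothetical shape of what a single page visit captures (not the repo's actual type).
interface PageSnapshot {
  url: string;
  html: string;
}

// SEO signal: the text of the first <title> element, or null if there is none.
function extractTitle(snapshot: PageSnapshot): string | null {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(snapshot.html);
  return match ? match[1].trim() : null;
}

// Script inventory: hostnames of <script src> URLs that differ from the page host.
function listThirdPartyScriptHosts(snapshot: PageSnapshot): string[] {
  const pageHost = new URL(snapshot.url).hostname;
  const hosts = new Set<string>();
  const scriptSrc = /<script[^>]*\ssrc=["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptSrc.exec(snapshot.html)) !== null) {
    try {
      // Resolve relative src values against the page URL before comparing hosts.
      const scriptHost = new URL(match[1], snapshot.url).hostname;
      if (scriptHost !== pageHost) hosts.add(scriptHost);
    } catch {
      // Ignore src values that cannot be parsed as URLs.
    }
  }
  return Array.from(hosts).sort();
}

// Every analysis reads the same snapshot, so the site is visited only once.
function analyzeSnapshot(snapshot: PageSnapshot) {
  return {
    url: snapshot.url,
    title: extractTitle(snapshot),
    thirdPartyScriptHosts: listThirdPartyScriptHosts(snapshot),
  };
}
```

Keeping each analysis as a plain function over the snapshot means a new check can be added without adding another browser visit.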

Phase 3 is AI analysis. It reads screenshots from a run folder and scores their visual sophistication.

The project intentionally uses CSV files and runs/{runId}/ folders as handoff points. Each phase writes files that the next phase reads, which makes the pipeline easier to inspect and debug than a single opaque command.
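One way to picture the run folder handoff is a small helper that derives every path from a run id. This is a sketch under assumptions: only the runs/{runId}/ convention comes from the project, while the file names, the run id format, and the function names are hypothetical.

```typescript
// Hypothetical layout of one run folder. Only "runs/{runId}/" is from the project docs.
interface RunPaths {
  root: string;
  inputCsv: string;
  seoCsv: string;
  screenshotsDir: string;
}

// Build a file-system-safe run id from a timestamp (assumed format, for illustration).
function makeRunId(now: Date): string {
  // Drop milliseconds and replace ":" because it is not allowed in Windows file names.
  return now.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/:/g, "-");
}

// Derive all handoff paths for a run from its id.
function runPaths(runId: string, baseDir = "runs"): RunPaths {
  // Reject ids that contain path separators or start with a dot.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(runId)) {
    throw new Error(`Invalid run id: ${runId}`);
  }
  const root = `${baseDir}/${runId}`;
  return {
    root,
    inputCsv: `${root}/input.csv`,
    seoCsv: `${root}/seo.csv`,
    screenshotsDir: `${root}/screenshots`,
  };
}
```

Because every phase resolves its inputs and outputs through the same helper, a failed phase can be rerun against an existing folder without repeating the earlier ones.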

Features

Bizi profile scraping
WKO profile scraping
Sitemap scraping from robots.txt and sitemap XML
Homepage SEO metadata extraction
Visible body text extraction
JSON-LD extraction
Wappalyzer-based technology detection
WordPress detection and partial version enrichment
Third-party script inventory
Desktop and mobile screenshot capture
Optional Gemini-based screenshot analysis
CSV-based pipeline input
Run folder outputs
Master export joining contacts, SEO, tech, and script data
Express API server
BullMQ worker
React/Vite dashboard
Dashboard filtering, presets, lazy pagination, and filtered CSV export
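As an illustration of the sitemap feature in the list above, the sketch below shows how sitemap URLs can be discovered from the Sitemap: lines of a robots.txt file and how page URLs can be read from a sitemap's loc elements. It assumes both files have already been fetched as text. The function names are hypothetical and the repo's scraper may work differently.

```typescript
// Collect sitemap URLs declared in robots.txt via "Sitemap: <url>" lines.
function sitemapUrlsFromRobots(robotsTxt: string): string[] {
  const urls: string[] = [];
  for (const line of robotsTxt.split(/\r?\n/)) {
    // The field name is matched case-insensitively. Commented-out lines start with "#" and do not match.
    const match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) urls.push(match[1]);
  }
  return urls;
}

// Extract <loc> values from sitemap XML (works for both urlset and sitemapindex files).
// Regex-based for brevity: CDATA sections and entities other than &amp; are not handled.
function locsFromSitemapXml(xml: string): string[] {
  const locs: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, "&"));
  }
  return locs;
}
```

Counting the returned loc values per site gives a rough sitemap coverage figure. A production scraper would more likely use a real XML parser.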

Current status

The project is still in progress.

The repo docs mark several areas as active:

Bizi profile scraping
WKO profile scraping
sitemap scraping
SEO scraping
tech analysis
screenshot capture
AI analysis
API server
worker
dashboard

The docs also explicitly mark some work as deferred:

AI analysis at scale
screenshots for visual scoring at scale
WKO Austria scale-up
larger Bizi retail list processing
sitemap URL classification
multi-page crawling
dashboard polish and run comparison

So the project is best described as an active private internal tool, not a finished public SaaS.

What exists today

The repo contains:

TypeScript pipeline scripts
Playwright-based scraping
Bizi and WKO scrapers
SEO scraper
sitemap scraper
Wappalyzer technology analysis
screenshot capture
Gemini-based analysis service
Express API
BullMQ worker
Docker Compose setup
React/Vite dashboard
progress documentation
canonical architecture documentation
generated historical run/output artifacts

The docs also include a project snapshot with completed and partial data runs, including Bizi contacts, standalone SEO scrape output, partial pipeline output, and partial sitemap output.

What does not exist yet

There is no public demo.

There is no documented hosted product URL.

There is no pricing page.

There is no documented customer usage.

There is no public launch.

The Google Custom Search website discovery path is retired, so the current pipeline expects website URLs to be provided directly in CSV input or collected through source-specific scrapers first.
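Since the pipeline starts from URLs in a CSV, the input step could look roughly like the sketch below. It assumes a column named "website", which is a hypothetical name, and uses a small hand-written parser that handles quoted fields but not newlines inside quotes. A real run would more likely use a CSV library.

```typescript
// Split one CSV line into fields, honouring quoted fields and doubled quotes ("").
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Read the URL column, skip empty cells, add a scheme where missing, and drop duplicates.
// The default column name "website" is an assumption for this example.
function readWebsiteUrls(csv: string, column = "website"): string[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = splitCsvLine(lines[0]).map((name) => name.trim().toLowerCase());
  const index = header.indexOf(column.toLowerCase());
  if (index === -1) throw new Error(`CSV has no "${column}" column`);
  const seen = new Set<string>();
  const urls: string[] = [];
  for (const line of lines.slice(1)) {
    const raw = (splitCsvLine(line)[index] ?? "").trim();
    if (raw === "") continue;
    const url = /^https?:\/\//i.test(raw) ? raw : `https://${raw}`;
    if (!seen.has(url)) {
      seen.add(url);
      urls.push(url);
    }
  }
  return urls;
}
```

Rows without a URL are skipped rather than treated as errors, since registry exports often contain companies that have no website.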

The AI scoring path exists, but the docs say AI analysis at scale is still deferred.