BuildWithMatija
Tool • Active • Private

Website Evaluator AI Tool

A private pipeline for turning website lists into SEO, tech, screenshot, and AI evaluation data.

  • TypeScript
  • Node.js
  • Playwright
  • Express
  • BullMQ
  • Redis
  • Wappalyzer
  • Google Gemini
  • React
  • Vite
  • Zustand
  • Cheerio
  • CSV
  • Docker Compose
Problem
Evaluating many company websites manually is slow and inconsistent. For outreach or market research, the useful information is spread across business directories, homepages, SEO metadata, technology fingerprints, scripts, screenshots, and contact fields. A simple spreadsheet is not enough once the goal becomes scoring, filtering, and prioritising hundreds or thousands of websites.
Thesis
The bet is to turn website evaluation into a repeatable pipeline: collect company/contact data, visit each website once with Playwright, extract SEO and technology signals, capture screenshots, and then enrich the result with AI scoring when screenshots are available. Instead of one fully automated black box, the repo uses CSV handoff and run folders so each phase can be inspected, resumed, debugged, or improved independently.
Validation
The private repo documents active Bizi and WKO profile scrapers, sitemap scraping, SEO scraping, Wappalyzer-based tech analysis, screenshots, AI analysis, an Express + BullMQ API, and a dashboard. Current docs include project snapshots with completed Bizi contact scraping, completed standalone SEO scraping, partial pipeline runs, partial sitemap output, and deferred AI analysis at scale. There is no documented public demo, hosted product, pricing, customer usage, or production deployment.
Proof points
  • Private GitHub repository: https://github.com/matija2209/website-evaluator-ai-tool
  • README documents scraping company directory/profile sources such as bizi.si and firmen.wko.at.
  • README documents sitemap scraping, SEO metadata extraction, technology-stack detection, screenshots, and AI-based website analysis.
  • README documents an Express + BullMQ API, worker, and dashboard.
  • Canonical reference documents a CSV handoff model across registry scraping, website diagnostics, screenshots, and optional AI scoring.
  • Progress notes document a bulk pipeline re-run with 1,585 URLs, 1,262 successful scrapes, 1,077 tech/script results, and a 1,477-row master export.
  • Progress notes mark AI analysis at scale, screenshots for visual scoring, WKO scale-up, and multi-page crawling as deferred.
  • No public demo, hosted app, pricing page, or external customer usage metrics are documented.
Audience
  • Agency operators researching potential clients
  • Developers building website audit and outreach pipelines
  • Teams that need to evaluate many business websites from CSV input
  • People who want to combine directory scraping, SEO metadata, tech detection, screenshots, and AI scoring

What it is

Website Evaluator AI Tool is a private website research and diagnostics pipeline.

It takes website URLs from CSV files, visits the websites with Playwright, extracts SEO and technology signals, captures screenshots, and prepares the data for AI-based visual evaluation.

The repo also includes optional registry scrapers for sources like Bizi and WKO, so contact and company metadata can be collected before the website analysis phase.

Why it exists

Website research becomes messy once the list grows past a few dozen companies.

A useful evaluation is not just “does the website look good?” It often needs:

  • company contact data
  • website URLs
  • homepage title and meta description
  • visible body text
  • JSON-LD data
  • CMS and ecommerce platform detection
  • analytics and tag manager detection
  • third-party script inventory
  • sitemap coverage
  • desktop and mobile screenshots
  • AI-based visual scoring
  • exportable tables for outreach and prioritisation

Doing this manually would be slow. Running separate one-off scripts for each step also creates duplicated browser visits and fragmented output.

This project tries to turn that process into a repeatable pipeline.

How it works

The repo is organised around three phases.

Phase 1 is registry scraping. It can scrape profile data from sources like bizi.si and firmen.wko.at.

Phase 2 is website diagnostics. This phase only needs a CSV with a website URL column. The pipeline can run SEO scraping, technology analysis, script inventory, and screenshot capture from the same Playwright visit.
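
A minimal sketch of the single-visit idea, not the repo's actual code: the browser visit is assumed to produce one snapshot, and each analysis step is a plain function over that snapshot, so adding a step does not add a page load. The PageSnapshot shape, the function names, and the rule that subdomains count as first-party are all assumptions made for illustration.

```typescript
// Hypothetical shape of the data one browser visit might capture.
interface PageSnapshot {
  url: string;
  title: string;
  metaDescription: string | null;
  scriptSrcs: string[]; // raw src attributes, relative or absolute
}

interface SeoResult {
  title: string;
  metaDescription: string | null;
  hasMetaDescription: boolean;
}

// Strip a leading "www." so www.example.com and example.com compare equal.
function bareHost(hostname: string): string {
  return hostname.replace(/^www\./, "");
}

// SEO step: reads only fields already present in the snapshot.
function extractSeo(snap: PageSnapshot): SeoResult {
  const description = snap.metaDescription ? snap.metaDescription.trim() : null;
  return {
    title: snap.title.trim(),
    metaDescription: description,
    hasMetaDescription: description !== null && description.length > 0,
  };
}

// Script inventory step: hosts that are neither the page host nor a subdomain of it.
function thirdPartyScriptHosts(snap: PageSnapshot): string[] {
  const pageHost = bareHost(new URL(snap.url).hostname);
  const hosts = new Set<string>();
  for (const src of snap.scriptSrcs) {
    let host: string;
    try {
      host = bareHost(new URL(src, snap.url).hostname);
    } catch {
      continue; // skip malformed src values
    }
    if (host === "") continue; // e.g. data: or javascript: URLs
    if (host !== pageHost && !host.endsWith("." + pageHost)) hosts.add(host);
  }
  return Array.from(hosts).sort();
}

// One snapshot in, all per-site results out.
function analyseSnapshot(snap: PageSnapshot) {
  return {
    url: snap.url,
    seo: extractSeo(snap),
    thirdPartyScripts: thirdPartyScriptHosts(snap),
  };
}
```

In a real run the snapshot would presumably be filled from the Playwright page during the visit; that wiring is left out here so the sketch stays runnable without a browser.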

Phase 3 is AI analysis. This reads screenshots from a run folder and performs visual sophistication scoring.
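
As a rough illustration of how screenshots might be selected for scoring, the sketch below groups file names from a run's screenshot folder and keeps only sites that have both a desktop and a mobile capture. The file naming pattern ({domain}.desktop.png and {domain}.mobile.png), the function name, and the requirement that both viewports exist are assumptions, not documented behaviour of the repo.

```typescript
type Viewport = "desktop" | "mobile";

interface ScreenshotPair {
  domain: string;
  desktop: string;
  mobile: string;
}

// Group screenshot file names by domain and keep complete desktop + mobile pairs.
// Assumes a hypothetical naming pattern: "<domain>.<viewport>.png".
function pairScreenshots(fileNames: string[]): ScreenshotPair[] {
  const byDomain = new Map<string, Partial<Record<Viewport, string>>>();

  for (const name of fileNames) {
    const match = /^(.+)\.(desktop|mobile)\.png$/.exec(name);
    if (!match) continue; // ignore files that do not follow the pattern
    const domain = match[1];
    const viewport = match[2] as Viewport;
    const entry: Partial<Record<Viewport, string>> = byDomain.get(domain) ?? {};
    entry[viewport] = name;
    byDomain.set(domain, entry);
  }

  const pairs: ScreenshotPair[] = [];
  byDomain.forEach((entry, domain) => {
    if (entry.desktop && entry.mobile) {
      pairs.push({ domain, desktop: entry.desktop, mobile: entry.mobile });
    }
  });
  // Sort for a stable order, so repeated runs score sites in the same sequence.
  return pairs.sort((a, b) => a.domain.localeCompare(b.domain));
}
```

Sites with only one viewport captured are skipped rather than scored on partial evidence; a different policy would be equally easy to express.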

The project intentionally uses CSV files and runs/{runId}/ folders as handoff points. That makes the pipeline more inspectable and easier to debug than a single hidden command.
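
The handoff idea can be sketched as follows, under stated assumptions: each phase writes one file into runs/{runId}/, and resuming a run means finding the first phase whose file is missing. The phase names follow the three phases described above, but the file names (contacts.csv, diagnostics.csv, ai-scores.csv) and all function names are hypothetical.

```typescript
// Phase names follow the article; the file names are hypothetical.
const PHASE_OUTPUTS = [
  { phase: "registry", file: "contacts.csv" },
  { phase: "diagnostics", file: "diagnostics.csv" },
  { phase: "ai", file: "ai-scores.csv" },
] as const;

type Phase = (typeof PHASE_OUTPUTS)[number]["phase"];

// Build the folder for a run, rejecting ids that could escape runs/.
function runDir(runId: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(runId)) {
    throw new Error("Unsafe run id: " + runId);
  }
  return "runs/" + runId;
}

// Path of the handoff file a given phase is expected to write.
function outputPath(runId: string, phase: Phase): string {
  const entry = PHASE_OUTPUTS.find((p) => p.phase === phase);
  if (!entry) throw new Error("Unknown phase: " + phase);
  return runDir(runId) + "/" + entry.file;
}

// Given the files already on disk, return the first phase whose output is missing.
function nextPhase(runId: string, existingFiles: ReadonlySet<string>): Phase | null {
  for (const { phase } of PHASE_OUTPUTS) {
    if (!existingFiles.has(outputPath(runId, phase))) return phase;
  }
  return null; // every phase has produced its handoff file
}
```

Because the state lives in files rather than in memory, a failed phase can be rerun after inspecting the previous phase's output. A run that starts from a hand-made URL list could place that CSV where the first handoff file is expected, so the registry phase counts as done.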

Features

  • Bizi profile scraping
  • WKO profile scraping
  • Sitemap scraping from robots.txt and sitemap XML
  • Homepage SEO metadata extraction
  • Visible body text extraction
  • JSON-LD extraction
  • Wappalyzer-based technology detection
  • WordPress detection and partial version enrichment
  • Third-party script inventory
  • Desktop and mobile screenshot capture
  • Optional Gemini-based screenshot analysis
  • CSV-based pipeline input
  • Run folder outputs
  • Master export joining contacts, SEO, tech, and script data
  • Express API server
  • BullMQ worker
  • React/Vite dashboard
  • Dashboard filtering, presets, lazy pagination, and filtered CSV export
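
To make the master export in the list above more concrete, here is one way such a join could work: a left join from contact rows to SEO, tech, and script rows, keyed by a normalised domain. This is a sketch under assumptions, not the repo's implementation. The column names ("website", "url"), the prefixing scheme, and the function names are invented for illustration.

```typescript
type Row = Record<string, string>;

interface Source {
  prefix: string; // e.g. "seo", "tech", "scripts"
  rows: Row[];    // each row is assumed to carry a "url" column
}

// Reduce a URL or bare domain to a comparable key, or "" if it cannot be parsed.
function domainKey(raw: string | undefined): string {
  const value = (raw ?? "").trim();
  if (value === "") return "";
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(value) ? value : "https://" + value;
  try {
    return new URL(withScheme).hostname.replace(/^www\./, "");
  } catch {
    return "";
  }
}

// Left join: every contact row survives, enriched where a source has a matching domain.
function buildMasterExport(contacts: Row[], sources: Source[]): Row[] {
  // Index each source once so the join is a map lookup per contact.
  const indexes = sources.map((source) => {
    const byDomain = new Map<string, Row>();
    for (const row of source.rows) {
      const key = domainKey(row["url"]);
      if (key !== "") byDomain.set(key, row);
    }
    return { prefix: source.prefix, byDomain };
  });

  return contacts.map((contact) => {
    const key = domainKey(contact["website"]);
    const merged: Row = { ...contact, domain: key };
    if (key === "") return merged; // no website, nothing to join
    for (const { prefix, byDomain } of indexes) {
      const match = byDomain.get(key);
      if (!match) continue;
      for (const column of Object.keys(match)) {
        // Prefix columns so "title" from SEO cannot collide with a contact field.
        if (column !== "url") merged[prefix + "_" + column] = match[column];
      }
    }
    return merged;
  });
}
```

A left join keeps contacts whose website failed to scrape, which matters for outreach lists: the row is still usable even when the diagnostics columns are empty.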

Current status

The project is still in progress.

The repo docs mark several areas as active:

  • Bizi profile scraping
  • WKO profile scraping
  • sitemap scraping
  • SEO scraping
  • tech analysis
  • screenshot capture
  • AI analysis
  • API server
  • worker
  • dashboard

The docs also explicitly mark some work as deferred:

  • AI analysis at scale
  • screenshots for visual scoring at scale
  • WKO Austria scale-up
  • larger Bizi retail list processing
  • sitemap URL classification
  • multi-page crawling
  • dashboard polish and run comparison

So the project is best described as an active private internal tool, not a finished public SaaS.

What exists today

The repo contains:

  • TypeScript pipeline scripts
  • Playwright-based scraping
  • Bizi and WKO scrapers
  • SEO scraper
  • sitemap scraper
  • Wappalyzer technology analysis
  • screenshot capture
  • Gemini-based analysis service
  • Express API
  • BullMQ worker
  • Docker Compose setup
  • React/Vite dashboard
  • progress documentation
  • canonical architecture documentation
  • generated historical run/output artifacts

The docs also include a project snapshot with completed and partial data runs, including Bizi contacts, standalone SEO scrape output, partial pipeline output, and partial sitemap output.

What does not exist yet

There is no public demo.

There is no hosted product URL documented.

There is no pricing page.

There is no documented customer usage.

There is no public launch.

The Google Custom Search website discovery path is retired, so the current pipeline expects website URLs to be provided directly in CSV input or collected through source-specific scrapers first.

The AI scoring path exists, but the docs say AI analysis at scale is still deferred.
