Executive summary
Arheko is a document intake and processing platform aimed at accounting workflows. Clients and staff can submit documents through a PWA upload surface, a desktop-style dropzone, or inbound email routed through Brevo. Behind that intake layer, a pipeline runs OCR, document-type classification, cost-object assignment, structured field extraction, filing, and human review when automation is uncertain.
The system is built as a Next.js application on Vercel with Payload CMS as the control plane for auth, tenants, document metadata, AI prompt overrides, pipeline runs, review tasks, audit events, and job queue state.
The problem
Accounting document intake rarely starts in one place. Files arrive by email, mobile photo, scan, or client upload. Someone has to open each file, identify what it is, assign it to the right cost object or client context, extract the fields the accounting system needs, and file it, often while chasing missing attachments across email threads and shared drives.
That manual loop does not scale. It also hides errors until month-end, when misclassified documents surface as reconciliation work.
The thesis
The bet is that a transparent, stage-by-stage pipeline serves accounting teams better than a single black-box upload.
Each document moves through named stages (ingest, OCR, classification, extraction, filing, review), with Payload tracking metadata, pipeline runs, and audit events along the way. OCR runs on RunPod with GLM-OCR; classification and extraction use an OpenAI-compatible LLM API with per-tenant prompt overrides stored in Payload. Binary files land in Backblaze B2 or compatible object storage; Postgres on Neon holds structured records.
Separating intake (PWA, dropzone, inbound email) from processing (async job queue) lets operators inspect intermediate state and retry individual stages without re-uploading the source file.
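To make the retry idea concrete, here is a minimal sketch of how a pipeline run could record per-stage state and reset a single stage without touching upstream results. The stage names come from the list above; the `PipelineRun` and `StageRecord` shapes and the `retryFrom` helper are assumptions for illustration, not the actual Payload collections.

```typescript
// Hypothetical stage model. Stage names mirror the text; the record
// shapes are illustrative, not the real Arheko schema.
const STAGES = ["ingest", "ocr", "classification", "extraction", "filing", "review"] as const;
type Stage = (typeof STAGES)[number];
type StageStatus = "pending" | "running" | "succeeded" | "failed";

interface StageRecord {
  stage: Stage;
  status: StageStatus;
  attempts: number;
  error?: string;
}

interface PipelineRun {
  documentId: string;
  stages: StageRecord[];
}

// Create a run with every stage pending.
function newRun(documentId: string): PipelineRun {
  return {
    documentId,
    stages: STAGES.map((stage): StageRecord => ({ stage, status: "pending", attempts: 0 })),
  };
}

// Reset the given stage and everything downstream of it to "pending".
// Upstream records are returned unchanged, so their stored results
// (for example OCR text) can be reused without re-uploading the file.
function retryFrom(run: PipelineRun, stage: Stage): PipelineRun {
  const start = STAGES.indexOf(stage);
  return {
    ...run,
    stages: run.stages.map((record): StageRecord => {
      if (STAGES.indexOf(record.stage) < start) return record;
      // Keep the attempt count for auditing; drop the old error.
      return { stage: record.stage, status: "pending", attempts: record.attempts };
    }),
  };
}
```

With this shape, an operator action such as "retry classification" only needs the stored run record and the name of the stage to restart.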
What I built
Intake surfaces
- PWA upload flow for mobile and browser submission
- Desktop-style dropzone for batch uploads with optional context notes
- Inbound email processing through Brevo
Next.js application (Vercel)
- Upload API
- Dashboard and browse views
- Document detail view
- Settings and admin screens
- Cron endpoints for scheduled pipeline work
Payload CMS backend
- Auth and multi-tenant model
- Document metadata, document types, and cost objects
- AI prompt overrides per processing step
- Pipeline run records, review tasks, audit events, and job queue
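The prompt-override item above can be illustrated with a small resolver: use the tenant's override for a step when one exists, otherwise fall back to a built-in default. This is a sketch under assumptions. The `PromptOverride` shape, the `DEFAULT_PROMPTS` text, and the choice of steps are hypothetical, and the overrides are passed in as a plain array rather than queried from Payload.

```typescript
// Hypothetical shapes: the real collection fields are not documented here.
type PromptStep = "classification" | "extraction";

interface PromptOverride {
  tenantId: string;
  step: PromptStep;
  prompt: string;
}

// Placeholder defaults, used when a tenant has no override for a step.
const DEFAULT_PROMPTS: Record<PromptStep, string> = {
  classification: "Classify the document into one of the allowed document types.",
  extraction: "Extract the requested fields and return them as JSON.",
};

// Return the tenant-specific prompt for a step, or the default when the
// tenant has no override (or the override is blank).
function resolvePrompt(
  overrides: PromptOverride[],
  tenantId: string,
  step: PromptStep,
): string {
  const match = overrides.find((o) => o.tenantId === tenantId && o.step === step);
  if (match && match.prompt.trim().length > 0) return match.prompt;
  return DEFAULT_PROMPTS[step];
}
```

Treating a blank override as "no override" is a design choice in this sketch: it keeps an accidentally emptied field from sending an empty prompt to the LLM.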
Processing pipeline
- Ingest and OCR dispatch with polling
- Cost-object and document-type classification
- Structured field extraction
- Filing and review queue for exceptions
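The "OCR dispatch with polling" step above could be driven by a pure decision function that a scheduled job calls on each tick. The sketch below assumes exponential backoff with a cap and an overall timeout. The status names, the timing constants, and the `OcrJob` shape are placeholders, not RunPod's API or the values used in the codebase.

```typescript
// Placeholder status names; a real OCR provider's status values would be
// mapped onto these before calling decideNextPoll.
type OcrRemoteStatus = "queued" | "running" | "completed" | "failed";

interface OcrJob {
  jobId: string;
  dispatchedAt: number; // epoch milliseconds when the job was submitted
  polls: number;        // how many times the status has been checked so far
}

type PollDecision =
  | { action: "wait"; nextPollInMs: number }
  | { action: "advance" }
  | { action: "fail"; reason: string };

// Assumed tuning values, chosen only for illustration.
const BASE_DELAY_MS = 2_000;
const MAX_DELAY_MS = 60_000;
const MAX_WAIT_MS = 10 * 60_000;

// Decide what to do after one status check: move on to the next stage,
// mark the stage failed, or wait with exponentially growing delay.
function decideNextPoll(job: OcrJob, status: OcrRemoteStatus, now: number): PollDecision {
  if (status === "completed") return { action: "advance" };
  if (status === "failed") return { action: "fail", reason: "OCR job failed" };
  if (now - job.dispatchedAt > MAX_WAIT_MS) {
    return { action: "fail", reason: "OCR job timed out" };
  }
  const delay = Math.min(BASE_DELAY_MS * 2 ** job.polls, MAX_DELAY_MS);
  return { action: "wait", nextPollInMs: delay };
}
```

Keeping the decision free of I/O means the same function can be called from a cron endpoint or a queue worker, and it can be tested without a live OCR endpoint.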
External integrations
- RunPod for GLM-OCR
- OpenAI-compatible LLM API for classification and extraction
- Brevo for inbound email and SMTP
- Postgres on Neon for application data
- Backblaze B2 or compatible blob storage for file objects
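As an illustration of the classification call listed above, the following sketch builds a request body in the chat-completions shape that OpenAI-compatible APIs generally accept (`model`, `messages`, `temperature`). The model name, the character limit, the message wording, and the `buildClassificationRequest` helper are assumptions; the actual prompts and parameters are not documented on this page.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
}

// Assumed cap on OCR text sent to the model, to keep requests bounded.
const MAX_OCR_CHARS = 12_000;

// Build the request body for classifying one document against a tenant's
// allowed document types. Sending it (endpoint URL, API key, retries) is
// left out of this sketch.
function buildClassificationRequest(
  systemPrompt: string,
  ocrText: string,
  documentTypes: string[],
  model: string = "placeholder-model", // hypothetical model name
): ChatCompletionRequest {
  const userContent = [
    "Allowed document types: " + documentTypes.join(", "),
    "Document text:",
    ocrText.slice(0, MAX_OCR_CHARS),
  ].join("\n");

  return {
    model,
    temperature: 0, // low temperature to keep labels as repeatable as possible
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userContent },
    ],
  };
}
```

The system prompt is a parameter so that a tenant-specific override can be passed in without changing the request builder.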
Architecture
Intake (PWA / dropzone / inbound email via Brevo)
→ Next.js on Vercel
→ Payload CMS (tenants, documents, prompts, audit)
→ Job queue
→ Pipeline workers
→ RunPod GLM-OCR
→ LLM classification + extraction
→ B2 object storage
→ Review queue + filing
→ Postgres (Neon)
Current status
Active private product. The core architecture and pipeline stages are documented and implemented in the codebase. arheko.eu is the product domain referenced in integration setup. This page reflects architecture notes reviewed for the build portfolio; it is not a public launch announcement. Customer counts, revenue, and demo access are not documented here.