IN PROGRESS Structured extraction

BelegLotse

Extracts receipts and invoices into DATEV structure — §14 UStG-validated, optionally 100% local.

Live-Demo (soon) Code (soon)

Problem & context

Typing in receipts is slow and error-prone

Every receipt must be captured, checked and exported to DATEV. An extractor that recognises fields, validates against §14 UStG and exports cleanly — optionally cloud-free — speeds up bookkeeping.

Solution

OCR + LLM + validation instead of typing

Schema-bound structured output, mandatory-field validation, DATEV export.

Screenshot / demo GIF coming

Architecture

Clean Architecture, four layers

domain

Receipt fields & §14 UStG rules

application

Extract → validate → export

infrastructure

OCR, Mistral/Ollama, DATEV mapper

api

FastAPI + HTMX upload

Process history

From plan to deploy — six phases

  1. 01

    Setup & architecture

    IN PROGRESS

    Clean-arch skeleton, Docker, CI. ADR-0001: local option (Ollama) for sensitive receipts.

  2. 02

    OCR & preprocessing

    PLANNED

    Scan/PDF → text, layout detection.

  3. 03

    Field extraction

    PLANNED

    Structured output (amount, VAT, date, supplier) via LLM.

  4. 04

    Validation

    PLANNED

    §14 UStG mandatory fields, totals check, BGB §288.

  5. 05

    DATEV export & eval

    PLANNED

    DATEV format, measure field precision/recall.

  6. 06

    Deploy & docs

    PLANNED

    Docker deploy, GoBD-compliant storage, README, ADRs.

Results

Made measurable

Field precision
Field recall
Validation rate

Will be filled with real numbers after the eval phase — and then feeds into the CV.

Stack & compliance

Python 3.12FastAPIOCRMistral / OllamaDATEVDocker

Receipts contain personal & tax-relevant data → optionally 100% local (Ollama), GoBD-compliant storage, retention per AO §147. Disclaimer: no tax advice (StBerG).

BelegLotse live demo

← All projects