
Enrichment

Enrichment improves filename quality and writes structured metadata to files using local AI. Text is extracted from documents, images, and media files, then processed by a local language model to generate semantic metadata. Nothing leaves your machine.

Enrichment respects the sensitivity tier system. This is enforced in code, not by convention.

| Tier | Enrichment access |
| --- | --- |
| 1 (RESTRICTED) | Never enters the enrichment pipeline. Blocked at the gate. |
| 2 (SENSITIVE) | Local LLM processes extracted text only. Human confirmation required before applying results. |
| 3 (INTERNAL) | Full enrichment. Results above the confidence threshold are applied automatically. |

If a file is classified as Tier 1, the enrichment module refuses to process it. There is no flag to override this. See Sensitivity Tiers for the full tier model.
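The tier gate can be sketched as follows. This is a minimal illustration, not fialr's actual implementation; the names `Tier`, `TierGateError`, and `enrich_gate` are hypothetical:

```python
from enum import IntEnum


class Tier(IntEnum):
    RESTRICTED = 1
    SENSITIVE = 2
    INTERNAL = 3


class TierGateError(Exception):
    """Raised when a Tier 1 file reaches the enrichment pipeline."""


def enrich_gate(tier: Tier) -> str:
    """Return the handling mode for a file, or refuse outright."""
    if tier == Tier.RESTRICTED:
        # Hard block: there is deliberately no override flag.
        raise TierGateError("Tier 1 files never enter enrichment")
    if tier == Tier.SENSITIVE:
        return "confirm"  # human confirmation before applying results
    return "auto"         # auto-apply above the confidence threshold
```

The key property is that the Tier 1 branch raises rather than returning a mode, so no caller can accidentally process a RESTRICTED file.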


Enrichment begins with text extraction. The extraction method depends on the file type:

| File type | Extraction tool | What is extracted |
| --- | --- | --- |
| Scanned PDF | `ocrmypdf` + Tesseract | OCR text from page images |
| Native PDF | `pypdfium2` | Embedded text content |
| Images | Tesseract | OCR text from image content |
| Photos | `piexif` | EXIF metadata (date, camera, GPS) |
| Audio | `mutagen` | ID3/metadata tags (title, artist, album) |
| Word documents | `python-docx` | Document text and metadata |
| Excel spreadsheets | `openpyxl` | Sheet names, header rows, metadata |

Extracted text is passed to the inference layer. It is not stored on disk separately — it exists only in memory during processing.
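Dispatch by file type might look like the sketch below. The mapping and the `pick_extractor` helper are illustrative assumptions; the real dispatch (including distinguishing scanned from native PDFs, or OCR images from EXIF photos) is more involved:

```python
from pathlib import Path

# Hypothetical suffix-to-extractor map mirroring the table above.
# Real dispatch also inspects content: a .pdf may be scanned (ocrmypdf
# + Tesseract) or native (pypdfium2); a .jpg may carry EXIF or OCR text.
EXTRACTORS = {
    ".pdf":  "pypdfium2",
    ".png":  "tesseract",
    ".jpg":  "piexif",
    ".mp3":  "mutagen",
    ".docx": "python-docx",
    ".xlsx": "openpyxl",
}


def pick_extractor(path: Path) -> str:
    """Choose an extraction tool from the file suffix."""
    try:
        return EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no extractor for {path.suffix!r}")
```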


All inference runs on your machine through Ollama. The inference layer is abstracted behind inference.py, which handles model communication, prompt construction, and response parsing.

The model receives the extracted text and returns structured JSON:

```json
{
  "date": "2024-03-15",
  "entity": "acme-corp",
  "descriptor": "quarterly-revenue-report",
  "tags": ["financial", "quarterly", "revenue"],
  "summary": "Q1 2024 revenue report for Acme Corp showing YoY growth.",
  "confidence": 0.87
}
```

The response provides filename tokens (date, entity, descriptor), semantic tags, a one-sentence summary, and a confidence score.
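Parsing and validating that reply could be sketched like this. The `Enrichment` dataclass and `parse_response` helper are assumptions for illustration, not the actual contents of `inference.py`:

```python
import json
from dataclasses import dataclass


@dataclass
class Enrichment:
    date: str
    entity: str
    descriptor: str
    tags: list
    summary: str
    confidence: float


REQUIRED = {"date", "entity", "descriptor", "tags", "summary", "confidence"}


def parse_response(raw: str) -> Enrichment:
    """Parse the model's JSON reply, rejecting malformed output."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"model response missing fields: {sorted(missing)}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return Enrichment(**{k: data[k] for k in REQUIRED})
```

Validating at this boundary matters because LLM output is untrusted: a missing field or out-of-range confidence should fail loudly rather than propagate into filenames.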

No cloud endpoints are supported. The inference interface is deliberately constrained to localhost only. This is not configurable.


The confidence score determines what happens to the enrichment output:

  • Above threshold — results are applied automatically (Tier 3) or queued for confirmation (Tier 2)
  • Below threshold — the file is written to the review_queue with the LLM suggestion attached as a hint

The reviewer sees the model’s proposed filename tokens, tags, and summary alongside the file’s current name and path. They can accept, modify, or reject the suggestion.
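Combining the tier rules with the threshold gives a small routing function. A minimal sketch, assuming a `route` helper and string outcomes that are not part of fialr's real API:

```python
def route(confidence: float, tier: int, threshold: float = 0.75) -> str:
    """Decide where an enrichment result goes."""
    if confidence < threshold:
        # Below threshold: file goes to review_queue, suggestion kept as a hint.
        return "review_queue"
    # Above threshold: Tier 2 still needs a human; Tier 3 applies automatically.
    return "confirm" if tier == 2 else "auto_apply"
```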


Enrichment settings live in fialr.toml:

```toml
[enrichment]
model = "llama3.2"
endpoint = "http://localhost:11434"
timeout = 30
confidence_threshold = 0.75
```

| Setting | Default | Description |
| --- | --- | --- |
| `model` | `llama3.2` | Ollama model name |
| `endpoint` | `http://localhost:11434` | Ollama API endpoint (localhost only) |
| `timeout` | `30` | Inference timeout in seconds per file |
| `confidence_threshold` | `0.75` | Minimum confidence for auto-apply |

Enrichment requires Ollama running locally with a model pulled:

```shell
# Install Ollama
brew install ollama

# Pull the model specified in fialr.toml
ollama pull llama3.2

# Start the Ollama server
ollama serve
```

fialr checks for Ollama availability before starting enrichment. If the server is not running or the configured model is not available, the command fails with a clear error. No partial processing occurs.
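A pre-flight check along these lines can verify both conditions before any file is touched. This sketch uses Ollama's `/api/tags` endpoint, which lists pulled models; the `check_ollama` helper itself is an illustrative assumption:

```python
import json
import urllib.error
import urllib.request


def check_ollama(endpoint: str = "http://localhost:11434",
                 model: str = "llama3.2") -> None:
    """Fail fast if the Ollama server or the configured model is unavailable."""
    try:
        with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Ollama server not reachable at {endpoint}") from exc
    # /api/tags returns names like "llama3.2:latest"; compare the base name.
    pulled = {m["name"].split(":")[0] for m in tags.get("models", [])}
    if model.split(":")[0] not in pulled:
        raise RuntimeError(f"model {model!r} not pulled; run `ollama pull {model}`")
```

Failing here, before extraction begins, is what guarantees that no partial processing occurs.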


```shell
fialr enrich ~/Documents
```

Enrichment processes all Tier 2 and Tier 3 files that have not yet been enriched:

```text
jobs/2026-03-11_enrich_a1b2c3d4/
  log.json
  report.md
  checkpoint.json
```

Terminal output:

```text
2,412 files eligible for enrichment.
Tier 2: 389 (requires confirmation)
Tier 3: 2,023
Skipped (Tier 1): 23
Processed: 2,412
Auto-applied: 1,847
Sent to review queue: 565
Extraction failures: 12
```

Enrichment metadata is written to XATTRs (com.fialr.enriched_at, com.fialr.tags) and to the SQLite files table. The review_queue table receives files below the confidence threshold with the LLM suggestion stored as a hint.
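The database side of that write can be sketched with stdlib `sqlite3`. The table columns and the `record_enrichment` helper are illustrative; fialr's real schema may differ:

```python
import json
import sqlite3
import time


def record_enrichment(db: sqlite3.Connection, path: str,
                      result: dict, threshold: float = 0.75) -> None:
    """Persist enrichment output; low-confidence files also hit review_queue."""
    db.execute(
        "UPDATE files SET tags = ?, summary = ?, enriched_at = ? WHERE path = ?",
        (json.dumps(result["tags"]), result["summary"], int(time.time()), path),
    )
    if result["confidence"] < threshold:
        # Store the full LLM suggestion as the reviewer's hint.
        db.execute(
            "INSERT INTO review_queue (path, hint) VALUES (?, ?)",
            (path, json.dumps(result)),
        )
    db.commit()
```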


After enrichment, the corpus has complete metadata: sensitivity tiers, schema categories, content hashes, and AI-generated semantic tags. Run validation to verify integrity, or export to generate sidecar metadata files.

For the full command reference, see fialr enrich.