# Enrichment
Enrichment improves filename quality and writes structured metadata to files using local AI. Text is extracted from documents, images, and media files, then processed by a local language model to generate semantic metadata. Nothing leaves your machine.
## Tier restrictions

Enrichment respects the sensitivity tier system. This is enforced in code, not by convention.
| Tier | Enrichment access |
|---|---|
| 1 (RESTRICTED) | Never enters the enrichment pipeline. Blocked at the gate. |
| 2 (SENSITIVE) | Local LLM processes extracted text only. Human confirmation required before applying results. |
| 3 (INTERNAL) | Full enrichment. Results above confidence threshold are applied automatically. |
If a file is classified as Tier 1, the enrichment module refuses to process it. There is no flag to override this. See Sensitivity Tiers for the full tier model.
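The gate amounts to a hard check at the entry point of the pipeline. A minimal sketch of the idea, assuming hypothetical names (`Tier`, `TierGateError`, `enter_enrichment` are illustrative, not fialr's actual API):

```python
from enum import IntEnum

class Tier(IntEnum):
    RESTRICTED = 1  # never enriched
    SENSITIVE = 2   # enriched; human confirmation required
    INTERNAL = 3    # enriched; auto-applied above threshold

class TierGateError(Exception):
    """Raised when a Tier 1 file reaches the enrichment entry point."""

def enter_enrichment(path: str, tier: Tier) -> None:
    # Enforced in code: no flag or config option bypasses this check.
    if tier == Tier.RESTRICTED:
        raise TierGateError(f"{path}: Tier 1 files never enter enrichment")
```

Because the refusal is an exception at the gate rather than a filter applied later, a Tier 1 file's contents are never read by any enrichment step.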
## Text extraction

Enrichment begins with text extraction. The extraction method depends on the file type:
| File Type | Extraction Tool | What is extracted |
|---|---|---|
| Scanned PDF | ocrmypdf + Tesseract | OCR text from page images |
| Native PDF | pypdfium2 | Embedded text content |
| Images | Tesseract | OCR text from image content |
| Photos | piexif | EXIF metadata (date, camera, GPS) |
| Audio | mutagen | ID3/metadata tags (title, artist, album) |
| Word documents | python-docx | Document text and metadata |
| Excel spreadsheets | openpyxl | Sheet names, header rows, metadata |
Extracted text is passed to the inference layer. It is not stored on disk separately — it exists only in memory during processing.
## Local inference

All inference runs on your machine through Ollama. The inference layer is abstracted behind inference.py, which handles model communication, prompt construction, and response parsing.
The model receives the extracted text and returns structured JSON:
```json
{
  "date": "2024-03-15",
  "entity": "acme-corp",
  "descriptor": "quarterly-revenue-report",
  "tags": ["financial", "quarterly", "revenue"],
  "summary": "Q1 2024 revenue report for Acme Corp showing YoY growth.",
  "confidence": 0.87
}
```

The response provides filename tokens (date, entity, descriptor), semantic tags, a one-sentence summary, and a confidence score.
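fialr's actual response parsing in inference.py is not shown on this page, but local models can return malformed or incomplete JSON, so defensive validation of the reply is worth sketching (the function and key set below mirror the example response; the code itself is an assumption):

```python
import json

# Keys the model is expected to return, per the example response above.
REQUIRED_KEYS = {"date", "entity", "descriptor", "tags", "summary", "confidence"}

def parse_response(raw: str) -> dict:
    """Parse a model reply, treating structural problems as failed
    enrichment rather than a crash. Illustrative, not fialr's parser."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model response missing keys: {sorted(missing)}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return data
```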
No cloud endpoints are supported. The inference interface is deliberately constrained to localhost only. This is not configurable.
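Talking to a local Ollama server uses its standard `/api/generate` endpoint. A minimal sketch of building such a request — the prompt wording here is an assumption, since fialr's real prompt construction lives in inference.py and is not documented on this page:

```python
import json

OLLAMA_ENDPOINT = "http://localhost:11434"  # localhost only, by design

def build_request(model: str, text: str) -> bytes:
    """Build a JSON body for Ollama's /api/generate endpoint.
    The prompt below is illustrative, not fialr's actual prompt."""
    prompt = (
        "Return JSON with keys date, entity, descriptor, tags, summary "
        "and confidence for this document text:\n\n" + text
    )
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # one complete response, not a token stream
        "format": "json",  # ask Ollama to constrain output to valid JSON
    }).encode()
```

The body would then be POSTed to `OLLAMA_ENDPOINT + "/api/generate"`; keeping the endpoint a module-level constant rather than a user-supplied URL is one way to make the localhost-only constraint non-configurable.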
## Confidence routing

The confidence score determines what happens to the enrichment output:
- Above threshold — results are applied automatically (Tier 3) or queued for confirmation (Tier 2)
- Below threshold — the file is written to the `review_queue` with the LLM suggestion attached as a hint
The reviewer sees the model’s proposed filename tokens, tags, and summary alongside the file’s current name and path. They can accept, modify, or reject the suggestion.
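The routing rules above reduce to a small decision function. A sketch under the default threshold, with assumed destination names (`confirmation_queue` and `auto_apply` are illustrative labels; only `review_queue` is named in fialr's schema):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # mirrors the fialr.toml default

@dataclass
class Suggestion:
    tokens: dict        # date / entity / descriptor filename tokens
    tags: list[str]
    summary: str
    confidence: float

def route(tier: int, s: Suggestion) -> str:
    """Decide where an enrichment result goes. Illustrative sketch."""
    if s.confidence < CONFIDENCE_THRESHOLD:
        return "review_queue"        # suggestion attached as a hint
    if tier == 2:
        return "confirmation_queue"  # human must approve before apply
    return "auto_apply"              # Tier 3, above threshold
```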
## Configuration

Enrichment settings live in `fialr.toml`:
```toml
[enrichment]
model = "llama3.2"
endpoint = "http://localhost:11434"
timeout = 30
confidence_threshold = 0.75
```

| Setting | Default | Description |
|---|---|---|
| `model` | `llama3.2` | Ollama model name |
| `endpoint` | `http://localhost:11434` | Ollama API endpoint (localhost only) |
| `timeout` | `30` | Inference timeout in seconds per file |
| `confidence_threshold` | `0.75` | Minimum confidence for auto-apply |
## Prerequisites

Enrichment requires Ollama running locally with a model pulled:
```sh
# Install Ollama
brew install ollama

# Pull the model specified in fialr.toml
ollama pull llama3.2

# Start the Ollama server
ollama serve
```

fialr checks for Ollama availability before starting enrichment. If the server is not running or the configured model is not available, the command fails with a clear error. No partial processing occurs.
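One way to implement such a pre-flight check is to query Ollama's `/api/tags` endpoint, which lists pulled models. A sketch, assuming hypothetical function names (how fialr actually performs the check is not documented here):

```python
import json
import urllib.request
from urllib.error import URLError

def model_available(tags_payload: dict, model: str) -> bool:
    """Check an /api/tags payload for a model. Ollama reports names like
    'llama3.2:latest', so match with or without the tag suffix."""
    names = {m["name"] for m in tags_payload.get("models", [])}
    return model in names or any(n.split(":")[0] == model for n in names)

def check_ollama(endpoint: str, model: str) -> None:
    """Fail fast before any file is touched, so no partial processing occurs."""
    try:
        with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except URLError as exc:
        raise SystemExit(f"Ollama server not reachable at {endpoint}: {exc}")
    if not model_available(payload, model):
        raise SystemExit(f"model {model!r} not pulled; run: ollama pull {model}")
```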
## Running enrichment

```sh
fialr enrich ~/Documents
```

Enrichment processes all Tier 2 and Tier 3 files that have not yet been enriched:

```
jobs/2026-03-11_enrich_a1b2c3d4/
  log.json
  report.md
  checkpoint.json
```

Terminal output:

```
2,412 files eligible for enrichment.
  Tier 2: 389 (requires confirmation)
  Tier 3: 2,023
  Skipped (Tier 1): 23

Processed: 2,412
  Auto-applied: 1,847
  Sent to review queue: 565
  Extraction failures: 12
```

Enrichment metadata is written to xattrs (`com.fialr.enriched_at`, `com.fialr.tags`) and to the SQLite `files` table. The `review_queue` table receives files below the confidence threshold with the LLM suggestion stored as a hint.
## What comes next

After enrichment, the corpus has complete metadata: sensitivity tiers, schema categories, content hashes, and AI-generated semantic tags. Run validation to verify integrity, or export to generate sidecar metadata files.

For the full command reference, see `fialr enrich`.