
Enrichment

Enrichment improves filename quality and writes structured metadata to files using local AI. Text is extracted from documents, images, and media files, then processed by a local language model to generate semantic metadata. Nothing leaves your machine.

Enrichment respects the sensitivity tier system. This is enforced in code, not by convention.

| Tier | Enrichment access |
| --- | --- |
| 1 (RESTRICTED) | Never enters the enrichment pipeline. Blocked at the gate. |
| 2 (SENSITIVE) | Local LLM processes extracted text only. Human confirmation required before applying results. |
| 3 (INTERNAL) | Full enrichment. Results above the confidence threshold are applied automatically. |

If a file is classified as Tier 1, the enrichment module refuses to process it. There is no flag to override this. See Sensitivity Tiers for the full tier model.
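The tier gate can be sketched as follows. This is a minimal illustration, not fialr's actual implementation; the names `Tier`, `TierGateError`, and `enrich_gate` are hypothetical:

```python
from enum import IntEnum


class Tier(IntEnum):
    RESTRICTED = 1
    SENSITIVE = 2
    INTERNAL = 3


class TierGateError(Exception):
    """Raised when a Tier 1 file reaches the enrichment pipeline."""


def enrich_gate(tier: Tier) -> str:
    """Return the handling mode for a file, or refuse outright."""
    if tier == Tier.RESTRICTED:
        # Hard block: there is deliberately no override flag.
        raise TierGateError("Tier 1 files never enter enrichment")
    if tier == Tier.SENSITIVE:
        return "confirm"  # human confirmation before applying results
    return "auto"         # auto-apply above the confidence threshold
```

The key property is that the Tier 1 branch raises rather than returning a mode, so no caller can accidentally process a RESTRICTED file.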


Enrichment begins with text extraction. The extraction method depends on the file type:

| File type | Extraction tool | What is extracted |
| --- | --- | --- |
| Scanned PDF | `ocrmypdf` + Tesseract | OCR text from page images |
| Native PDF | `pypdfium2` | Embedded text content |
| Images | Tesseract | OCR text from image content |
| Photos | `piexif` | EXIF metadata (date, camera, GPS) |
| Audio | `mutagen` | ID3/metadata tags (title, artist, album) |
| Word documents | `python-docx` | Document text and metadata |
| Excel spreadsheets | `openpyxl` | Sheet names, header rows, metadata |

Extracted text is passed to the inference layer. It is not stored on disk separately — it exists only in memory during processing.
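Dispatch by file type might look like the sketch below. The mapping and the `pick_extractor` helper are illustrative assumptions; the real dispatch (including distinguishing scanned from native PDFs, or OCR images from EXIF photos) is more involved:

```python
from pathlib import Path

# Hypothetical suffix-to-extractor map mirroring the table above.
# Real dispatch also inspects content: a .pdf may be scanned (ocrmypdf
# + Tesseract) or native (pypdfium2); a .jpg may carry EXIF or OCR text.
EXTRACTORS = {
    ".pdf":  "pypdfium2",
    ".png":  "tesseract",
    ".jpg":  "piexif",
    ".mp3":  "mutagen",
    ".docx": "python-docx",
    ".xlsx": "openpyxl",
}


def pick_extractor(path: Path) -> str:
    """Choose an extraction tool from the file suffix."""
    try:
        return EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no extractor for {path.suffix!r}")
```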


All inference runs on your machine through Ollama. The inference layer is abstracted behind inference.py, which handles model communication, prompt construction, and response parsing.

The model receives the extracted text and returns structured JSON:

```json
{
  "date": "2024-03-15",
  "entity": "acme-corp",
  "descriptor": "quarterly-revenue-report",
  "tags": ["financial", "quarterly", "revenue"],
  "summary": "Q1 2024 revenue report for Acme Corp showing YoY growth.",
  "confidence": 0.87
}
```

The response provides filename tokens (date, entity, descriptor), semantic tags, a one-sentence summary, and a confidence score.
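Parsing and validating that reply could be sketched like this. The `Enrichment` dataclass and `parse_response` helper are assumptions for illustration, not the actual contents of `inference.py`:

```python
import json
from dataclasses import dataclass


@dataclass
class Enrichment:
    date: str
    entity: str
    descriptor: str
    tags: list
    summary: str
    confidence: float


REQUIRED = {"date", "entity", "descriptor", "tags", "summary", "confidence"}


def parse_response(raw: str) -> Enrichment:
    """Parse the model's JSON reply, rejecting malformed output."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"model response missing fields: {sorted(missing)}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return Enrichment(**{k: data[k] for k in REQUIRED})
```

Validating at this boundary matters because LLM output is untrusted: a missing field or out-of-range confidence should fail loudly rather than propagate into filenames.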

No cloud endpoints are supported. The inference interface is deliberately constrained to localhost only. This is not configurable.


The confidence score determines what happens to the enrichment output:

  • Above threshold — results are applied automatically (Tier 3) or queued for confirmation (Tier 2)
  • Below threshold — the file is written to the review_queue with the LLM suggestion attached as a hint

The reviewer sees the model’s proposed filename tokens, tags, and summary alongside the file’s current name and path. They can accept, modify, or reject the suggestion.
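Combining the tier rules with the threshold gives a small routing function. A minimal sketch, assuming a `route` helper and string outcomes that are not part of fialr's real API:

```python
def route(confidence: float, tier: int, threshold: float = 0.75) -> str:
    """Decide where an enrichment result goes."""
    if confidence < threshold:
        # Below threshold: file goes to review_queue, suggestion kept as a hint.
        return "review_queue"
    # Above threshold: Tier 2 still needs a human; Tier 3 applies automatically.
    return "confirm" if tier == 2 else "auto_apply"
```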


Enrichment settings live in fialr.toml:

```toml
[enrichment]
model = "llama3.2"
endpoint = "http://localhost:11434"
timeout = 30
confidence_threshold = 0.75
```

| Setting | Default | Description |
| --- | --- | --- |
| `model` | `llama3.2` | Ollama model name |
| `endpoint` | `http://localhost:11434` | Ollama API endpoint (localhost only) |
| `timeout` | `30` | Inference timeout in seconds per file |
| `confidence_threshold` | `0.75` | Minimum confidence for auto-apply |

Enrichment requires Ollama running locally with a model pulled:

```shell
# Install Ollama
brew install ollama

# Pull the model specified in fialr.toml
ollama pull llama3.2

# Start the Ollama server
ollama serve
```

fialr checks for Ollama availability before starting enrichment. If the server is not running or the configured model is not available, the command fails with a clear error. No partial processing occurs.
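A pre-flight check along these lines can verify both conditions before any file is touched. This sketch uses Ollama's `/api/tags` endpoint, which lists pulled models; the `check_ollama` helper itself is an illustrative assumption:

```python
import json
import urllib.error
import urllib.request


def check_ollama(endpoint: str = "http://localhost:11434",
                 model: str = "llama3.2") -> None:
    """Fail fast if the Ollama server or the configured model is unavailable."""
    try:
        with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Ollama server not reachable at {endpoint}") from exc
    # /api/tags returns names like "llama3.2:latest"; compare the base name.
    pulled = {m["name"].split(":")[0] for m in tags.get("models", [])}
    if model.split(":")[0] not in pulled:
        raise RuntimeError(f"model {model!r} not pulled; run `ollama pull {model}`")
```

Failing here, before extraction begins, is what guarantees that no partial processing occurs.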


```shell
fialr enrich ~/Documents
```

Enrichment processes all Tier 2 and Tier 3 files that have not yet been enriched:

```text
jobs/2026-03-11_enrich_a1b2c3d4/
  log.json
  report.md
  checkpoint.json
```

Terminal output:

```text
2,412 files eligible for enrichment.
Tier 2: 389 (requires confirmation)
Tier 3: 2,023
Skipped (Tier 1): 23
Processed: 2,412
Auto-applied: 1,847
Sent to review queue: 565
Extraction failures: 12
```

Enrichment metadata is written to XATTRs (com.fialr.enriched_at, com.fialr.tags) and to the SQLite files table. The review_queue table receives files below the confidence threshold with the LLM suggestion stored as a hint.
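The database side of that write can be sketched with stdlib `sqlite3`. The table columns and the `record_enrichment` helper are illustrative; fialr's real schema may differ:

```python
import json
import sqlite3
import time


def record_enrichment(db: sqlite3.Connection, path: str,
                      result: dict, threshold: float = 0.75) -> None:
    """Persist enrichment output; low-confidence files also hit review_queue."""
    db.execute(
        "UPDATE files SET tags = ?, summary = ?, enriched_at = ? WHERE path = ?",
        (json.dumps(result["tags"]), result["summary"], int(time.time()), path),
    )
    if result["confidence"] < threshold:
        # Store the full LLM suggestion as the reviewer's hint.
        db.execute(
            "INSERT INTO review_queue (path, hint) VALUES (?, ?)",
            (path, json.dumps(result)),
        )
    db.commit()
```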


After enrichment, the corpus has complete metadata: sensitivity tiers, schema categories, content hashes, and AI-generated semantic tags. Run validation to verify integrity, or export to generate sidecar metadata files.

For the full command reference, see fialr enrich.