Classification

Classification reads the manifest produced by inventory and assigns two things to every file: a sensitivity tier that controls what operations are permitted, and a schema category that proposes where the file belongs.

No file content is read during classification. All signals are structural.

Classification uses only external characteristics of a file:

  • Filename patterns — keywords like passport, tax-return, bank-statement trigger higher sensitivity
  • File extensions — .key, .pem, .pfx are treated as restricted by default
  • Directory names — paths containing medical/, legal/, financial/ influence tier assignment
  • MIME types — detected during inventory, used to disambiguate ambiguous extensions

The classifier never opens a file. It never reads file content, parses document text, or inspects binary data. This is a deliberate constraint: Tier 1 files must be classifiable without any content access.
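As a rough illustration of content-free classification, the sketch below derives signals from the path string alone. The rule tables are hypothetical and built only from the examples listed above; the function name `structural_signals` and the signal labels are assumptions, not fialr's actual API. MIME type handling is omitted because it depends on the inventory manifest.

```python
from pathlib import PurePosixPath

# Hypothetical rule tables, built from the examples in the text above.
SENSITIVE_KEYWORDS = ("passport", "tax-return", "bank-statement")
RESTRICTED_EXTENSIONS = {".key", ".pem", ".pfx"}
SENSITIVE_DIRECTORIES = {"medical", "legal", "financial"}


def structural_signals(path: str) -> list[str]:
    """Collect signals from the path string alone; the file is never opened."""
    p = PurePosixPath(path.lower())
    signals = []
    if p.suffix in RESTRICTED_EXTENSIONS:
        signals.append(f"extension:{p.suffix}")
    for keyword in SENSITIVE_KEYWORDS:
        if keyword in p.name:
            signals.append(f"filename:{keyword}")
    for part in p.parent.parts:
        if part in SENSITIVE_DIRECTORIES:
            signals.append(f"directory:{part}")
    return signals
```

Because the function only manipulates strings, it works on manifest entries even when the underlying files are unreadable or encrypted.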


Every file receives exactly one tier. The tier is a gate that controls all downstream operations.

| Tier | Label | AI Access | Permitted Operations |
|------|-------|-----------|----------------------|
| 1 | RESTRICTED | Never | Manual only. Encrypted vault. Review queue before any operation. |
| 2 | SENSITIVE | Local LLM on extracted text | Move/rename with human confirmation |
| 3 | INTERNAL | Full local enrichment | Automated above confidence threshold |

Tier assignment is conservative. When signals conflict, the more restrictive tier (the lower tier number) wins. A file in a financial/ directory with a .pdf extension receives Tier 1 even if nothing in the filename suggests sensitivity.
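The conflict rule can be sketched as taking the minimum over candidate tiers, since Tier 1 is the most restrictive. The `Tier` enum and `resolve_tier` are hypothetical names, and falling back to INTERNAL when no signal fires is an assumption made for this example.

```python
from enum import IntEnum


class Tier(IntEnum):
    RESTRICTED = 1
    SENSITIVE = 2
    INTERNAL = 3


def resolve_tier(candidates: list[Tier]) -> Tier:
    """Return the most restrictive candidate tier.

    Assumption: with no candidates at all, fall back to INTERNAL.
    """
    return min(candidates, default=Tier.INTERNAL)
```

For example, `resolve_tier([Tier.INTERNAL, Tier.RESTRICTED])` yields `Tier.RESTRICTED`: one restrictive signal is enough to override any number of permissive ones.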

The full tier model is documented in Sensitivity Tiers.


Alongside the tier, the classifier proposes a schema category for each file. Categories are derived from schema.yaml and assigned using:

  • Path heuristics — directory structure maps to schema categories
  • Extension clusters — groups of related extensions (.doc, .docx, .odt) suggest document categories
  • Date patterns — filenames or paths containing date strings (e.g., 2024-03-tax) inform category placement
  • Entity name extraction — recognized entity names in paths or filenames (e.g., acme-corp-invoice.pdf) are extracted as category metadata

Category suggestions are proposals, not commitments. They feed into the planning phase where they are reviewed before any file moves.
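A minimal sketch of two of these heuristics, extension clusters and date patterns, is shown here. The cluster table, the `propose_category` function, and the returned field names are all hypothetical; in practice the categories would come from schema.yaml, and entity name extraction is left out.

```python
import re
from pathlib import PurePosixPath

# Hypothetical cluster table; real categories would come from schema.yaml.
EXTENSION_CLUSTERS = {
    "documents": {".doc", ".docx", ".odt"},
}

# Matches a YYYY-MM string that is not embedded in a longer run of digits.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})-(\d{2})(?!\d)")


def propose_category(path: str) -> dict:
    """Propose a category and date metadata from the path string only."""
    suffix = PurePosixPath(path.lower()).suffix
    category = next(
        (name for name, exts in EXTENSION_CLUSTERS.items() if suffix in exts),
        None,
    )
    match = DATE_PATTERN.search(path)
    return {
        "category": category,
        "year": match.group(1) if match else None,
        "month": match.group(2) if match else None,
    }
```

The result is a proposal only: a `None` category simply means no heuristic fired, and the planning phase still reviews the suggestion before anything moves.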


Every classification receives a confidence score between 0.0 and 1.0. The score reflects the strength and consistency of the structural signals.

  • High confidence (above threshold): tier and category are applied without review
  • Low confidence (below threshold): the file is written to the review queue with the proposed classification as a suggestion
  • Tier 1 candidates: always written to the review queue regardless of confidence

The confidence threshold is configurable in fialr.toml:

[classifier]
confidence_threshold = 0.7
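
The routing rules above might be expressed as follows. This is a sketch, not fialr's implementation: `route` and its return values are hypothetical, and sending a score exactly equal to the threshold to review is an assumption, since the text only specifies the "above" and "below" cases.

```python
def route(tier: int, confidence: float, threshold: float = 0.7) -> str:
    """Decide whether a classification is applied or sent to review."""
    if tier == 1:
        # Tier 1 candidates always go to review, whatever the score.
        return "review"
    if confidence > threshold:
        return "apply"
    # Assumption: a score equal to the threshold is treated as low confidence.
    return "review"
```

Checking the tier before the score keeps the Tier 1 rule independent of the configured threshold, so lowering `confidence_threshold` cannot let a restricted file through.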

Files that require human review are written to the review_queue table in SQLite. This includes:

  • All Tier 1 candidates (mandatory review before any operation)
  • Files with confidence scores below the threshold
  • Files with conflicting structural signals

Review is mandatory, not advisory. Tier 1 files cannot proceed to any automated operation without explicit human review. The executor enforces this.
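To make the gating idea concrete, here is a small sketch using Python's built-in sqlite3 module. Only the table name `review_queue` comes from the text; the column names, the `reviewed` flag, and the helper functions are assumptions for illustration and may not match the real schema.

```python
import sqlite3


def open_queue() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical review_queue schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE review_queue (
            path          TEXT PRIMARY KEY,
            proposed_tier INTEGER NOT NULL,
            confidence    REAL NOT NULL,
            reason        TEXT NOT NULL,
            reviewed      INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def enqueue(conn: sqlite3.Connection, path: str, tier: int,
            confidence: float, reason: str) -> None:
    """Record a file that needs human review before any operation."""
    conn.execute(
        "INSERT INTO review_queue (path, proposed_tier, confidence, reason) "
        "VALUES (?, ?, ?, ?)",
        (path, tier, confidence, reason),
    )


def is_blocked(conn: sqlite3.Connection, path: str) -> bool:
    """Return True while the file has an unreviewed queue entry."""
    row = conn.execute(
        "SELECT 1 FROM review_queue WHERE path = ? AND reviewed = 0",
        (path,),
    ).fetchone()
    return row is not None
```

An executor built on this sketch would call `is_blocked` before every operation and refuse to act while it returns `True`.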


fialr classify ~/Documents

Classification reads the most recent manifest for the target directory and writes its output to classifier_output.csv in the job directory:

jobs/2026-03-11_classify_a1b2c3d4/
  classifier_output.csv
  log.json
  report.md

The CSV contains one row per file with columns for path, content hash, assigned tier, proposed category, confidence score, and the signals that contributed to the classification.
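
The CSV can be inspected with standard tooling. The sketch below tallies files per tier with Python's csv module. The header name `tier` is an assumption, since the exact column names are not given here, so it is passed as a parameter; the sample rows are made up for the demonstration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def tier_distribution(handle: TextIO, tier_column: str = "tier") -> Counter:
    """Count rows per tier value in a classifier output CSV.

    Assumption: the tier column is named "tier"; adjust tier_column if not.
    """
    reader = csv.DictReader(handle)
    return Counter(row[tier_column] for row in reader)


# Usage with made-up sample data in place of a real classifier_output.csv.
sample = io.StringIO("path,tier\na.pdf,1\nb.txt,3\nc.txt,3\n")
print(tier_distribution(sample))
```

For a real job, the `io.StringIO` object would be replaced by `open(..., newline="")` on the classifier_output.csv file in the job directory.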

Terminal output summarizes the distribution:

2,847 files classified.
Tier 1 (RESTRICTED): 23
Tier 2 (SENSITIVE): 412
Tier 3 (INTERNAL): 2,412
Review queue: 89

Classification output feeds directly into planning, where tier assignments and category suggestions are used to generate a dry-run reorganization plan.

For the full command reference, see fialr classify.