Classification

Classification reads the manifest produced by inventory and assigns two things to every file: a sensitivity tier that controls what operations are permitted, and a schema category that proposes where the file belongs.

No file content is read during classification. All signals are structural.

Structural signals

Classification uses only external characteristics of a file:

Filename patterns — keywords like passport, tax-return, bank-statement trigger higher sensitivity
File extensions — .key, .pem, .pfx are treated as restricted by default
Directory names — paths containing medical/, legal/, financial/ influence tier assignment
MIME types — detected during inventory, used to disambiguate ambiguous extensions

The classifier never opens a file. It never reads file content, parses document text, or inspects binary data. This is a deliberate constraint: Tier 1 files must be classifiable without any content access.
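As a rough illustration of what "structural only" means, the sketch below gathers signals from a path string and a MIME type supplied by inventory. The function name, the returned field names, and the exact keyword lists are assumptions built from the examples above, not fialr's actual implementation.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical lists, copied from the examples in this section.
SENSITIVE_KEYWORDS = ("passport", "tax-return", "bank-statement")
RESTRICTED_EXTENSIONS = {".key", ".pem", ".pfx"}
SENSITIVE_DIRECTORIES = {"medical", "legal", "financial"}


def structural_signals(path: str, mime: Optional[str] = None) -> dict:
    """Collect signals from the path string alone; the file is never opened.

    `mime` stands in for the MIME type recorded during inventory.
    """
    p = PurePosixPath(path)
    name = p.name.lower()
    return {
        "keywords": [k for k in SENSITIVE_KEYWORDS if k in name],
        "restricted_extension": p.suffix.lower() in RESTRICTED_EXTENSIONS,
        "directories": [d for d in p.parts[:-1] if d.lower() in SENSITIVE_DIRECTORIES],
        "mime": mime,
    }
```

Nothing in this sketch touches the filesystem, so it behaves the same whether or not the file exists or is readable.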

Sensitivity tiers

Every file receives exactly one tier. The tier is a gate that controls all downstream operations.

Tier	Label	AI Access	Permitted Operations
1	RESTRICTED	Local AI (Ollama). Cloud requires two-step confirmation.	Review queue before any operation. Encrypted vault.
2	SENSITIVE	Local AI (default). Cloud opt-in via configured provider.	Move/rename with human confirmation
3	INTERNAL	Full enrichment via configured provider (local or cloud).	Automated above confidence threshold

Tier assignment is conservative. When signals conflict, the more restrictive tier (the lower tier number) wins. By default, schema.yaml assigns Tier 2 to files in a financial/ directory and reserves Tier 1 for medical/, personal/, and identity/ directories. Explicit filename patterns like passport or ssn escalate a file to Tier 1 regardless of directory.
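The "most restrictive tier wins" rule can be sketched as taking the minimum tier number across all matching signals. The rule tables, the function name, the Tier 3 fallback for files with no signals, and the token-based filename matching are all assumptions made for illustration; the real rules live in schema.yaml.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule tables built from the defaults described above.
DIRECTORY_TIERS = {"medical": 1, "personal": 1, "identity": 1, "financial": 2}
FILENAME_ESCALATIONS = {"passport", "ssn"}
DEFAULT_TIER = 3  # assumption: a file with no matching signal falls to Tier 3


def assign_tier(path: str) -> int:
    """Return the most restrictive (numerically lowest) tier among all signals."""
    p = PurePosixPath(path)
    candidates = [DEFAULT_TIER]
    # Every matching directory contributes a candidate tier.
    candidates += [DIRECTORY_TIERS[d.lower()] for d in p.parts[:-1]
                   if d.lower() in DIRECTORY_TIERS]
    # Match whole filename tokens so that "classnotes" does not match "ssn".
    tokens = set(re.split(r"[^a-z0-9]+", p.name.lower()))
    if tokens & FILENAME_ESCALATIONS:
        candidates.append(1)
    return min(candidates)
```

Under these assumptions, financial/statement.pdf resolves to Tier 2, while financial/passport-scan.jpg resolves to Tier 1 because the filename pattern outranks the directory default.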

The full tier model is documented in Sensitivity Tiers.

Category suggestion

Alongside the tier, the classifier proposes a schema category for each file. Categories are derived from schema.yaml and assigned using:

Path heuristics — directory structure maps to schema categories
Extension clusters — groups of related extensions (.doc, .docx, .odt) suggest document categories
Date patterns — filenames or paths containing date strings (e.g., 2024-03-tax) inform category placement
Entity name extraction — recognized entity names in paths or filenames (e.g., acme-corp-invoice.pdf) are extracted as category metadata

Category suggestions are proposals, not commitments. They feed into the planning phase where they are reviewed before any file moves.
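Two of the heuristics above, extension clusters and date patterns, lend themselves to a short sketch. The cluster table, the regular expression, and the returned field names are assumptions for illustration; path heuristics and entity name extraction are left out because they depend on the contents of schema.yaml.

```python
import re
from pathlib import PurePosixPath

# Hypothetical cluster table using the example extensions from this section.
EXTENSION_CLUSTERS = {"document": {".doc", ".docx", ".odt"}}

# Matches a YYYY-MM string that is not embedded in a longer run of digits.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})-(0[1-9]|1[0-2])(?!\d)")


def suggest_category(path: str) -> dict:
    """Derive category hints from the path string only."""
    suffix = PurePosixPath(path).suffix.lower()
    cluster = next(
        (name for name, exts in EXTENSION_CLUSTERS.items() if suffix in exts),
        None,
    )
    match = DATE_PATTERN.search(path)
    return {
        "extension_cluster": cluster,
        "year": match.group(1) if match else None,
        "month": match.group(2) if match else None,
    }
```

The result is a set of hints, in keeping with the point above that suggestions are proposals reviewed during planning.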

Confidence scoring

Every classification receives a confidence score between 0.0 and 1.0. The score reflects the strength and consistency of the structural signals.

High confidence (above threshold): tier and category are applied without review
Low confidence (below threshold): the file is written to the review queue with the proposed classification as a suggestion
Tier 1 candidates: always written to the review queue regardless of confidence

The confidence threshold is configurable in fialr.toml:

[enrichment]
confidence_threshold = 0.7
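The routing rules above can be read as a small decision function. This is a sketch under stated assumptions: the function name and return values are hypothetical, and a score exactly equal to the threshold is treated as passing, which this page does not specify.

```python
def route(tier: int, confidence: float, threshold: float = 0.7,
          conflicting: bool = False) -> str:
    """Decide whether a classification is applied or sent for review.

    `threshold` mirrors confidence_threshold from fialr.toml.
    """
    if tier == 1:
        return "review_queue"   # Tier 1 candidates are always reviewed
    if conflicting:
        return "review_queue"   # conflicting structural signals
    if confidence < threshold:
        return "review_queue"   # low confidence: kept as a suggestion only
    return "apply"              # assumption: equality counts as high confidence
```

Checking the tier before the score keeps a highly confident Tier 1 classification from bypassing review.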

Review queue

Files that require human review are written to the review_queue table in SQLite. This includes:

All Tier 1 candidates (mandatory review before any operation)
Files with confidence scores below the threshold
Files with conflicting structural signals

The review queue is a hard gate, not an advisory list. Tier 1 files cannot proceed to any automated operation without explicit human review, and the executor enforces this.
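To make the gate concrete, here is a minimal sketch of a review queue in SQLite with an executor-side check for Tier 1 files. Only the table name review_queue comes from this page; the column names, the reviewed flag, and both helper functions are assumptions and may not match fialr's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only the table name is taken from the documentation.
conn.execute(
    """
    CREATE TABLE review_queue (
        path              TEXT PRIMARY KEY,
        proposed_tier     INTEGER NOT NULL,
        proposed_category TEXT,
        confidence        REAL NOT NULL,
        reason            TEXT NOT NULL,
        reviewed          INTEGER NOT NULL DEFAULT 0
    )
    """
)


def enqueue(conn, path, tier, category, confidence, reason):
    """Record a file that needs human review."""
    conn.execute(
        "INSERT INTO review_queue (path, proposed_tier, proposed_category,"
        " confidence, reason) VALUES (?, ?, ?, ?, ?)",
        (path, tier, category, confidence, reason),
    )


def tier1_cleared(conn, path) -> bool:
    """Gate for Tier 1 files: allow only paths with a reviewed queue entry."""
    row = conn.execute(
        "SELECT reviewed FROM review_queue WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == 1
```

In this sketch a Tier 1 path with no queue entry is also refused, so a missing row fails closed.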

Running classification

Classification runs automatically as part of fialr scan and fialr process. The standalone fialr classify command is still available for backward compatibility:

fialr scan ~/Documents        # includes classification
fialr process ~/Documents     # includes scan + classification
fialr classify ~/Documents    # standalone (backward compat)

Classification reads the most recent manifest for the target directory and writes its output to classifier_output.csv in the job directory:

jobs/2026-03-11_classify_a1b2c3d4/
  classifier_output.csv
  log.json
  report.md

The CSV contains one row per file with columns for path, content hash, assigned tier, proposed category, confidence score, and the signals that contributed to the classification.

Terminal output summarizes the distribution:

2,847 files classified.
  Tier 1 (RESTRICTED):  23
  Tier 2 (SENSITIVE):  412
  Tier 3 (INTERNAL): 2,412
  Review queue:  89
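A per-tier distribution like the one above could be recomputed from classifier_output.csv with a few lines of Python. The header names below (path, content_hash, tier, category, confidence, signals) are guesses based on the column description, and the three sample rows are invented purely to make the sketch runnable.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the header names are assumptions, not the real format.
SAMPLE = """path,content_hash,tier,category,confidence,signals
identity/passport.pdf,aaa111,1,identity,0.95,filename:passport
financial/2024-03-tax.pdf,bbb222,2,financial,0.82,directory:financial
notes/todo.txt,ccc333,3,notes,0.40,extension:.txt
"""


def tier_distribution(csv_file) -> Counter:
    """Count rows per assigned tier from an open CSV file object."""
    return Counter(int(row["tier"]) for row in csv.DictReader(csv_file))


dist = tier_distribution(io.StringIO(SAMPLE))
for tier in sorted(dist):
    print(f"Tier {tier}: {dist[tier]}")
```

To run it against a real job, replace the io.StringIO sample with open(path, newline="") on the job's CSV and adjust the column name if it differs.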

What comes next

Classification output feeds directly into planning, where tier assignments and category suggestions are used to generate a dry-run reorganization plan.

For the full command reference, see fialr classify.