Naming Convention

Filenames should be meaningful to both humans and machines. A file named scan001.pdf tells you nothing. A file named 2024-03-15_irs_w2-wage-statement_final.pdf tells you the date, the entity, the content, and the version — without opening it.

fialr applies a template-driven naming convention that produces filenames with consistent, semantic tokens. The convention is deterministic: given the same metadata, fialr produces the same filename.

Pattern

YYYY-MM-DD_[entity]_[descriptor]_[version].[ext]

Each token is separated by underscores. Within a token, words are separated by hyphens. This distinction is structural — underscores delimit tokens, hyphens delimit words within tokens.

Token reference

Token	Required	Format	Rules	Example
`date`	Yes	`YYYY-MM-DD`	ISO 8601. Document subject date, not file creation date. Falls back to `mtime` only when no better source exists.	`2024-03-15`
`entity`	Yes	Lowercase, hyphen-separated	Primary subject or organization. One entity per filename. No abbreviations unless the abbreviation is the canonical name (e.g., `irs`, `dmv`).	`acme-corp`, `irs`, `dr-chen`
`descriptor`	Yes	Lowercase, hyphen-separated	Semantic description of the content. Specific enough to distinguish from other files by the same entity on the same date.	`quarterly-report`, `w2-wage-statement`, `invoice-1847`
`version`	No	Lowercase, no hyphens	Present only when multiple versions of the same document coexist. Common values: `v1`, `v2`, `draft`, `final`, `signed`. Omit when there is only one version.	`v2`, `final`, `signed`
`ext`	Yes	Lowercase	MIME-derived canonical extension. Normalized to standard form (`.jpeg` becomes `.jpg`, `.tiff` becomes `.tif`).	`.pdf`, `.jpg`, `.docx`

Rules

Lowercase everywhere. No uppercase characters in any token. Acme-Corp becomes acme-corp. INVOICE.PDF becomes invoice.pdf.

Hyphens within, underscores between. Hyphens separate words inside a token (wage-statement). Underscores separate tokens from each other (irs_w2-wage-statement). Spaces are never used.

No special characters. Parentheses, brackets, ampersands, hash symbols, and other special characters are stripped or transliterated. Report (Final).pdf becomes report-final.pdf in the descriptor token.

No hashes in filenames. Content hashes belong in SQLite and XATTRs. Filenames are for human-readable semantic content. A filename like a1b2c3d4_report.pdf defeats the purpose.

Reject generic names. fialr flags files with generic names and routes them to the review queue rather than preserving meaningless tokens. The following patterns are rejected: scan001, untitled, document, new-file, copy-of, img_, dsc_, file.

Examples

2024-03-15_irs_w2-wage-statement_final.pdf
2024-01-08_acme-corp_consulting-agreement_signed.pdf
2023-11-20_dr-chen_annual-physical-results.pdf
2024-06-01_apartment-lease_renewal-terms_v2.pdf
2024-09-12_nikon-d850_yellowstone-morning-fog.jpg
2023-04-30_fidelity_401k-quarterly-statement.pdf

Each filename is parseable by splitting on underscores. The first token is always a date. The last segment before the extension is either a descriptor or a version. This predictability is what makes the convention machine-queryable.

Token sources

Each token is derived from metadata, with sources tried in priority order:

Token	Source priority
`date`	1. Extracted from document content (PDF metadata, EXIF date taken, ID3 date). 2. Inferred by local LLM from extracted text. 3. File modification time (`mtime`) as last resort.
`entity`	1. Extracted from document content (letterhead, sender, organization name). 2. Inferred by local LLM. 3. Parent directory name as heuristic fallback.
`descriptor`	1. Inferred by local LLM from extracted text (one-phrase semantic summary). 2. Derived from original filename after cleanup.
`version`	1. Detected from original filename patterns (`v2`, `final`, `_signed`). 2. Inferred from duplicate group membership (when multiple versions of the same document exist).
`ext`	1. MIME type detection (`python-magic`). 2. Original file extension, normalized to lowercase canonical form.

For Tier 1 files, only structural sources are used — no LLM inference. For Tier 2 files, LLM-derived tokens require human confirmation. For Tier 3 files, LLM-derived tokens are applied automatically above the confidence threshold. See sensitivity tiers for details.

Configuration

The naming pattern and rejected patterns are configured in fialr.toml:

[naming]
pattern = "{date}_{entity}_{descriptor}_{version}.{ext}"

reject_patterns = [
  "^scan\\d+$",
  "^untitled",
  "^document$",
  "^new-file",
  "^copy-of",
  "^img_",
  "^dsc_",
  "^file$",
]

Files matching reject_patterns in their original filename are flagged for enrichment rather than carrying the generic name forward. The descriptor token for these files must come from content extraction or LLM inference.

See the organize command reference for execution details, and the enrichment guide for how token values are derived from file content.