Content Hash as Identity
Filenames change. Paths change. A file gets renamed, moved to a different directory, copied to a backup drive. None of these mutations change what the file is. The content is the identity.
fialr computes a cryptographic hash of every file’s content and uses that hash as the file’s canonical identifier. Everything else — the filename, the path, the directory hierarchy — is mutable metadata attached to that identifier.
Hash algorithms
| Algorithm | Role | Rationale |
|---|---|---|
| BLAKE3 | Primary identifier | Fast, cryptographically secure, streaming-capable. Used as the canonical content hash in SQLite, XATTRs, and all internal references. |
| SHA256 | Secondary / archival | Widely supported by external tools. Stored alongside BLAKE3 for cross-tool compatibility and long-term archival verification. |
| xxHash | Excluded | Not cryptographically secure. Collisions can be constructed deliberately and become likely at scale. Unsuitable for identity or integrity verification. |
Both BLAKE3 and SHA256 are computed during inventory. The BLAKE3 hash is the primary key in the files table. SHA256 is stored as an additional column for interoperability.
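Computing both digests in one streaming pass could look like the sketch below. It is an illustration, not fialr's implementation: the function name `hash_file` and the chunk size are assumptions, and the third-party `blake3` package is assumed to be installed. If it is missing, the sketch falls back to BLAKE2b from the standard library purely so it stays runnable. That fallback is not BLAKE3 and is labeled as a stand-in in the result.

```python
import hashlib

try:
    # Third-party package (assumed installed); not part of the standard library.
    from blake3 import blake3 as primary_hasher
    PRIMARY_NAME = "blake3"
except ImportError:
    # Stand-in so the sketch runs without the dependency. This is NOT BLAKE3.
    def primary_hasher():
        return hashlib.blake2b(digest_size=32)
    PRIMARY_NAME = "blake2b-standin"


def hash_file(path, chunk_size=1024 * 1024):
    """Read the file once and feed every chunk to both hashers."""
    primary = primary_hasher()
    secondary = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            primary.update(chunk)
            secondary.update(chunk)
    return {PRIMARY_NAME: primary.hexdigest(), "sha256": secondary.hexdigest()}
```

Reading in chunks keeps memory use flat regardless of file size, and feeding both hashers from the same chunk means the file is read from disk only once.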
Why content, not filenames
A filename is a label. It can be changed by the user, by an application, by a sync conflict, or by fialr itself during renaming. A path is a location. It changes when the file moves.
The content hash is invariant under all of these operations:
| Operation | Filename | Path | Content hash |
|---|---|---|---|
| Rename | Changes | Same | Same |
| Move to different directory | Same | Changes | Same |
| Copy to new location | Same or different | Changes | Same |
| Edit file content | Same | Same | Changes |
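The table can be checked with a few lines of code. The sketch below is illustrative only: `content_hash` and `demo` are hypothetical names, the filenames are invented, and SHA-256 from the standard library stands in for the primary hash so the example has no dependencies.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def content_hash(path):
    # SHA-256 is a stand-in for the primary hash in this sketch.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "IMG_0001.jpg"
        original.write_bytes(b"example content")
        baseline = content_hash(original)

        # Rename, then move into a subdirectory: the bytes are untouched.
        renamed = original.rename(root / "2024-01-01_photo.jpg")
        (root / "archive").mkdir()
        moved = renamed.rename(root / "archive" / renamed.name)

        # Copy to a new location: same bytes, second path.
        copied = Path(shutil.copy2(moved, root / "backup.jpg"))

        results = {
            "rename_or_move": content_hash(moved) == baseline,
            "copy": content_hash(copied) == baseline,
        }

        # Edit the copy: this is the only operation that changes the hash.
        copied.write_bytes(b"edited content")
        results["edit"] = content_hash(copied) == baseline
        return results
```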
This model has direct consequences:
Renaming does not change identity. When fialr applies its naming convention to a file, the content hash stays the same. The old filename is recorded in XATTRs and SQLite as provenance metadata. The file’s identity is unaffected.
Moving does not change identity. Reorganizing files into a new directory structure updates path metadata but does not alter the content hash. All references by hash remain valid.
Deduplication groups by hash. Two files with different names, in different directories, with different creation dates, are the same file if they have the same content hash. fialr groups them, selects a canonical copy, and moves non-canonical copies to a staging directory with full provenance metadata.
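Grouping by hash can be sketched as follows. This is a minimal illustration under assumptions: `plan_dedup` is a hypothetical name, SHA-256 stands in for the primary hash, and the rule for picking the canonical copy (shortest path, then alphabetical) is a placeholder, since the text above does not say how fialr chooses. The sketch only builds a plan and moves nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_hash(path):
    # SHA-256 is a stand-in for the primary hash in this sketch.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_dedup(paths):
    """Group paths by content hash and propose one canonical copy per group."""
    groups = defaultdict(list)
    for p in paths:
        groups[content_hash(p)].append(Path(p))

    plan = {}
    for digest, members in groups.items():
        if len(members) < 2:
            continue  # unique content, nothing to deduplicate
        # Placeholder selection rule (an assumption, not fialr's policy).
        members.sort(key=lambda p: (len(str(p)), str(p)))
        plan[digest] = {"canonical": members[0], "stage": members[1:]}
    return plan
```

Names, directories, and timestamps never enter the grouping key, so two files match only when their bytes match.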
Where hashes are stored
Hashes are stored in two locations with different roles:
| Location | Role | Platform |
|---|---|---|
| SQLite database | Source of truth | All platforms |
| Extended attributes (XATTRs) | Cache layer | macOS (com.fialr.hash), Linux (user.fialr.hash), Windows (NTFS ADS) |
SQLite is authoritative. The files table uses the BLAKE3 hash as its primary key. All queries, dedup operations, and integrity checks reference SQLite. If there is a conflict between SQLite and XATTRs, SQLite wins.
XATTRs are a derived cache. Extended attributes are written alongside SQLite for fast, filesystem-level access. They allow other tools to read a file’s hash without querying the database. XATTRs are rebuilt from SQLite, never the reverse.
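The authoritative side of this split might be modeled as in the sketch below. Only the table name `files`, the BLAKE3 primary key, and the SHA256 column come from the text above. The remaining column names and all function names are hypothetical, chosen for illustration.

```python
import sqlite3

# Hypothetical schema: the hash is the key, path and name are mutable metadata.
SCHEMA = """
CREATE TABLE files (
    blake3        TEXT PRIMARY KEY,
    sha256        TEXT NOT NULL,
    path          TEXT NOT NULL,
    original_name TEXT
)
"""


def open_db(location=":memory:"):
    conn = sqlite3.connect(location)
    conn.execute(SCHEMA)
    return conn


def record_file(conn, blake3_hex, sha256_hex, path, original_name=None):
    conn.execute(
        "INSERT INTO files (blake3, sha256, path, original_name) VALUES (?, ?, ?, ?)",
        (blake3_hex, sha256_hex, path, original_name),
    )


def record_move(conn, blake3_hex, new_path):
    # A move rewrites metadata only; the key is untouched.
    conn.execute("UPDATE files SET path = ? WHERE blake3 = ?", (new_path, blake3_hex))


def lookup(conn, blake3_hex):
    return conn.execute(
        "SELECT path, sha256 FROM files WHERE blake3 = ?", (blake3_hex,)
    ).fetchone()
```

Because the key is the hash, a reference held before a move resolves to the new path afterwards without being updated.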
XATTR degradation policy
Not all filesystems support extended attributes. FAT32, exFAT, and some network mounts do not.
When XATTRs are unsupported, fialr writes to SQLite only. The skip is logged. No error is raised. No functionality is lost. The database remains the complete record.
This is a design choice, not a workaround. The system must function identically whether XATTRs are available or not. SQLite is the contract. XATTRs are a convenience.
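A degradation path of this kind could be sketched as below. The function name, logger name, and injectable `setxattr` parameter are assumptions made for illustration. `os.setxattr` exists only on Linux in the Python standard library, so the sketch also treats its absence as "unsupported". The attribute name is the Linux one from the table above.

```python
import errno
import logging
import os

log = logging.getLogger("xattr_cache")  # hypothetical logger name
XATTR_NAME = "user.fialr.hash"

# Error codes that mean "this filesystem has no xattr support".
UNSUPPORTED = {
    getattr(errno, name) for name in ("ENOTSUP", "EOPNOTSUPP") if hasattr(errno, name)
}


def cache_hash_in_xattr(path, hash_hex, setxattr=None):
    """Try to write the hash as an xattr. Return False if the write was skipped."""
    if setxattr is None:
        setxattr = getattr(os, "setxattr", None)  # missing outside Linux
    if setxattr is None:
        log.info("xattr skipped for %s: not available on this platform", path)
        return False
    try:
        setxattr(path, XATTR_NAME, hash_hex.encode("ascii"))
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            log.info("xattr skipped for %s: filesystem does not support it", path)
            return False
        raise  # any other failure is a real error, not a missing feature
    return True
```

The caller writes to SQLite first and treats the return value as informational, so a skipped XATTR write changes nothing about what the database records.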
Integrity verification
fialr provides three verification modes through the validate command:
| Mode | Scope | Use case |
|---|---|---|
| spot | Random sample of files | Quick confidence check. Suitable for routine verification. |
| manifest | All files listed in a job manifest | Post-operation verification. Confirms that a specific job did not corrupt any files. |
| full | Every file in the database | Complete corpus integrity audit. Recomputes all hashes and compares against stored values. |
In all modes, verification recomputes the BLAKE3 hash from the file’s current content and compares it against the stored hash in SQLite. A mismatch means the file content has changed since it was last indexed — either legitimately (the file was edited) or due to corruption.
Mismatches are reported with the file path, expected hash, actual hash, and the job that last operated on the file. The decision to act on a mismatch is left to the operator.
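The recompute-and-compare loop shared by all three modes can be sketched as follows. This is not fialr's validate command: the function names are hypothetical, a plain dictionary of path to stored hash stands in for the SQLite query, SHA-256 stands in for BLAKE3, and the "last job" field of the report is omitted because the sketch has no job records.

```python
import hashlib
import random
from pathlib import Path


def content_hash(path):
    # SHA-256 is a stand-in for the primary hash in this sketch.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def select_targets(mode, stored, manifest_paths=(), sample_size=100, rng=random):
    """Pick which indexed paths to check. The modes differ only in scope."""
    if mode == "full":
        return sorted(stored)
    if mode == "manifest":
        return list(manifest_paths)  # assumed to be paths already indexed
    if mode == "spot":
        return rng.sample(sorted(stored), min(sample_size, len(stored)))
    raise ValueError(f"unknown mode: {mode}")


def verify(mode, stored, **scope):
    """Recompute each target's hash and collect mismatches for the operator."""
    mismatches = []
    for path in select_targets(mode, stored, **scope):
        actual = content_hash(path)
        if actual != stored[path]:
            mismatches.append(
                {"path": path, "expected": stored[path], "actual": actual}
            )
    return mismatches
```

The function only reports. It makes no attempt to decide whether a mismatch is an edit or corruption, which matches the policy of leaving that decision to the operator.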