
 DIGITISATION STANDARD

APPENDIX 6: FILE-NAMING METADATA RECOMMENDATIONS [6]

This appendix is for guidance. It outlines a number of best practices in relation to determining a file naming protocol, particularly for digital images. It relates to Requirement 2.2: All digitised images MUST be assigned metadata to document digitising processes and to support ongoing business processes.

A file naming scheme SHOULD be established prior to capture. The scheme should take into account whether the identifier must support machine indexing, human indexing, or both (in which case the image may have multiple identifiers). File names can be either meaningful (for example, adopting an existing identification scheme that correlates the digital file with the source material) or non-descriptive (such as a sequential numerical string). Meaningful file names contain metadata that is self-referencing; non-descriptive file names are associated with metadata stored elsewhere that serves to identify the file. In general, smaller-scale projects may design descriptive file names that facilitate browsing and retrieval; large-scale projects may use machine-generated names and rely on a database for sophisticated searching and retrieval of associated metadata.
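As an illustration of the non-descriptive approach, the following minimal Python sketch generates sequential file names and keeps the link to the source material in a separate index. The prefix, the six-digit width, and the source identifiers are hypothetical choices made for the example; they are not prescribed by this standard.

```python
import itertools

def sequential_names(prefix, extension, width=6, start=1):
    """Yield non-descriptive names: <prefix><zero-padded counter>.<extension>."""
    for n in itertools.count(start):
        yield f"{prefix}{n:0{width}d}.{extension}"

# Hypothetical source identifiers; in practice these would come from
# an existing identification scheme for the originals.
sources = ["series-12/item-3/page-1", "series-12/item-3/page-2"]

# The file name carries no meaning, so the association with the source
# material is recorded elsewhere (here a dict stands in for a database).
index = dict(zip(sequential_names("img", "tif"), sources))

for file_name, source_id in index.items():
    print(file_name, "->", source_id)
```

Zero-padding keeps the names a fixed length so that they sort in capture order in an ordinary directory listing.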

In general, file names SHOULD:


Directory structure

Regardless of file name, files will likely be organised in some kind of file directory system that will link to metadata stored elsewhere in a database. Production master files might be stored separately from derivative files, or directories MAY have their own organisation independent of the image files, such as folders arranged by date or classification structure, or they MAY replicate the physical or logical organisation of the originals being scanned.

The files themselves can also be organised solely by directory structure and folders rather than embedding meaning in the file name. This approach generally works well for multi-page items. Images are uniquely identified and aggregated at the level of the logical object (e.g. a document, a record, a file/folder), which requires that the folders or directories be named descriptively. The file names of the individual images are unique within each directory, but not across directories. For example, book 0001 contains image files 001.tif, 002.tif, and 003.tif, and book 0002 also contains image files 001.tif, 002.tif, and 003.tif. The danger with this approach is that if individual images are separated from their parent directory, they will be indistinguishable from images in a different directory.
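One possible way to reduce the risk described above, when files have to leave their parent directory (for export or delivery, say), is to fold the directory name into the file name. The sketch below assumes directories named 0001 and 0002 as in the example; the underscore separator and the function name are hypothetical.

```python
from pathlib import PurePosixPath

def qualified_name(path):
    """Combine the parent directory name and the file name into one identifier."""
    p = PurePosixPath(path)
    return f"{p.parent.name}_{p.name}"

# Both books contain a file called 001.tif; the qualified names no longer collide.
print(qualified_name("0001/001.tif"))
print(qualified_name("0002/001.tif"))
```

The qualified name remains distinguishable even after the image is separated from its directory, at the cost of a longer file name.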


Versions

For various reasons, a single scanned object may have multiple, differing versions associated with it (for example, the same image prepared for different output intents; versions with additional edits, layers, or alpha channels that are worth saving; versions scanned on different scanners, scanned from different original media, or scanned at different times by different scanner operators). Ideally, the description and intent of different versions should be reflected in the metadata; but, if the naming convention is consistent, distinguishing versions in the file name will allow for quick identification of a particular image. As with derivative files, this often implies the application of a qualifier to part of the file name. The reason to use qualifiers rather than entirely new names is to keep all versions associated with a logical object under the same identifier. An approach to naming versions should be well thought out: adding 001, 002 to the base file name to indicate different versions is an option; however, if 001 and 002 already denote page numbers, a different approach will be required.
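The clash between page numbers and version numbers can be avoided by giving the version its own clearly marked qualifier. The following sketch assumes a hypothetical pattern of object identifier, three-digit page number, and an optional "v" plus two-digit version; none of these conventions is mandated here.

```python
import re

# Hypothetical pattern: <object>_<page, 3 digits>[_v<version, 2 digits>].<extension>
NAME = re.compile(
    r"^(?P<obj>[A-Za-z0-9]+)_(?P<page>\d{3})(?:_v(?P<version>\d{2}))?\.(?P<ext>\w+)$"
)

def build_name(obj, page, ext, version=None):
    """Build a file name; the version qualifier is added only when given."""
    base = f"{obj}_{page:03d}"
    if version is not None:
        base += f"_v{version:02d}"
    return f"{base}.{ext}"

def parse_name(name):
    """Split a file name back into object, page, version, and extension."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"file name does not follow the scheme: {name!r}")
    version = int(m["version"]) if m["version"] else None
    return {"obj": m["obj"], "page": int(m["page"]), "version": version, "ext": m["ext"]}

print(build_name("0001", 2, "tif"))
print(build_name("0001", 2, "tif", version=1))
print(parse_name("0001_002_v01.tif"))
```

Because every version shares the same object and page prefix, all versions of a page sort together and stay under one identifier.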


Naming derivative files

The file naming scheme should also take into account the creation of derivative image files made from the production master files. In general, derivative file names are inherited from the production masters, usually with a qualifier added to distinguish the role of the derivative from other files (e.g. "p" for published version, "t" for thumbnail). Derived files usually imply a change in image dimensions, image resolution, and/or file format from the production master. Derivative file names do not have to be descriptive as long as they can be linked back to the production master file.

For derivative files intended primarily for web display, one consideration for naming is that images may need to be cited by users in order to retrieve other higher-quality versions. If so, the derivative file name should contain enough descriptive or numerical meaning to allow for easy retrieval of the original or other digital versions.
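The inheritance of derivative names from production masters, and the route back from a cited derivative to its master, might be sketched as follows. The "p" and "t" qualifiers come from the examples above; the underscore separator, the function names, and the sample file names are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Role qualifiers from the examples in the text.
ROLES = {"p": "published version", "t": "thumbnail"}

def derivative_name(master, role, ext):
    """Inherit the master's base name and append a role qualifier."""
    if role not in ROLES:
        raise ValueError(f"unknown role qualifier: {role!r}")
    stem = PurePosixPath(master).stem
    return f"{stem}_{role}.{ext}"

def master_stem(derivative):
    """Recover the production master's base name from a derivative name."""
    stem = PurePosixPath(derivative).stem
    base, sep, role = stem.rpartition("_")
    if not sep or role not in ROLES:
        raise ValueError(f"not a recognised derivative name: {derivative!r}")
    return base

thumb = derivative_name("0001_002.tif", "t", "jpg")
print(thumb)
print(master_stem(thumb))
```

Note that only the master's base name is recoverable this way; its file format would still have to be known from the scheme or looked up in the associated metadata.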



6. This section is sourced from: US National Archives and Records Administration, 'Technical Guidelines for Digitizing Archival Materials for Electronic Access: Creation of Production Master Files - Raster Images', June 2004.