
1. Introduction

We have developed tools and methods for processing digital transfers. Public sector organisations can also use these to assess their collections before transferring to our archives. A key component is the preparation of an initial test extract or copy of digital information and records that are eligible (for example, that have acceptable transfer characteristics), along with their accompanying metadata. We need a test extract to determine the feasibility of a full transfer.

The following activities, which help an organisation assess their files in readiness for transfer, can also be useful for an organisation to understand how well they are managing their files for digital continuity. The activities described underpin a successful transfer as they identify issues which are checked during the Transfer Preparation stage and so enable an organisation to discover potential issues in advance.

2. Understand the files

Analyse your files to identify:

  • the file formats held, specifically old or obsolete formats and unusual format modifications

  • duplicates and versions, by generating and comparing checksum values for each file

  • layers of content, such as embedded objects and

  • any system files, missing files and empty folders.
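The duplicate and empty-folder checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling we use: the function names are hypothetical, and SHA-256 is an assumed choice of algorithm. Note that zero-byte files share one checksum, so they will also be reported as duplicates of each other.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by checksum; keep groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def find_empty_items(root: Path) -> tuple[list[Path], list[Path]]:
    """Return (empty folders, zero-byte files) found under root."""
    empty_dirs, empty_files = [], []
    for path in sorted(root.rglob("*")):
        if path.is_dir() and not any(path.iterdir()):
            empty_dirs.append(path)
        elif path.is_file() and path.stat().st_size == 0:
            empty_files.append(path)
    return empty_dirs, empty_files
```

Files that share a checksum have identical content even when their names differ, which is why checksums are a more reliable duplicate test than file names or sizes.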

Organisations can use an automated tool like DROID (Digital Record Object Identification) for file format identification. DROID is a free file format identification tool created by The National Archives in the United Kingdom, and can be downloaded from their website.

A tool such as Demystify can then be used to understand the intended transfer file set. Demystify is a way to analyse DROID CSV and Siegfried export files. It breaks the export into its components and stores them in a set of tables in a SQLite database; creates additional columns to augment the output where useful; and queries the SQLite database, outputting the results in a readable form useful for analysis. The result is an easily readable overview of the files in the transfer, with statistics. Demystify can also be used to:

  • quality check the accuracy and consistency of files and content sentencing, for example by showing timelines based on last modification dates and

  • locate obviously sensitive, non-business-related and/or draft material by ‘blacklisting’ potentially problematic words or characters in the file and folder names.
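The idea of flagging problematic words in file and folder names can be illustrated with a short Python sketch. It is not Demystify's implementation; the term list and the function name are hypothetical, and matching is a simple case-insensitive substring search, so "draft" also matches "drafts".

```python
import re

# Hypothetical example terms; each organisation would choose its own list.
FLAGGED_TERMS = ["draft", "personal", "confidential"]


def flag_paths(paths, terms=FLAGGED_TERMS):
    """Return (path, matched_terms) pairs for paths containing a flagged term."""
    if not terms:
        return []
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    flagged = []
    for raw in paths:
        # Collect each distinct term found anywhere in the file or folder names.
        matches = sorted({m.group(0).lower() for m in pattern.finditer(str(raw))})
        if matches:
            flagged.append((str(raw), matches))
    return flagged
```

A match only means the item deserves a manual look; a flagged name is not proof that the content is sensitive or out of scope.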

3. Identify what metadata is needed

At a minimum, we expect organisations to provide the mandatory metadata elements given in the guidance Minimum requirements for metadata. Please note the following:

  • We have no fixed requirements for the schema or structure of this metadata, but we prefer CSV, TXT, Excel or XML file formats.

  • The metadata must be structured consistently.

  • The metadata file must use UTF-8 encoding.

  • File and folder names must be free of non-standard characters (ASCII only).

  • Importantly, someone in the organisation must understand the metadata and be able to help us understand it, which will facilitate mapping to our systems.
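Two of the points above, UTF-8 encoding and ASCII-only names, are easy to check before an export is sent. The following Python sketch shows one way to do so; the function names are hypothetical and the checks are deliberately simple.

```python
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """Return True if the whole file decodes as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_ascii_names(root: Path) -> list[Path]:
    """Return files and folders under root whose own name has non-ASCII characters."""
    return [p for p in sorted(root.rglob("*")) if not p.name.isascii()]
```

A file that passes `is_utf8` is not guaranteed to have been written as UTF-8 (plain ASCII also passes), but a file that fails it definitely needs re-exporting or converting.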

We recommend that organisations use the export options of the systems in which their files are stored to export both the files and their associated metadata. The easiest option is to export all the metadata fields available in the system and then analyse them in collaboration with us to decide which fields provide context and assist with discovery. As some systems do not have metadata export functionality, organisations may need technical knowledge and/or IT support to do this. We can provide some advice and support, but organisations may also need to consult the system designers or vendors.

3.1 Checksums

A checksum value is an essential metadata element that is required to ensure the integrity of files. The organisation must generate checksums before the files are transferred to us, either from within the original storage system or immediately after the files are exported from it. The checksum values must be provided to us as part of the metadata describing each file or in a separate file, ideally both. Providing the checksum for each transferred file allows us to validate the file and confirm that all the files have been transferred successfully, with no changes or errors introduced during the transfer.

Checksums can be generated by DROID as well as by free tools such as FreeCommander (Windows) and sha1sum or md5sum (Linux). See the guidance: Checksums (17/F25).
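Where none of these tools is available, a checksum list can also be produced with a short script. The Python sketch below writes one row per file to a CSV manifest and later re-checks the files against it. The column names, function names and the default of SHA-256 are assumptions for illustration; use whichever algorithm the Checksums guidance specifies.

```python
import csv
import hashlib
from pathlib import Path


def checksum_of(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path, algorithm: str = "sha256") -> int:
    """Write relative path, size and checksum for every file under root."""
    rows = 0
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes", "checksum"])
        for path in sorted(root.rglob("*")):
            # Skip the manifest itself in case it is stored inside root.
            if path.is_file() and path.resolve() != manifest.resolve():
                writer.writerow([path.relative_to(root).as_posix(),
                                 path.stat().st_size,
                                 checksum_of(path, algorithm)])
                rows += 1
    return rows


def verify_manifest(root: Path, manifest: Path, algorithm: str = "sha256") -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    with manifest.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            target = root / row["relative_path"]
            if not target.is_file() or checksum_of(target, algorithm) != row["checksum"]:
                problems.append(row["relative_path"])
    return problems
```

Running `write_manifest` on the source before export and `verify_manifest` on the copy afterwards gives the same kind of before-and-after comparison that we carry out on receipt.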

4. Create an initial test extract or copy

So that we can determine whether a full digital transfer is feasible, the transferring organisation must identify and assess an initial test set of eligible files and their metadata as explained above. The organisation must then extract or copy these onto a removable hard drive that can be secured with encryption (which we can provide if required), or arrange an alternative method for secure transport to our archives. To copy and synchronise the files, we recommend using the tool ‘rsync’ (Remote Sync), which preserves the integrity of the files and the metadata. Rsync is a utility for efficiently transferring and synchronising files across computer systems, by checking the timestamp and size of files. Archives is happy to advise transferring organisations on its use.
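As an illustration only, the Python sketch below builds and runs an rsync command for copying an export folder to a transfer drive. The function names and the paths in the usage note are hypothetical, and the chosen options are one reasonable combination rather than a required one. `--archive`, `--checksum`, `--itemize-changes` and `--dry-run` are standard rsync options; `--checksum` makes rsync compare file contents instead of relying on timestamp and size, which is slower but stricter.

```python
import subprocess


def build_rsync_command(source: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an rsync argument list for copying a folder tree."""
    command = [
        "rsync",
        "--archive",          # recurse; preserve times, permissions, ownership, symlinks
        "--checksum",         # compare file contents rather than size and timestamp
        "--itemize-changes",  # list what rsync changes for each file
    ]
    if dry_run:
        command.append("--dry-run")  # report what would be copied, write nothing
    # A trailing slash on the source copies its contents, not the folder itself.
    command += [source.rstrip("/") + "/", destination]
    return command


def run_rsync(source: str, destination: str, dry_run: bool = True) -> str:
    """Run rsync and return its itemised output; raises CalledProcessError on failure."""
    result = subprocess.run(build_rsync_command(source, destination, dry_run),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_rsync("/data/export", "/mnt/transfer/")` performs a dry run first; passing `dry_run=False` performs the copy. The command contains no deletion options, so the source files are left untouched.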

Note: Organisations must not delete any files or metadata at this stage of the transfer process as Archives only needs a reliable copy.

5. What happens next?

Once we receive the initial test extract, we run several analytical processes, both automated and manual, over the files and metadata. This analysis identifies any content, technical, metadata and accessibility issues that may affect a full transfer and ingest into the Government Digital Archive. Archives then consolidates the analysis results in a report for discussion with the transferring organisation. The report outlines the transfer readiness of the files (i.e. the quality and consistency of sentencing decisions) and identifies unique file formats and potential digital preservation issues. On that basis we can recommend that the organisation either undertakes further preparation or proceeds with the digital transfer.