Introduction

Last updated on 2025-10-26 | Edit this page

Overview

Questions

  • Why signature development?
  • What is PRONOM?

Objectives

  • Understand why we care about signatures
  • Gain an overview of PRONOM and its benefits.

Know what you’ve got (Why are we interested in identifying file formats?)


  • Knowing what you’ve got is a basic first step for managing digital information - whether that’s records management, managing digital continuity or Digital Preservation
  • Our aim is to keep our digital information useful and usable.
  • For ‘useful’ we usually think about managing the provenance and context of digital information - where it came from, what it’s for, what it means…. This work is often done by records managers or cataloguers.
  • To keep digital information usable, we need to understand its technical characteristics. Identifying file formats with confidence is a first step in planning to keep those records usable - ensuring access to tools or services that make the records possible to open, render or search in the future. This is at the heart of digital preservation.
  • This is detailed work. The precise format (e.g. what version of Word or PDF do we have?) can be important.

How can we identify file formats?


  • There’s not one single method

What tool was used?

  • You may simply know what tools or software were used to create your digital records (e.g. you know what camera was used by a project or what word processor was available at the time). But usually we don’t know this with any confidence, particularly if the records were created a long time ago or came from outside the organisation.

File extensions

  • File extensions can be very helpful. But they can be changed and they don’t usually tell us about specific versions.

Looking inside the files

  • We need to look inside the digital files, at the precise sequence of codes in the file. Sometimes the file format is plainly stated inside the file. More often we will be looking for characteristic patterns that point us towards an identification. These patterns are known as file format signatures. The starts and ends of files are good places to look for these patterns.
  • Some file formats have a formal specification - like a set of rules detailing how these files should be constructed. Only files that conform to the specification are ‘valid’ examples of that file format. If we have access to the specification, it can help us with signature development. The specification will define patterns that we can look for. But reality is more complicated than this: the software products used to create the files may not implement the specification correctly. We may see examples of files that seem to work just fine (i.e. they can be opened, viewed, edited using the relevant software) but if our identification relies on just the rules in the specification, we won’t get a match. It can help us to know whether or not a file is a valid example of the format - because invalid files may be at greater risk of being unusable in the future, if a new generation of the software is stricter.
  • So, when creating signatures, we should review the specification if it’s available. But we usually want to look beyond the specification. Ideally, we should also look at examples of real files, from different sources if possible.
  • Because of this variation, file format research is both an art and a science. Looking for patterns in files is a fairly structured activity. Making a judgment about how strict to be or when to be flexible is more of an art. If we make the signature too strict, we’ll fail to identify some real examples. If we make it too flexible, we’ll generate false identifications.

PRONOM


  • These file format signatures are useful to all of us. We have a central registry of signatures - this is PRONOM.
  • PRONOM is hosted and managed by The UK National Archives for the benefit of the whole digital preservation community. It’s free to use.
  • PRONOM is used by people and also by software tools to help us identify the file formats in our digital collections.
  • The National Archives didn’t research or create all the signatures in PRONOM. Since the start of the digital age, there have been a huge number of file formats in use. No one institution could possibly research them all. The file format signatures in PRONOM have been contributed by researchers from across the global digital preservation community. It’s a shared resource, created by the community for the community.
  • PRONOM is not comprehensive, far from it. Although the most common file formats are covered, at some point in your digital preservation work you will encounter a file format that doesn’t have a signature in PRONOM. Or you may find that a signature exists, but it doesn’t work well for the files in your collection.
  • This is when you will embark on researching and creating a new signature - which is what we’re going to look at in the following sections.
  • Once you’ve created a new file format signature, please contribute it to PRONOM!


Key Points
  • Know what you’ve got…
  • How we try to identify file formats.
  • Use and contribute to PRONOM!
  • It isn’t just the beginning of your PRONOM journey, it’s the beginning of your digital forensics journey!
  • Enjoy!