Best Practices for Choosing Digital File Formats

Overview

Agencies create electronic records in a variety of file formats. Over time, these formats may become obsolete or unsupported, making the records impossible to access. Monitoring and controlling how file formats are used helps keep records from being lost. Standardizing formats also saves money and makes it easier to manage documents over time.

To create and manage electronic records well, agencies should limit the number of file formats they use, keep track of how those formats are being used, and be ready to migrate to more stable and widely used formats when necessary. These practices are especially important for permanent records and for records that will be transferred to the State Archives.

This document lists the best ways to choose and keep track of file formats. It can be used to make a policy for your agency's staff. It also has a table with the preferred and acceptable formats for keeping electronic records for a long time and sending them to the State Archives.

How to choose which file format to use

Your choice of file format significantly affects how long your files will remain accessible. To get to the information in files, you need to be able to store, read, and change them. Not all file formats are created equal. Formats that have few users, require proprietary software to read, or are poorly documented are less likely to be usable in the future.

The following factors are important to consider when choosing file formats for your agency's long-term or permanent records. The formats in the table below were chosen based on these factors.

Open

The specifications for open and non-proprietary formats are publicly available, so anyone can build tools to read or edit them. This makes it less likely that these formats will become inaccessible in the future, which makes them better for long-term or permanent records.

These formats, like PDF/A (.pdf), JPEG 2000 (.jp2), and OpenDocument Text (.odt), often have specifications that are maintained by a community or standards organization. Talk to your IT department about the software your agency uses and how you can use open or non-proprietary formats.

Widely used

File formats that are widely used are often a better choice than those that are rarely used. Technical support is more likely to remain available for formats that many people use. Some formats, like Microsoft Word documents (.doc), are proprietary, but they are fairly safe to keep because they are so widespread.

Documented

Files are easier to preserve and access when their formats have published documentation and standards. Documentation is more likely to be available for open formats.

Self-contained

Some file formats can only be opened and used with specific hardware, operating systems, or software. If a file can only be viewed or changed with particular hardware or software, it may become hard to keep it accessible over time. In the same way, dynamic content that needs data from outside sources to display correctly causes problems for long-term preservation and access. File formats with fewer external dependencies are better for long-term storage and transfer.

Supports metadata

"Self-documenting" is a term for file formats that can store metadata. This means that information about how and when a file was made can be embedded in the file itself and travel with it wherever it goes. This makes it easier for a government agency or the State Archives to organize records and make them easy to find.

Lossless

Lossy file formats discard some data when the file is compressed during encoding. Lossless file formats do not lose data during encoding. Because of this, lossless files tend to be larger and cost more to store. However, lossless formats are better for long-term storage and preservation, while lossy formats are better suited to short-term access copies. TIFF is an example of a format that can store image data without loss, while JPEG is an example of a format that loses image data because of compression.
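The difference can be illustrated with a short Python sketch. It is an illustration only: zlib stands in for a lossless scheme, and discarding the low four bits of each byte is a deliberately crude stand-in for lossy compression, not how JPEG actually works. The function names are made up for this example.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress with zlib (a lossless method) and decompress again."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(data: bytes) -> bytes:
    """Toy lossy step: keep only the high 4 bits of each byte.

    Assumption for illustration only; real lossy codecs such as JPEG
    use far more sophisticated methods.
    """
    return bytes(b & 0xF0 for b in data)


if __name__ == "__main__":
    sample = bytes(range(256))  # stand-in for raw image data
    # Lossless: every original byte is recovered.
    print(lossless_round_trip(sample) == sample)
    # Lossy: the discarded detail cannot be recovered.
    print(lossy_round_trip(sample) == sample)
```

Once the detail has been discarded there is no way to restore it from the lossy copy, which is why a lossless original is the safer choice for records that must be kept for a long time.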

No digital rights management

Digital rights management (DRM), which limits how files can be used, is built into some file formats. A typical example is downloaded music files that can't be copied or can only be played with specific software. DRM can make it very hard to preserve electronic records and give people access to them. DRM should not be used on records intended for long-term retention or transfer.

Monitoring and converting file formats

To ensure that files with long retention periods can still be accessed, you will need to monitor their formats and confirm that they are still supported. When a format is no longer supported, you may have to decide whether to convert your files to supported formats to preserve the information in them. Even though converting files can keep information from being lost, the conversion process carries risks and must be carefully planned.
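One simple way to start monitoring is to take an inventory of the file extensions in a records folder. The Python sketch below is a minimal example under stated assumptions: the function names are hypothetical, the list of supported extensions is something your agency would define for itself, and an extension is only a rough hint of a file's real format.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def flag_unsupported(counts, supported):
    """Return the extensions found that are not in the supported set."""
    return sorted(ext for ext in counts if ext not in supported)
```

For example, calling `flag_unsupported(count_extensions("records"), {".pdf", ".odt", ".tif"})` would list every extension in a folder named `records` that is not in that (hypothetical) supported set. Running an inventory like this on a schedule gives you a record of how format use changes over time.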

Before you convert files, think about the three types of loss that could happen during the process:

Data

The data in the file is the essential piece of information in the record. Legally, records must be complete and reliable, meaning no data should be lost during conversion. During conversion, there is also a chance that file metadata will be lost.

Appearance

How a file looks may also be important to its value. For example, if you convert a Microsoft Word document (.doc) to a Rich Text document (.rtf), you might lose the original document's look and structure. You have to decide whether the appearance is essential to understanding the record and whether losing the structure would make the record less complete.

Relationships

During conversion, it's also possible to lose the relationships between files or within a file. For example, if you convert a Microsoft Excel spreadsheet (.xls) to a comma-separated file (.csv), you might lose the formulas that tell the spreadsheet how to calculate its values. You need to determine whether the loss of these relationships would make the record less complete.
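The spreadsheet example can be sketched in Python. This is an illustration under stated assumptions, not a real Excel converter: `SHEET` is a hypothetical in-memory model in which each cell holds a displayed value and, optionally, the formula that produced it. Only the standard `csv` and `io` modules are used.

```python
import csv
import io

# Hypothetical model of a small spreadsheet (an assumption for this sketch).
SHEET = [
    [{"value": "Item"}, {"value": "Cost"}],
    [{"value": "Paper"}, {"value": 40}],
    [{"value": "Toner"}, {"value": 60}],
    [{"value": "Total"}, {"value": 100, "formula": "=SUM(B2:B3)"}],
]


def to_csv(sheet):
    """Write only the displayed cell values, as a CSV export does."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in sheet:
        writer.writerow([cell["value"] for cell in row])
    return buffer.getvalue()


def lost_formulas(sheet):
    """List the formulas that have no place in the CSV output."""
    return [cell["formula"] for row in sheet for cell in row if "formula" in cell]
```

The CSV text keeps the total as a plain number, while `lost_formulas(SHEET)` shows what the conversion leaves behind. Listing what would be lost before converting helps you judge whether the converted record is still complete.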

Several batch conversion tools are available for converting files out of formats that are no longer supported. If you have any questions about monitoring and converting files, please talk to the staff at the State Archives.

File formats for transfer and archiving

The table below is organized by file type (word processing documents, audio, presentations, etc.). The preferred, acceptable, and unacceptable file formats for long-term storage and transfer to the State Archives are listed for each file type. If you have questions about a format that isn't listed, don't hesitate to contact the State Archives staff.

Preferred formats

These formats meet all of the criteria for long-term retention and preservation. You can send these kinds of files to the State Archives. These are also the best formats for long-term storage in an agency.

Acceptable formats

These formats meet some of the criteria for long-term storage and preservation. You can send these kinds of files to the State Archives. If the agency plans to keep these files for a long time, it should talk with its IT staff and the State Archives about converting them to a preferred format.

Unacceptable formats

These formats aren't suitable for transfer or long-term retention because you can't count on them to remain usable for more than five years. Many proprietary formats created with old or less popular software fall into this category. Electronic records that must be kept for more than five years shouldn't be stored in these formats. If you have records in any of these formats that are supposed to be sent to the State Archives or kept permanently, talk to your IT staff or the staff at the State Archives about conversion options.
