
The Open Archival Information System (OAIS)

OAIS (ISO 14721) is a reference model for the long-term preservation of digital (and other) assets. Although it does not specify an implementation or functional design, OAIS is an indispensable guide to creating an archive with an accepted set of roles, responsibilities and methods that encourage safe, long-term archiving of, and access to, critical government information.

A key component of the OAIS design is the concept of a package, the unit of information to be archived. In practice, each package will represent either a single printed document (such as a committee report) or a logically independent document set (for example, “title 3 of the state legislative code”). Packages must be fully self-descriptive (all information that describes the package, such as metadata, presentation renditions and an inventory of contents, must be inside the package itself) and self-validating (the package must contain an inventory of its contents along with self-checks such as digital signatures).
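As a rough illustration of what "self-descriptive and self-validating" can mean in practice, the sketch below keeps the metadata and a checksummed inventory in a file inside the package directory. OAIS does not prescribe any of this: the file name `manifest.json`, the JSON layout, and the function names are assumptions made for the example, and plain SHA-256 checksums stand in for the digital signatures mentioned above.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical name; OAIS does not define one


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(package_dir: Path, metadata: dict) -> Path:
    """Store descriptive metadata and a checksummed inventory inside the package."""
    inventory = {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }
    manifest_path = package_dir / MANIFEST_NAME
    manifest_path.write_text(
        json.dumps({"metadata": metadata, "inventory": inventory}, indent=2)
    )
    return manifest_path


def validate_package(package_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every file checks out."""
    manifest = json.loads((package_dir / MANIFEST_NAME).read_text())
    problems = []
    for rel_path, expected in manifest["inventory"].items():
        target = package_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Because the inventory travels with the package, any later holder of the package can re-run `validate_package` without consulting an external database.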

To submit data to the archive, OAIS requires the producer to create a Submission Information Package (SIP). The archive processes the SIP into one or more Archival Information Packages (AIPs), which it then stores. Consumers who wish to access data from the archive use an access aid to locate package information. Packages disseminated by the archive to consumers are called Dissemination Information Packages (DIPs). A DIP can be anything from a simple list of search results to a ZIP file of all package contents.
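The SIP to AIP to DIP flow can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names, the `aip-000001` identifier scheme, and the one-AIP-per-SIP mapping are hypothetical, and a real archive may split a SIP into several AIPs and would persist them durably.

```python
import io
import zipfile
from dataclasses import dataclass


@dataclass
class SIP:
    """What a producer submits (field names are illustrative)."""
    producer: str
    title: str
    files: dict[str, bytes]  # file name -> content


@dataclass
class AIP:
    """What the archive stores after processing a SIP."""
    aip_id: str
    title: str
    producer: str
    files: dict[str, bytes]


class Archive:
    def __init__(self) -> None:
        self._aips: dict[str, AIP] = {}

    def ingest(self, sip: SIP) -> list[str]:
        """Process a SIP into AIPs; this sketch always creates exactly one."""
        aip_id = f"aip-{len(self._aips) + 1:06d}"
        self._aips[aip_id] = AIP(aip_id, sip.title, sip.producer, dict(sip.files))
        return [aip_id]

    def search(self, term: str) -> list[dict[str, str]]:
        """Access aid: the simplest DIP, a list of matching results."""
        term = term.lower()
        return [
            {"aip_id": aip.aip_id, "title": aip.title}
            for aip in self._aips.values()
            if term in aip.title.lower()
        ]

    def export_zip(self, aip_id: str) -> bytes:
        """A fuller DIP: every file of one AIP bundled as a ZIP archive."""
        aip = self._aips[aip_id]
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
            for name, content in aip.files.items():
                bundle.writestr(name, content)
        return buffer.getvalue()
```

A consumer would typically call `search` first to locate a package and then `export_zip` with the returned identifier to retrieve its contents.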

Provenance information about packages identifies the chain of custody of the archival information and further records any modifications made to packages while in the archive.
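One possible way to record such a chain of custody is an append-only event log in which each entry includes the digest of the previous one, so that later edits to the history become detectable. OAIS does not mandate this technique; the hash chaining, the `ProvenanceLog` class, and its field names are assumptions chosen for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Append-only custody log; each event is chained to the one before it."""
    events: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Serialize with sorted keys so the digest is stable across runs.
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, actor: str, action: str, timestamp: str) -> dict:
        """Append an event describing who did what to the package, and when."""
        body = {
            "actor": actor,
            "action": action,
            "timestamp": timestamp,
            "previous": self.events[-1]["digest"] if self.events else None,
        }
        event = {**body, "digest": self._digest(body)}
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Return True only if no event was altered, removed or reordered."""
        previous = None
        for event in self.events:
            body = {key: value for key, value in event.items() if key != "digest"}
            if body["previous"] != previous or self._digest(body) != event["digest"]:
                return False
            previous = event["digest"]
        return True
```

Calling `record` once at submission and again for every modification made inside the archive yields a history that `verify` can check at any later time.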

OAIS benefits
The OAIS reference model:

  • Is a well-known and well-understood model for long-term preservation
  • Provides the discipline necessary to maintain the accessibility of packages and the interpretation of the information they contain even across changes in technology and changes in representation standards
  • Clearly defines the roles of the actors (producers, consumers, managers) who interact with the archive
  • Identifies the necessary controls required to maintain reliable archive management
  • Identifies the documentation required to communicate the archive’s purpose and interactions to interested parties
  • Carefully documents the chain of custody of archived packages to ensure that documents are not inappropriately tampered with

System Architecture

There are three major functional areas in the architecture: Producers, the Archive CMS, and Search & Access.

Producers create or gather documents to be stored in the archive. Packages created by producers are called Submission Information Packages (SIPs) and contain all information known to exist about the document.

Typically, there will be many different types of producers:

  • Migration – Scans through hard drives, file shares, web sites, etc. in the government organization and migrates existing documents into the archive. Note that migration often requires a substantial file processing framework, so it typically uses a framework similar to the “file processing” component within the Archive CMS itself.
  • Web Submission – Documents created by authors throughout the organization can be submitted directly to the archive via a web submission interface. Such interfaces usually require the author to fill out a small form to ensure that the document is correctly coded inside the archive.
  • Hot Folder – Document streams with well-understood, periodic production cycles are typically best served by a hot-folder approach, where new documents are dropped into a common folder and automatically pulled into the archive by ingest. Hot folders are typically one-to-one with a particular collection of documents, so that most document metadata can be determined automatically.
  • Web Harvesting Producer – If your government organization has a mission to preserve documents from outside organizations as well as your own, web harvesting (reaching out to other organizations' web sites and harvesting their documents locally) is an appropriate solution.
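To make the hot-folder producer more concrete, here is a minimal sketch of a single ingest pass. It assumes one folder per collection, as described above, so the collection name can be attached automatically. The function name, the staging directory, and the returned dictionary layout are hypothetical, and a real producer would go on to wrap each staged file in a SIP.

```python
import shutil
from pathlib import Path


def ingest_hot_folder(hot_folder: Path, staging_dir: Path, collection: str) -> list[dict]:
    """Move every file waiting in the hot folder into staging and describe it."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    submissions = []
    for path in sorted(hot_folder.iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subfolders and hidden or temporary files
        target = staging_dir / path.name
        shutil.move(str(path), str(target))
        submissions.append(
            {
                "collection": collection,  # implied by which hot folder was used
                "title": path.stem,        # derived from the file name
                "file": str(target),
            }
        )
    return submissions
```

A scheduler (for example, a cron job) could call this function periodically, matching the production cycle of the document stream that feeds the folder.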

Contact us to learn more about the Open Archival Information System (OAIS).