CLOCKSS is a joint venture started by libraries and publishers committed to ensuring long-term access to scholarly publications in digital format. As libraries migrate to less costly online-only subscriptions, they expect assurances from publishers that their shared investments are protected and preserved for generations to come. The CLOCKSS archive provides this assurance via its secure, closed network of web-published content that can be accessed only when a trigger event is deemed to have occurred. CLOCKSS is unique because it makes all content triggered from the archive freely available to the world.
Technical Overview
The following is a step-by-step overview of how CLOCKSS works.
Step One
The publisher provides the CLOCKSS system access to either presentation or source files of the content. Presentation files are the HTML pages that are normally displayed to the readers of the content. Source files are minimally formatted content used internally by the publisher.
To allow CLOCKSS crawlers to access the publisher's presentation files, the publisher needs to add to its website a CLOCKSS-provided permission statement that will tell the crawlers what content is available for collection.
To allow CLOCKSS access to the publisher's source files, the publisher needs to place them on a designated FTP site.
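The permission check for presentation files can be pictured with a minimal Python sketch. It assumes that a crawler simply looks for the statement text in the fetched page. The statement wording below is a placeholder, not the real CLOCKSS-provided text, and `may_collect` is a hypothetical helper name.

```python
# Placeholder only: the real wording is supplied to the publisher by CLOCKSS.
PERMISSION_STATEMENT = "EXAMPLE CLOCKSS PERMISSION STATEMENT"


def may_collect(page_html: str, statement: str = PERMISSION_STATEMENT) -> bool:
    """Return True if the page text contains the permission statement.

    Whitespace and letter case are normalized first, so a statement that
    wraps across lines in the HTML is still found.
    """
    normalized_page = " ".join(page_html.split()).lower()
    normalized_statement = " ".join(statement.split()).lower()
    return normalized_statement in normalized_page
```

A crawler built this way would skip any site whose pages do not carry the statement, which keeps collection opt-in for the publisher.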
Step Two
Special CLOCKSS boxes located at Rice, Indiana, and Stanford Universities ingest the content the publisher made available.
Step Three
The CLOCKSS boxes then run a verification process to confirm that their copies of the content are identical. The copy they agree on becomes the authoritative version of the content.
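One way to picture this verification step is a digest comparison, sketched below in Python. The use of SHA-256 and the function names are assumptions made for illustration; the source does not say how CLOCKSS compares copies.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one copy of the content."""
    return hashlib.sha256(content).hexdigest()


def copies_agree(copies: dict[str, bytes]) -> bool:
    """Return True if every box holds byte-identical content.

    `copies` maps a box name to the bytes that box ingested. An empty
    mapping yields False, since there is nothing to agree on.
    """
    return len({digest(data) for data in copies.values()}) == 1
```

Comparing digests rather than full files means the boxes only need to exchange short strings to detect a mismatch.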
Step Four
The majority of the CLOCKSS boxes are preservation machines, performing the main storage and audit functions. After the quality of the content on the ingest machines is validated, it is collected from them by the preservation CLOCKSS boxes.
Step Five
The content is then preserved through a system of audit and repair. The CLOCKSS boxes continually communicate over the Internet to audit the content they are preserving. If the content in one CLOCKSS box is damaged or incomplete, that CLOCKSS box will receive repairs of the content based on other CLOCKSS boxes' holdings and/or by referring to the publisher's original presentation files. This cooperation between the CLOCKSS boxes avoids the need to back them up individually. It also provides unambiguous reassurance that the system is performing its function and that the correct content is always available.
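The audit-and-repair idea can be illustrated with a simplified Python sketch. It substitutes a plain majority vote over SHA-256 digests for whatever polling protocol the CLOCKSS boxes actually run, and every name in it is hypothetical.

```python
import hashlib
from collections import Counter


def audit_and_repair(holdings: dict[str, bytes]) -> list[str]:
    """Repair, in place, every box whose copy disagrees with the majority.

    `holdings` maps a box name to the bytes it currently stores.
    Returns the names of the boxes that were repaired.
    """
    digests = {box: hashlib.sha256(data).hexdigest() for box, data in holdings.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(holdings) / 2:
        # No strict majority: a real system would have to consult another
        # source, such as the publisher's original presentation files.
        raise RuntimeError("no majority copy available")
    good_copy = next(data for box, data in holdings.items() if digests[box] == winner)
    repaired = [box for box in holdings if digests[box] != winner]
    for box in repaired:
        holdings[box] = good_copy
    return repaired
```

For example, with three boxes where one holds a damaged copy, the call returns the name of the damaged box and overwrites its copy with the version the other two agree on.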
Step Six
When a trigger event occurs and the CLOCKSS Board decides to release the content from the archive, two things happen:
a. Content is automatically migrated to the newest format.
b. Content is copied from the CLOCKSS boxes to a publicly available web server at a CLOCKSS host organization (currently the EDINA Data Center, University of Edinburgh, and Stanford University).
Step Seven
The released content can then be accessed directly on the server or through the CLOCKSS list of released titles.






