
White Paper
PDF/A – the standard for long-term archiving

What are the advantages of the PDF/A Standard?
What is the PDF/A Standard?
What do PDF/A-1a, PDF/A-1b, PDF/A-2 mean?
How is the PDF/A Standard implemented?
Is PDF/A the solution for long-term archiving?

PDF/A Competence Center
Version: 2.4
Date: May 20, 2009
Index

Background
  Introduction
  What are the advantages of the PDF/A Standard?
The PDF/A Standard
  Goals of PDF/A
  The distinction between PDF and PDF/A
  PDF/A, A-1a, A-1b, A-2 "Babylon"
Use of the PDF/A Standard
  Where do I get a copy?
  To whom is the Standard addressed?
  What tools are available?
  PDF/A as a component of a comprehensive long-term archiving concept
Conclusion
  PDF/A – the archiving standard
  What is the market reaction?
  PDF/A as a long-term strategy
Information
  PDF/A Competence Center
  PDF Tools AG
Background
PDF/A – the standard for long-term archiving

Introduction

On September 28, 2005, the International Organization for Standardization (ISO) adopted a new Standard governing the archiving of electronic documents. Its official title is: ISO 19005-1 – Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1).

The Standard is the result of more than 36 months of collaboration among companies and organizations around the world.

PDF/A was elevated to the status of international Standard in 2005.

Renowned organizations and manufacturers, as well as professional users, were involved.

The initial impetus for this initiative came in May 2002 in the USA. The stated goal was to create a standardized format for electronically archived documents. The Association for Information and Image Management (AIIM), the National Printing Equipment Association (NPES) and the Administrative Office of the US Courts were all involved. The kick-off meeting took place in October 2002. Renowned organizations and PDF manufacturers participated, including Adobe Systems, the Library of Congress, Surety Inc., Quality Associates Inc., Appligent, Merck, EMC, PDF Sages, and the National Archives & Records Administration (NARA). They were later joined by others, including Xerox, Honeywell, EDS and GlaxoSmithKline.

The founders of the project put together a first version and submitted their recommendation to the ISO in order to have it registered as an international Standard. The ISO referred the project to Technical Committee TC 171 (Document Management Applications). This committee is composed of 15 member states, each of which has one vote, cast by its representative. The committee is supplemented by an advisory commission representing another 21 countries. The Standard was improved over multiple stages until it was finally approved in September 2005.

What are the advantages of the PDF/A Standard?

Almost every country uses its own format to archive documents. Traditional archiving media such as paper, microfilm and microfiche can be stored for a long time and reproduced over the long term. These media, however, lack the advantages of digital technology: large documents cannot be sent around the world quickly and easily. In addition, it is almost impossible to search documents archived in the traditional way for specific information. Many institutions chose the TIFF format for their first electronic archives. TIFF is an established image format that guarantees long-term reproducibility and can be sent quickly and easily to networked institutions around the world. Free-text searching in TIFF files, however, remains problematic.

We were able to produce the electronic archive thanks to innovative technologies.

PDF offers more advantages than the TIFF format.

2009 PDF Tools AG – Premium PDF Technology
White Paper – PDF/A, Page 3 of 10
Because searching TIFF archives remains problematic, institutions began to consider the PDF format. There are many reasons why PDF is the most attractive alternative:

- PDF includes structured objects (text, vector graphics, raster images). This allows efficient search queries to be performed across the entire archive. TIFF, by contrast, is a purely raster format: before a full-text search is possible, a TIFF document must first be processed by optical character recognition (OCR) software.
- PDF can be compressed efficiently and compactly. Compared with an equivalent TIFF file, a PDF file requires only a fraction of the storage space, and the quality is almost always better. The smaller file size is particularly advantageous for electronic file transfer, such as email attachments or FTP.
- Metadata (author, topic, content, keywords, date created, date modified, publisher, etc.) is embedded directly in the PDF file using a standardized format (XMP). This means that the metadata can be read and updated automatically and systematically, without manual steps.
- The PDF format is designed so that it is not tied to any one particular device (raster resolution, color system, etc.). Page content is converted for the viewer or printer only when the document is displayed; this process is called "rendering". PDF documents therefore adapt to the technological development of output devices (printers, monitors, etc.) and remain up to date for years after their creation.

Over the past fifteen years, Adobe Systems, the author of the PDF format, has published a total of eight versions of its PDF Reference Manual. In each revision, the PDF format was expanded to include new functionality and existing functionality was overhauled. Because the format keeps changing in this way, an enduring, stable and internationally valid standard for long-term preservation, based on the Adobe PDF format, became necessary. The result of this development is the PDF/A Standard.
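The point about XMP metadata can be illustrated with a short sketch. It assumes that the XMP packet has already been extracted from the PDF file as text (the extraction step is not shown) and that the metadata uses the Dublin Core properties commonly found in XMP. The sample packet and the function name read_dublin_core are invented for illustration and do not come from this paper.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core as commonly used in XMP packets.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical XMP packet, as it might be extracted from a PDF file.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Annual Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Example Author</rdf:li></rdf:Seq></dc:creator>
   <dc:subject><rdf:Bag><rdf:li>archive</rdf:li><rdf:li>report</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_dublin_core(xmp_text):
    """Collect the values of a few Dublin Core properties from an XMP packet."""
    root = ET.fromstring(xmp_text)
    result = {}
    for field in ("title", "creator", "subject"):
        values = []
        # Each property holds an rdf:Alt, rdf:Seq or rdf:Bag container of rdf:li items.
        for prop in root.iter(f"{{{DC}}}{field}"):
            values.extend(li.text for li in prop.iter(f"{{{RDF}}}li"))
        result[field] = values
    return result


metadata = read_dublin_core(SAMPLE_XMP)
print(metadata)
```

Because the metadata is ordinary XML inside the file, an archive system can index it with standard tools and without opening each document by hand.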
The PDF/A Standard

Goals of PDF/A

ISO Standard 19005 defines a file format based on PDF called PDF/A. The format provides a mechanism for representing electronic documents in such a way that their visual appearance is preserved over an extended period, independent of the tools and systems used to produce, store and reproduce them. The Standard specifies neither the methods nor the purpose of preservation. Its aim is to guarantee that electronic documents can still be viewed in their original appearance in the future. For this reason, a document may not refer, directly or indirectly, to an external source, for example an external image or a font that is not embedded in the document itself.

PDF/A is structured as a series of Standards. At present, however, only PDF/A-1 (ISO 19005 Part 1) has entered into effect.

PDF/A files are self-contained. All information necessary to display the document is embedded.

PDF/A-1 is based on PDF Reference 1.4. PDF/A-2 will be based on the ISO Standard for PDF 1.7 (ISO 32000).

The distinction between PDF and PDF/A

The PDF format does not guarantee long-term reproducibility or complete independence from software and output devices. In order to guarantee both, it was necessary to both restrict and extend the existing PDF format. It was clear from the outset that PDF/A-1 would have to be based on an existing version of PDF in order to be accepted as widely as possible. The ISO committee TC 171 chose the Adobe PDF Reference 1.4 as the basis for the PDF/A-1 Standard.

Adobe implemented this Reference in its Acrobat 5 product. Because PDF/A-1 builds on this Reference, it must fulfil all of its requirements and must also respect certain technical limitations of Acrobat 5. The original PDF Reference and ISO 19005-1 together comprise the current PDF/A-1 Standard. ISO 19005-1 itself only identifies the differences with respect to the PDF Reference. Accordingly, PDF Reference 1.4 is the central basis for understanding the PDF/A-1 Standard.

Certain features of PDF 1.4, such as transparency or the integration of audio and video, are not permitted by the PDF/A-1 Standard. Certain options in PDF 1.4 are mandatory in PDF/A-1: for example, all fonts used must be embedded in the document. Essentially, the PDF/A-1 Standard does nothing other than identify individual characteristics of PDF Reference 1.4 and indicate whether each is required, recommended, restricted, or not permitted.

PDF/A, A-1a, A-1b, A-2 "Babylon"

The PDF/A-1 Standard is divided into two levels of conformance: PDF/A-1a and PDF/A-1b.

There are two levels of conformance in PDF/A-1.

PDF/A-1a (Level A Conformance) defines conformance with all requirements of the PDF/A-1 Standard.

PDF/A-1a meets all requirements. PDF/A-1b meets the minimum requirements.

The minimum requirements for conformance with PDF/A-1 are contained in PDF/A-1b (Level B Conformance). The PDF/A-1b requirements are generally sufficient for unambiguous reproduction over an extended period.
PDF/A-1a differs from PDF/A-1b mainly with respect to accessibility requirements (Section 508 of the US Rehabilitation Act).

- PDF/A-1a guarantees that the document text is extractable and that the logical structure of the document and the natural reading order of the text remain intact. Text extraction is mainly of interest if documents are to be displayed on mobile devices (e.g. PDAs) or presented in accordance with Section 508 of the US Rehabilitation Act. This includes the requirement that the text can be restructured (re-flowed) to fit a smaller screen. This functionality is also known as "tagged PDF".
- PDF/A-1b ensures that text and other page content is reproduced uniformly; it does not guarantee, however, that the embedded text is comprehensible and legible. The creator of a PDF/A-1b conformant file is at liberty to embed the text in a readable form, even if the more stringent requirements of the aforementioned Section 508 are not met.

PDF/A-1a was drafted to meet the accessibility requirements set out in Section 508 of the US Rehabilitation Act.

PDF/A-1b is sufficient for uniform visual display of documents.

For scanned documents, conformance with PDF/A-1b is entirely sufficient, even if they have been processed using OCR to enable a full-text search.

The Technical Committee is currently working on a new part of the Standard: ISO 19005-2 (PDF/A-2). PDF/A-2 is being developed in order to take account of the expanded functionality described in PDF Reference 1.7. In the meantime, PDF 1.7 itself has been standardized, so PDF/A-2 will no longer be based on the Adobe PDF Reference but on the new ISO Standard 32000-1 (PDF 1.7).

PDF/A-2 is not intended to replace the existing Standard, but rather to incorporate new PDF functions.

Contrary to the usual practice with PDF References, PDF/A-2 does not replace the existing PDF/A-1 Standard but will exist alongside it indefinitely. PDF/A-2 conformant viewers must also be able to display PDF/A-1 conformant documents correctly.
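The naming scheme above (part number plus conformance level) can be made concrete with a short sketch. It rests on an assumption not stated in this paper: that a PDF/A file declares its part and conformance level in its XMP metadata through the PDF/A identification schema. The namespace URI and the property names part and conformance are quoted from memory of that schema and should be checked against ISO 19005-1. The function name declared_pdfa_level and the sample packets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the PDF/A identification schema (verify against ISO 19005-1).
PDFAID = "http://www.aiim.org/pdfa/ns/id/"

# Hypothetical packet using child elements for the two properties.
SAMPLE_ELEMENT_FORM = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# Hypothetical packet using XML attributes for the same two properties.
SAMPLE_ATTRIBUTE_FORM = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
    pdfaid:part="1" pdfaid:conformance="A"/>
 </rdf:RDF>
</x:xmpmeta>"""


def declared_pdfa_level(xmp_text):
    """Return the declared level, e.g. 'PDF/A-1b', or None if nothing is declared."""
    root = ET.fromstring(xmp_text)
    values = {}
    for name in ("part", "conformance"):
        key = f"{{{PDFAID}}}{name}"
        for elem in root.iter():
            if key in elem.attrib:
                # Property written as an attribute of rdf:Description.
                values[name] = elem.attrib[key].strip()
            elif elem.tag == key and elem.text:
                # Property written as a child element.
                values[name] = elem.text.strip()
    if "part" not in values or "conformance" not in values:
        return None
    return f"PDF/A-{values['part']}{values['conformance'].lower()}"


print(declared_pdfa_level(SAMPLE_ELEMENT_FORM))
print(declared_pdfa_level(SAMPLE_ATTRIBUTE_FORM))
```

A declaration of this kind only states what a file claims to be. Whether the file actually meets the requirements of that level still has to be established by a validator.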
Use of the PDF/A Standard

Where do I get a copy?

The PDF/A-1 Standard, ISO 19005-1, is distributed directly via the ISO website (www.iso.org). Both paper copies and electronic versions (as PDF) are available. As is the case for all other ISO Standards, the document is protected by copyright. It is therefore illegal to offer free copies via the internet. The PDF/A-1 Standard is only available in English.

PDF/A-1 Standard ISO 19005-1 is available from the ISO website: www.iso.org

PDF/A is a purely technical standard and expert knowledge is required to implement it.

To whom is the Standard addressed?

The objective of the PDF/A Standard is to optimize archiving methods. The Standard is purely technical in nature. For this reason, it is essentially only fully comprehensible to specialists with extensive knowledge of page description languages such as PostScript and PDF. The Standard itself is short, but the documents on which it is based are very extensive. PDF Reference 1.4 alone consists of almost one thousand pages, and this does not include all the material associated with the Reference, such as font and compression formats, XML specifications, ICC color profiles, digital signatures, RFCs, etc. In addition, the Standard alone cannot guarantee long-term preservation. A strategy for company-wide archiving is generally the result of a comprehensive project. Collaboration with experts who understand the requirements of the PDF/A Standard and can apply them is recommended. Only in this manner can a consistent strategy be produced that meets long-term document preservation goals.

What tools are available?

Various tools to create, process and verify PDF/A documents have been on the market since 2006. Version 8 of Adobe Acrobat includes appropriate tools. Microsoft offers an add-in for Office 2007 that can be downloaded separately. It allows users to produce PDF/A conformant documents directly from the Office suite. Because numerous products for creating PDF/A documents already exist, the output of each product must be verified for full PDF/A conformity.

PDF/A as a component of a comprehensive long-term archiving concept

In itself, the PDF/A Standard is merely one component of a comprehensive solution. In isolation, the Standard does not guarantee long-term preservation or reproducibility. Moreover, it is not the ideal solution for every project. PDF/A defines the specific requirements that electronic documents must meet so that they can be archived over the long term. To build an archive that conforms to the PDF/A Standard, other aspects must be taken into consideration. These include, among other things, in-house company standards and processes, quality management, reliable data sources and dedicated requirements tailored to the specific application. In particular, the transfer of existing paper or TIFF archives to a PDF/A conformant archive requires careful planning.

PDF/A is a component of a comprehensive archiving strategy.

The Standard alone does not guarantee long-term preservation; however, it is an essential requirement for achieving that objective.
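The section on tools above recommends verifying the output of PDF/A products. As a rough illustration of what such a check evaluates, the sketch below encodes only the PDF/A-1 rules named in this paper: embedded fonts are mandatory, while transparency, audio or video content, and external references are not permitted. The DocumentFeatures record is a hypothetical summary that some analysis step (not shown) would have to produce. This is not a validator and covers only a small part of the Standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentFeatures:
    """Hypothetical summary of one PDF file, produced by an analysis step not shown here."""
    all_fonts_embedded: bool
    uses_transparency: bool
    has_audio_or_video: bool
    has_external_references: bool


def pdfa1_violations(doc):
    """List which of the PDF/A-1 rules mentioned in this paper the document breaks."""
    problems = []
    if not doc.all_fonts_embedded:
        problems.append("all fonts must be embedded")
    if doc.uses_transparency:
        problems.append("transparency is not permitted")
    if doc.has_audio_or_video:
        problems.append("audio and video are not permitted")
    if doc.has_external_references:
        problems.append("external references are not permitted")
    return problems


# Example: a scanned document with embedded fonts and no prohibited features.
clean = DocumentFeatures(True, False, False, False)
# Example: a document with a missing font and a transparent graphic.
flawed = DocumentFeatures(False, True, False, False)

print(pdfa1_violations(clean))
print(pdfa1_violations(flawed))
```

An empty list here means only that none of these four rules is broken. Full conformity has to be established by a complete validation tool.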
Conclusion

PDF/A – the archiving standard

PDF/A is the standard for archiving electronic documents. The PDF format is widespread globally. It is used in both the public and private sectors for a wide range of purposes. The PDF/A Standard is the appropriate instrument for ensuring the long-term preservation and reproducibility of documents.

The PDF/A Standard also influences the future development of the PDF format itself. Independently of it, Adobe will continue to develop new functionality, for example 3-dimensional models or XFA for dynamic PDF forms. Conversely, these developments will influence the PDF/A Standard.

What is the market reaction?

PDF/A products are not expected to inundate the market. The knowledge required to understand the technology behind the PDF/A Standard is considerable and specific. In addition, users expect a higher level of quality from software that implements a standard. The first applications have been on the market since 2006. Demand is primarily for products that produce PDF/A conformant documents, check PDF/A conformity (validation) and enable simple conversion of existing PDF documents into PDF/A documents.

Comprehensive projects to build PDF/A conformant archives have arisen along with the first professional PDF/A tools. At present, however, expectations regarding functionality should not be set too high. As is so often the case when a new standard is introduced, many products will be released that advertise PDF/A conformity yet do not actually fulfil the requirements of the Standard. Seeking expert opinion when evaluating products is strongly recommended.

PDF/A as a long-term strategy

The PDF/A Standard will not be short-lived. Demand for a standardized framework for archiving with PDF has existed for years. The format is already used for precisely this purpose, even though many users have had to define their own guidelines in order to do so. The fact that Microsoft is responding to customer demand by making it possible to create PDF/A documents directly from the most recent Office suite is a clear signal: the internationally valid PDF/A Standard for long-term archiving is here to stay.
Information

PDF/A Competence Center

The PDF/A Competence Center was founded in 2006. The objective of this international organization is to promote the exchange of information and experience with regard to long-term archiving in conformance with ISO 19005 (PDF/A). The Board is composed of managers drawn from the following companies: callas software GmbH, Compart Systemhaus GmbH, intarsys consulting GmbH, LuraTech Europe GmbH, PDF Tools AG, PDFlib GmbH and Seal Systems. In less than two years, over 85 companies and organizations as well as numerous specialists from more than 20 countries have joined the organization.

www.pdfa.org

PDF Tools AG

The experts at PDF Tools AG have been dealing with PDF technology since 1993. PDF Tools AG was created as a spin-off in 2002 and today is a global leader in the production of PDF software for clients in all market segments.

The products offered by PDF Tools AG are high-quality client and server-based software products. They have been developed especially for developers, integrators, specialists for custom client solutions and IT departments. Tens of thousands of companies around the world employ these products either directly or via a global network of OEM partners. The tools can easily be integrated into other applications.

The CEO of PDF Tools AG, Dr. Hans Bärfuss, is a world-renowned PDF specialist. He is both a member of the ISO committee responsible for the PDF/A Standard and Vice President of the PDF/A Competence Center, of which PDF Tools AG was a co-founder.

PDF Tools AG has its headquarters in Switzerland, just outside Zurich. The development department and European sales team are also based there. The Canada office is responsible for sales in North and South America and in the Pacific region. All products can be acquired directly via the internet. Demo versions can also be downloaded from the PDF Tools AG website free of charge.

PDF Tools AG, Geerenstrasse 33, CH-8185 Winkel, Switzerland
Tel. 41 43 411 44 50, Fax 41 43 411 44 55, Email [email protected]
Names and trademarks of third parties are legally protected property. Rights may be asserted at any time. The representation of third-party products and services is exclusively for information purposes.

PDF Tools AG is not responsible for the performance and support of third-party products and assumes no responsibility for the quality, reliability, functionality or compatibility of these products and devices.

DL-82-Whitepaper-PDFA-EN-20100107
Copyright 2009 PDF Tools AG. All rights reserved.