The Science of Digitization

Line By Line Revivals

Case Study

Problem

Faber and Faber, a significant UK publisher, wanted to reissue out-of-print books as new editions that were line-by-line accurate to the originals but used modern fonts and paper sizes.

We call these Line-by-Line (LbL) PDFs. The books would be supplied through Print on Demand, and the same titles would also be released as e-books.

The design copyright was held by independent third parties, so scanned Print on Demand books were not an option.

The source material was the original printed books, which varied widely in age and condition. Many of the books were sourced from private libraries and could not be damaged in the production process.

The problem was how to go from print to print facsimile and e-books quickly and cost-effectively. The publisher had tried several suppliers but was always told that, while near page matching could be done, line-by-line resetting was probably impossible or, at best, very expensive.

They needed 120 books produced per month for three months, winding back to around 30-40 books per month on an ongoing basis. As usual, the release deadline was approaching.

Solution

Estel Labs takes a relatively high road in production technology. In the 20 years we have been digitizing content and using digital content production tools like IGP:Digital Publisher, we have learnt that it is better to keep all the data available and filter out whatever a particular format does not need.

Our in-house OCR and proofing tools always preserve line and page metrics. That means that when content is produced via OCR and proofing, we know where every line and page ends. This XML data can be used as part of the output process.
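
To make the idea of preserved line and page metrics concrete, here is a minimal Python sketch that reads line and page ends back out of XML. The element names `book`, `page` and `line`, the `n` attribute and the sample text are hypothetical and chosen only for illustration; they are not the IGP:FoundationXHTML vocabulary or the format our tools actually emit.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <page n="..."> and <line> are illustrative names only.
SAMPLE = """
<book>
  <page n="1">
    <line>The rain had stopped by the</line>
    <line>time they reached the gate.</line>
  </page>
  <page n="2">
    <line>Nobody spoke.</line>
  </page>
</book>
"""


def line_metrics(xml_text):
    """Return (page number, line number on page, text) for every line."""
    root = ET.fromstring(xml_text)
    records = []
    for page in root.iter("page"):
        page_no = int(page.get("n"))
        for line_no, line in enumerate(page.iter("line"), start=1):
            records.append((page_no, line_no, line.text or ""))
    return records


def last_line_of_each_page(records):
    """Map page number to the text of the final line on that page."""
    last = {}
    for page_no, _line_no, text in records:
        last[page_no] = text  # later lines on a page overwrite earlier ones
    return last


if __name__ == "__main__":
    for record in line_metrics(SAMPLE):
        print(record)
```

Because every line carries its page and position, an output stage can choose to honour those breaks (print) or ignore them (flowing text) without going back to the source book.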

One of the more interesting problems that must be addressed is the end-of-line (EOL) hyphen. The tool has a relatively complex algorithm that marks each EOL hyphen for inclusion or exclusion in flowing e-text while always preserving the hyphen for the LbL print PDF.

The same problem occurs at the end of a page, where the last line may end in a hyphen and the word may or may not continue on the next page.
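
The sketch below illustrates the kind of decision involved; it is a deliberately simplified stand-in, not the algorithm in our proofing tool. It assumes a small hypothetical word list (`KNOWN_WORDS`): if the fragment before the hyphen and the fragment after it join into a known word, the hyphen is treated as "soft" and dropped from flowing text; otherwise it is treated as "hard" and kept. Because the input is simply a sequence of lines, the last line of one page followed by the first line of the next is handled the same way.

```python
# Hypothetical word list. A real pipeline would rely on a full dictionary
# and on proofreader decisions rather than three sample words.
KNOWN_WORDS = {"production", "publisher", "line"}


def classify_eol_hyphen(head, tail, known_words=KNOWN_WORDS):
    """Classify an end-of-line hyphen as 'soft' (print only) or 'hard'.

    head: the word fragment before the hyphen.
    tail: the first word fragment on the following line.
    """
    if head and (head + tail).lower() in known_words:
        return "soft"
    return "hard"


def join_lines(lines, known_words=KNOWN_WORDS):
    """Join printed lines into flowing text, resolving EOL hyphens."""
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not out:
            out = line
        elif out.endswith("-"):
            head = out.rsplit(" ", 1)[-1][:-1]             # last word, minus hyphen
            tail = line.split(" ", 1)[0].strip(".,;:!?")   # first word, no punctuation
            if classify_eol_hyphen(head, tail, known_words) == "soft":
                out = out[:-1] + line   # drop the hyphen and close the word
            else:
                out = out + line        # keep the hyphen, e.g. "line-by-line"
        else:
            out = out + " " + line
    return out


if __name__ == "__main__":
    print(join_lines(["The publisher wanted line-",
                      "by-line accuracy in pro-",
                      "duction."]))
```

In this sketch the print rendering never needs the classification, since the LbL PDF shows every hyphen exactly where the original book had it; only the flowing output depends on the soft or hard decision.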

IGP:Digital Publisher was ready for the challenge. We had to make sure the staff was as well.

Implementation

We started with a sample set of test books covering a range of quality and complexity. Most were reasonably simple, but some contained illustrations, photographic plate pages, references and indexes, which made the process a little more challenging. Some had footnotes that needed to stay in place in the printed book but move in the ePub editions.

The powerful block-structuring tagging patterns available in IGP:FoundationXHTML are complete enough to allow any structure to be targeted for print-specific presentation styling, independent of e-book requirements.

The proofing tool outputs the two-pass proofed text with all the required end-of-line and hyphen tags. With a little CSS adjustment, the pages present perfectly. After QC editor inspection and fine-tuning, output formats can be generated instantly. Line and page breaks are preserved for PDF production and stripped for the e-books by IGP:Formats on Demand at format generation time.
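
As a rough illustration of producing both outputs from one tagged source, the Python sketch below renders the same line records twice: once keeping every line and page break, and once stripping them. The `Line` record and its `eol` values ("space", "soft", "hard") are hypothetical names invented for this example; they do not describe the tags used by the proofing tool or the behaviour of IGP:Formats on Demand.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str  # line text without any trailing hyphen
    eol: str   # "space", "soft" (hyphen in print only) or "hard" (hyphen always)


def render_lbl(pages):
    """Print rendering: keep every line and page break, show all EOL hyphens."""
    out_pages = []
    for page in pages:
        rows = [ln.text + ("" if ln.eol == "space" else "-") for ln in page]
        out_pages.append("\n".join(rows))
    return "\f".join(out_pages)  # form feed stands in for a page break


def render_flow(pages):
    """E-book rendering: strip line and page breaks, drop soft hyphens."""
    joiner = {"space": " ", "soft": "", "hard": "-"}
    parts = [ln.text + joiner[ln.eol] for page in pages for ln in page]
    return "".join(parts).rstrip()


# Two short sample pages, invented for this example.
PAGES = [
    [Line("Breaks are pre", "soft"), Line("served in the line", "hard")],
    [Line("by-line PDF and", "space"), Line("stripped for e-books.", "space")],
]

if __name__ == "__main__":
    print(render_lbl(PAGES))
    print(render_flow(PAGES))
```

The design point is that nothing is thrown away at proofing time: both renderings are derived from the same records, and the choice to keep or strip the breaks is made only when a format is generated.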

To scale to these requirements, the development team had to create procedures, and production editors had to be trained in the proofing, tagging, production and quality inspection methods required.

With digital content production we take the approach that there are no heroes or geniuses. There are the trained and the untrained. Quality doesn't come from what you do, it comes from how you do it. Our formal approach to training made sure there were enough production editors skilled in the LbL processes and procedures to hit the production targets and quality requirements.

Results

The marketing announcements had been made and production hit the targets every month. We delivered the PDFs straight to the Print on Demand vendor using IGP:Distribution Manager.

Hundreds of Line-by-Line PDFs have been created and delivered with their companion e-book formats. The original content was maintained in our master IGP:Digital Publisher Content system.

Eighteen months after the production of the first books, the publisher purchased IGP:Digital Publisher and the books were all transferred to their system. They now have the ability to keep generating new formats from content that was tagged in IGP:FoundationXHTML over eleven years ago.

We continue to produce LbL PDFs and transfer the content directly to the publisher's system on completion.