Remediating PDFs in Equidox for Word document export

Overview

The Indiana University Assistive Technology and Accessibility Centers (ATAC) wrote this article to guide new Equidox users in best practices for remediation work. Equidox's full capabilities are too extensive to cover in a short document, so this article gives only an overview of what requires the most remediation attention, along with the current nuances and workarounds you need to remember to use Equidox effectively. For specifics on using the program, see the Equidox training pages. To request access, submit the Equidox Software request form via the IT Accessibility Service Catalog.

Much of the guidance in this document is based on the PDF Association's Tagged PDF Best Practice Guide: Syntax. It is a very technical document, but it gives good parameters and a sense of direction for the work. The goal in this article, however, is to create a Word document as the final product, not another PDF. At IU, a Word document is the preferred remediation output for PDFs because its format is easiest to customize and convert to other formats if necessary.

You may find that some online guides (especially the Equidox guides) suggest that you alter or add to the text of a document to make it more accessible. Although the addition of an editor's note may be appropriate at times, you should not change the text of a document unless you are the author.

For example, do not correct factual, grammatical, or spelling errors in the name of accessibility. If errors are present for those examining the PDF visually, they should be present for those examining it with a screen reader. Likewise, do not omit characters from the text version of a PDF if those characters can be added. These include diacritics, punctuation, and special characters. Unfortunately, Equidox does not allow users to insert certain characters, such as end-of-line (soft) hyphens, so the solution for now is to leave the hyphens as they appear.

Omissions

Certain content is not considered to be "real" content. This non-real content includes:

  • Redundant header and footer information
  • Separator lines
  • Decorative images including redundant logos

For single instances of such content, delete or resize any zone that would capture it. For content that repeats across pages, use the Equidox "Ignore" zone, which prevents anything within its boundary from being tagged. An "Ignore" zone drawn on one page is redrawn in the same place on every other page of the document.

Zone type

Headings

  • The title of the document should be a heading 1 zone, and it should be ordered as the first zone of the PDF. A document should have only one heading 1. Because users may expect a heading 1 to be the very first thing in a document, they may overlook or forget any information that precedes it when they navigate back to it.
  • The text used for the heading 1 zone should not be manually inserted; it should already be present on the first page. Even if the title appears in the middle of the first page, it is acceptable to make it the first zone of the page. If necessary, place a second zone over the title and mark it as normal text so that the title appears both at the start and in the middle of the page with the other text.
  • If the title reappears on the second page, which is often the case for journal articles, it can be made a heading 2 zone with subsequent zones as heading level 3 and below, or it can be made a plain text zone depending on the need.
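
The heading 1 rules above can be expressed as a simple check. The following is a minimal sketch in Python and does not use any Equidox feature or API; the list-of-dictionaries zone model and the type names ("h1", "h2", "text") are assumptions made for illustration only.

```python
from typing import Dict, List


def check_heading_one(zones: List[Dict[str, str]]) -> List[str]:
    """Return a list of heading 1 problems; an empty list means none found.

    `zones` is a hypothetical list covering the whole document, already in
    reading order, where each item has a "type" key such as "h1" or "text".
    """
    problems = []
    h1_positions = [i for i, zone in enumerate(zones) if zone["type"] == "h1"]
    if len(h1_positions) != 1:
        problems.append(f"expected exactly one heading 1, found {len(h1_positions)}")
    if h1_positions and h1_positions[0] != 0:
        problems.append("heading 1 is not the first zone in the reading order")
    return problems
```

A document whose zones begin with a page number and then the title would be reported as having a heading 1 that is not first, which matches the guidance to order the title ahead of everything else on the first page.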

Images

  • Give non-decorative images the zone type of "graphic".
  • If the image has a caption that you can zone, order the caption before the image so that it is heard first.
  • Any images will require alt-text before they can be considered fully remediated, but ATAC suggests doing this in Word after conversion instead of using the alt-text feature in Equidox.
  • A content expert may be needed to describe images in the document. A short description provided by someone who knows its intended purpose is always better than a lengthy description provided by a non-expert.

Text

Paragraphs

  • Each paragraph should be zoned individually so that readers can navigate by paragraph. Unfortunately, Equidox is very poor at detecting paragraphs in single-spaced documents, so paragraph zones usually need to be divided or inserted manually.

Math, superscripts, and subscripts

  • Equidox does not have the ability to differentiate superscripts and subscripts, but whatever characters are there should be included in the text. (For the most part, these will be citations that make perfect sense when announced by a screen reader, even without them being indicated as superscript.) Special attention should be paid to these since OCR is notorious for getting them wrong.
  • Equidox does not have adequate functionality to render math or chemistry. Equations and special math or chemistry characters that the OCR cannot pick up should be zoned as graphics and replaced with digital math objects once the document is in its final Word format. Chemical diagrams can be left as images with adequate descriptions provided.

Language

  • For foreign languages, wait to mark them up in Word, where it will be easier to select the text and change the proofing language.
  • Make sure all characters match the visual text. If this isn't possible, you may need to make manual corrections in Word after the conversion.

Page numbers

  • Page numbers are useful to readers so they should be kept as the first zone of the page (except for the first page, where the heading 1 is always first, and the page number should be second).
  • Equidox does not provide an explicit type for page numbers, so the text option must be used instead.

Footnotes and endnotes

  • Do not mark references to footnotes or endnotes as footnote links since this feature is unreasonably difficult to get right.
  • Be prepared to segment a footnote zone into many parts to include interspersed content, such as quotes, lines of foreign language, and links.
  • Order footnote text zones at the end of the page unless they are overflowing from a previous page.
  • Do not copy footnote or endnote text from a following page into the text of the previous page. Treat such notes like paragraphs that span pages. For a footnote that continues from the previous page, order the zone as second on the page, after the page number (if present), so that it flows from the previous page.

Lists

  • If you find an error in the way a bulleted list is being represented in the final Word version, the bullets may be seen as images or characters in the text area instead of true bullets or artifacts. You can try to correct this in Equidox by using the "OCR" zone source for the subzone inside the lists and editing them as necessary. Double-click inside the list to access a subzone in the list.

Tables

  • In tables, if you lose the option to add or delete rows, use the Equidox zoom settings to zoom out, select a cell, and then zoom back in. This should cause the add and delete options to reappear.
  • If an extra row or column extends beyond the view so that you cannot see the bottom or right edge of the table's bounding box, select a cell in the last visible row or column, and then select the delete row or remove column button to remove it.
  • Select Preview this table before exiting table editing mode to double check that you do not have any extra rows or columns.
  • If you are getting double list indexes (such as "1. 1. Text"), select the table, and then double-click each row of the list to check the "List label" checkbox.

Quote

  • There is no reason to use this option since it does not come through in the HTML or the subsequent Word document.

Blockquote

  • Blockquote cannot be used because it causes whatever page it is on to not be included in the final document. The system will not even alert you during the output process. This is a reported issue.

Zone source

Selecting the zone source is by far the most complicated part of Equidox. To simplify the process:

  1. Use the Custom zone source to check that the underlying text is accurate.
  2. If the text is accurate, switch the zone source to PDF.

    If the text is not accurate, switch the zone source to OCR so you can correct it.

  3. Do not use the Actual text zone source for anything.
  4. If you must, use a Custom zone source drawn into a blank space of the document to convey an additional editor's note.
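
The steps above amount to a small decision rule, sketched here in Python. This is illustrative only and does not call anything in Equidox; the function name and both parameters are hypothetical, and the accuracy check itself is still done by eye in the Custom source view.

```python
def choose_zone_source(text_is_accurate: bool, is_editor_note: bool = False) -> str:
    """Pick the final zone source for one zone, following the steps above.

    text_is_accurate: result of comparing the underlying text (viewed via the
        Custom source) against the visual text.
    is_editor_note: True only for a zone drawn in blank space to hold an
        added editor's note.
    """
    if is_editor_note:
        return "Custom"  # the one case where Custom is kept as the final source
    if text_is_accurate:
        return "PDF"     # the underlying text already matches the visual text
    return "OCR"         # lets you correct the underlying text
```

Note that the function never returns "Actual text", in line with step 3.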

You should avoid using the Custom zone source option as a final solution, and you should never use it to overlap any present text. Although this option artifacts anything the zone is placed over (causing screen readers to read out the typed-in text instead of the artifacted text), the artifacted text will be retained and will remain searchable and included when the text is copied and pasted. The original purpose for the Custom zone source option was solely for adding invisible but screen-reader-announced notes to the document.

Avoid using the Actual text zone source as it is intended solely for making edits to the underlying text that don't affect anything except how a screen reader will read the PDF while it is in PDF format.

Using the PDF zone source simply tells Equidox to use the underlying text.

Using the OCR zone source is the only way to actually change the underlying text when there is an error (as opposed to adding notes that leave it unchanged or artifacting it).

Here are some tips and tricks for using the zone source options to your advantage:

  • Select Custom source for any text zone (shortcut: c), and select inside the custom text area to cause your browser's spell check to run in the box. This lets you quickly determine whether the zone needs corrections.
    • Leave end-of-line hyphens (soft hyphens) in. Although they should technically be real soft hyphens, this is the current workaround.
    • Apostrophes and quotation marks should nearly always match the visual text; the correct curly form (“ ”) will be used, which you can confirm by zooming in. If this causes issues, do not attempt to use the no-space document checkbox found under the output area as a fix; doing so will erase any custom text box work you have done in the document during export.
    • If there are any inaccuracies in the text when compared against the visual text, switch the zone to OCR and edit the text it produces. However, if this is needed too often, the document likely does not meet IU's quality scan requirements, which are a prerequisite for remediation work.
  • OCR zones (and Custom zones) cannot be merged with other zones, even of the same type. Instead you will need to edit each zone separately.
  • OCR text zones (and Custom zones) cannot overlap:
    • When you layer two zones, and both are either OCR or Custom (or a combination of the two), Equidox doesn't have physical character positioning available for the underlying zone.
    • Instead make the two zones separate, even if you have to split one in half, to avoid overlapping.
  • On pages where a lot of OCR work needs to be done, look for an "OCR all zones" option that may be present under the Page tab. Pages that need OCR will probably also have a button under the Page tab for reordering the zones. You will still need to check the OCR text that Equidox produces for each zone.
  • When reviewing OCR zones, make sure to check for footnote references, as they are prone to being interpreted as quotes or special symbols because they are small.
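
The rule against overlapping OCR and Custom zones can be pictured as a rectangle-intersection check. The sketch below is a Python illustration only, not an Equidox feature; the `Zone` class, its field names, and the coordinate system (y increasing downward, so `top` is less than `bottom`) are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Zone:
    """Hypothetical zone record: a name, a zone source, and a bounding box."""
    name: str
    source: str  # "PDF", "OCR", or "Custom"
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Zone, b: Zone) -> bool:
    """True when the two rectangles share any interior area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def conflicting_pairs(zones: List[Zone]) -> List[Tuple[str, str]]:
    """Return name pairs of overlapping zones where both are OCR or Custom."""
    editable = [z for z in zones if z.source in ("OCR", "Custom")]
    return [(a.name, b.name) for a, b in combinations(editable, 2) if overlaps(a, b)]
```

Any pair this reports would need to be separated, for example by splitting one of the zones in half as described above. Zones that only touch along an edge are not treated as overlapping in this sketch.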

Reading order

  • Remember that you can use decimals to insert zones into the reading order instead of having to change every subsequent zone's order.
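
To see why decimals help, consider a page whose zones are already numbered 1 through 4 and a caption that was missed between zones 2 and 3. The Python sketch below models this with a plain dictionary of zone names and order values; the zone names are made up, and how Equidox stores order values internally is not something this example claims to show.

```python
from typing import Dict, List


def reading_order(orders: Dict[str, float]) -> List[str]:
    """Return zone names sorted by their order value, lowest first."""
    return sorted(orders, key=lambda name: orders[name])


# Hypothetical page: four zones already ordered 1 through 4.
page = {"page number": 1, "paragraph A": 2, "figure": 3, "paragraph B": 4}

# Insert the missed caption before the figure without renumbering anything.
page["figure caption"] = 2.5

print(reading_order(page))
# ['page number', 'paragraph A', 'figure caption', 'figure', 'paragraph B']
```

Only the new zone needed an order value; the four existing zones kept theirs.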

This is document bhnq in the Knowledge Base.
Last modified on 2024-01-08 14:17:23.