NIMAS Files Best Practices
Note: The NIMAS Files Best Practices document is also available in Word format.
On this page:
- General best practices for NIMAS/DAISY XML files
- Validation vs. Accuracy
- Missing Content
- XML Mark-Up
- NIMAS Filesets Common Errors
- Identifying levels' content type
- Levels and class attributes
- Content components/page elements
- Page numbers
- Multiple-page elements
- Tables of contents
- Write-on lines or fill-in-the-blank lines
- Text in images
- Color profiles in images
- Selected list of media able to view CMYK profile images
- Converting images from one color profile to another
- Additional information regarding color profiles and image formats
- Corrections to filesets
- Resources for more information
1. General best practices for NIMAS/DAISY XML files
Implementation of the NIMAS, and the corresponding increase in NIMAS filesets submitted to the NIMAC, have brought to light questions about practical implementation of the Standard. Several items of interest have been discussed as implementation has progressed, and best practices, determined by consensus of members of the NIMAS Technical Assistance and Development centers, are outlined below. Producers of NIMAS filesets are also urged to consult the Creating NIMAS Files document, available at the NIMAS web site. The NIMAS site’s Exemplars page also contains more information and examples.
- All images must have a placeholder for alt text or, if possible, alt text itself. Ideally, complex images and instructionally important images should also have long descriptions (LDs).
- If a word breaks between pages, leaving a hyphenated word at the bottom of one page and the top of the next, remove the hyphen and include the whole word at the end of the first of the two pages.
- All images that are part of an original print source are required as part of a NIMAS fileset. One approach to checking that all images are present in completed filesets is to use the NIMAS Conversion Tool.
- Use the correct character encoding (i.e., UTF-8). For more information about why the use of UTF-8 is important, see the Character Encoding section of Creating NIMAS Files.
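The encoding requirement is easy to check before submission. The Python sketch below is a hypothetical helper (`check_utf8` is an illustrative name, not part of any NIMAS or NIMAC tool): it confirms that a file's raw bytes decode as UTF-8 and that any XML declaration does not name a different encoding. It supplements, and does not replace, the NIMAC Validator.

```python
import re


def check_utf8(data: bytes) -> list:
    """Return a list of encoding problems found in a file's raw bytes (empty if none)."""
    problems = []
    # A UTF-8 byte-order mark is permitted; drop it before checking.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 (first bad byte at offset {err.start})")
        return problems
    # If the XML declaration names an encoding, it should be UTF-8.
    declared = re.match(r"<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']", text)
    if declared and declared.group(1).lower() != "utf-8":
        problems.append(f"XML declaration names encoding {declared.group(1)!r}")
    return problems
```

For example, `check_utf8(open("book.xml", "rb").read())` returns an empty list for a clean file; `book.xml` is a placeholder name.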
The NIMAS Center strongly urges the use of the most current DTD in NIMAS filesets, while recognizing that the current NIMAS Technical Specification permits DAISY/NISO Z39.86-2005 (revision 1) and its later revisions (-2 and -3) as the DTD referenced by the XML source file. The components of a fileset should be evaluated against the NIMAS Technical Specification (currently v1.1), a subset of the DAISY ANSI/NISO Z39.86 standard. As an extensible specification, the NIMAS Technical Specification was written to allow some flexibility and states that updates and additions are ongoing. Please see Creating NIMAS Files for additional information.
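To see which DTD revision a source file references, a script can read its DOCTYPE. This sketch assumes the usual DTBook public identifier form, `-//NISO//DTD dtbook 2005-N//EN`, and treats 2005-3 as the most current revision; the function names are illustrative, and the check does not validate the file against the DTD.

```python
import re
from typing import Optional

# Assumed form of the DTBook public identifier, e.g. "-//NISO//DTD dtbook 2005-3//EN".
DTBOOK_PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+dtbook\s+PUBLIC\s+"-//NISO//DTD dtbook (2005-[1-3])//EN"'
)

# Treated here as the most current revision; update if a newer one applies.
LATEST_REVISION = "2005-3"


def dtbook_revision(xml_text: str) -> Optional[str]:
    """Return the DTBook revision named in the DOCTYPE (e.g. '2005-2'), or None."""
    match = DTBOOK_PUBLIC_ID.search(xml_text)
    return match.group(1) if match else None


def uses_latest_dtd(xml_text: str) -> bool:
    """True only if the DOCTYPE references the most current revision."""
    return dtbook_revision(xml_text) == LATEST_REVISION
```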
2. Validation vs. Accuracy
It is possible to have NIMAS filesets that validate to the correct specifications yet fail to be complete and accurate files. Files may validate against one spec, yet fail another, or may satisfy requirements and yet still contain errors, missing content, or other problems. Here we list examples of such in order to help producers of NIMAS files to prepare filesets that are both valid and complete. Questions and additional examples are welcomed and may be sent via email to aim [at] cast [dot] org.
Missing Content
Missing content is a common cause of failed NIMAS files. It is an avoidable mistake, and it is also one of the easiest errors to correct. Commonly missing content includes images and text components other than in-line sections. Compare the source with the files and ensure that all content is included before submission to the NIMAC.
- Pages missing or out of order
- check pagination before fileset submission; do not take a national edition source file and simply copy and paste state edition pages into it. Carefully check pages and page numbers
- check tables of contents against content itself
- spot-check text components such as sidebars, study question sections, quotes at the start of chapters, inset boxes, footnotes, block quotes, curriculum references, etc.
- Images missing or marked up incorrectly
- make sure all images appearing in a print work are included as part of that work’s NIMAS fileset (required by the Standard) and are listed correctly in the OPF file
- check that all actual image files have corresponding code in the correct location within the work’s XML file, that this code references the images correctly, that all images are saved in a consistent format, and that no stray images sit outside their folder or directory
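Part of the image check can be automated. The following sketch (hypothetical helper `image_report`) compares the `src` of every `<img>` in the XML with the set of image paths actually present in the fileset, assumed here to be given relative to the XML file. It reports missing files, unreferenced (possibly stray) files, and the set of file extensions in use; whether each image sits in the correct location in the text still needs a human check.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def image_report(xml_text: str, image_files: set) -> dict:
    """Compare <img src> references in the XML with the image files present in the fileset.

    image_files holds paths relative to the XML file, e.g. {"images/fig1.png"}.
    """
    root = ET.fromstring(xml_text)
    referenced = {
        elem.get("src", "")
        for elem in root.iter()
        # rsplit strips a "{namespace}" prefix, so namespaced DTBook files work too
        if elem.tag.rsplit("}", 1)[-1] == "img"
    }
    return {
        # referenced in the XML but not present in the fileset
        "missing_files": referenced - image_files,
        # present in the fileset but never referenced (possible stray images)
        "unreferenced_files": image_files - referenced,
        # more than one extension here suggests inconsistent image formats
        "formats": {PurePosixPath(name).suffix.lower() for name in image_files},
    }
```

`image_files` could be gathered with `pathlib`, for instance `{p.relative_to(base).as_posix() for p in base.rglob("*.png")}`, where `base` is the folder holding the XML file (repeat for `.jpg` and `.svg`).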
Metadata
Metadata errors commonly include missing required metadata items and incorrect coding of components in a fileset’s OPF file. The NIMAC provides detailed instructions on the metadata required for NIMAC validation and is also available via email to answer questions. Producers of NIMAS filesets should also run their XML and OPF files through the NIMAC Validator to check for problems prior to submission. Please see the NIMAC and metadata section of Creating NIMAS Files for important information regarding UPC codes used as unique identification.
- Make sure the correct specification is listed in the OPF file’s declaration (as of this writing, it is <!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN" "http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd">)
- Make sure the OPF file contains a <spine> element that includes an <itemref idref="xxx"/>
- Check the OPF file against NIMAC requirements and run it through their Validator
- Spot-check filenames for exact matches with fileset component files’ names.
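The OPF items above can also be spot-checked with a short script. This sketch (hypothetical `check_opf`) looks for the OEB 1.2 public identifier quoted above, confirms that the `<spine>` has at least one `<itemref>` whose `idref` matches a manifest `<item id>`, and compares every manifest `href` against the actual filenames, case-sensitively. It is a rough pre-check only; the NIMAC Validator remains the authority.

```python
import xml.etree.ElementTree as ET

# Public identifier of the OEB 1.2 package DTD, as quoted in this document.
OEB12_PUBLIC_ID = "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN"


def local(tag: str) -> str:
    """Drop any "{namespace}" prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_opf(opf_text: str, fileset_files: set) -> list:
    """Return problems found in an OPF package file (empty list if none)."""
    problems = []
    # Simple substring test; it does not parse the DOCTYPE itself.
    if OEB12_PUBLIC_ID not in opf_text:
        problems.append("OEB 1.2 package DOCTYPE not found")
    root = ET.fromstring(opf_text)
    # Map manifest item ids to their href values.
    items = {e.get("id"): e.get("href") for e in root.iter() if local(e.tag) == "item"}
    for href in items.values():
        # Set membership is case-sensitive, so "Fig1.png" does not match "fig1.png".
        if href not in fileset_files:
            problems.append(f"manifest href {href!r} has no exact filename match")
    idrefs = [e.get("idref") for e in root.iter() if local(e.tag) == "itemref"]
    if not idrefs:
        problems.append("no <itemref> found in the <spine>")
    for idref in idrefs:
        if idref not in items:
            problems.append(f"spine idref {idref!r} matches no manifest item")
    return problems
```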
XML Mark-Up
It is important that the mark-up used in XML files is appropriate for the content of the print work. In particular, use a correct and accurate level structure that reflects the source. Use elements correctly and employ full DAISY mark-up wherever appropriate and feasible. Use the <level1> … <level6> elements properly: nest levels correctly, and do not use levels for block or in-line components. Key basics and best practices for NIMAS XML are available in Creating NIMAS Files, and helpful examples can be found on the NIMAS site’s Exemplars page. See also the DAISY Structure Guidelines for information on the proper use of levels.
- Elements must match content (example: use <sidebar> for appropriate margin content and boxed text, etc., rather than <div>; use <note> and <noteref> for footnotes rather than <span>; etc.)
- Nest elements correctly (example: <level2> always within <level1>; no unpaired tags)
- Select class and id attributes in a consistent manner
- Spot-check in-line mark-up to ensure that it codes the correct portion of content (example: <author> includes name only; <line> includes only one line; captions are not marked as paragraphs, etc.)
- Do not combine or misplace text elements (examples: mark up end-of-lesson assessments as the end of the correct lesson; give page numbers in table of contents entries their own mark-up, separate from the entry text; etc.)
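Level nesting can be checked mechanically. The sketch below (hypothetical `level_nesting_errors`) parses the XML, which already fails with a `ParseError` if tags are unpaired, and then reports any `<levelN>` whose nearest enclosing numbered level is not `<level(N-1)>`. It covers only the numbered `<level1>` through `<level6>` elements; generic `<level>` elements are ignored.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED_LEVEL = re.compile(r"level([1-6])$")


def level_nesting_errors(xml_text: str) -> list:
    """Report <levelN> elements whose nearest enclosing numbered level is not <level(N-1)>.

    Unpaired tags make the file non-well-formed, so ET.fromstring raises ParseError.
    """
    root = ET.fromstring(xml_text)
    errors = []

    def walk(elem: ET.Element, depth: int) -> None:
        # depth is the number of the nearest enclosing <levelN> (0 if none)
        for child in elem:
            match = NUMBERED_LEVEL.match(child.tag.rsplit("}", 1)[-1])
            if match:
                n = int(match.group(1))
                if n != depth + 1:
                    errors.append(f"<level{n}> appears where <level{depth + 1}> was expected")
                walk(child, n)
            else:
                walk(child, depth)

    walk(root, 0)
    return errors
```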
NIMAS Filesets Common Errors
As use of the NIMAC increases, accessible media producers (AMPs) are finding several errors in filesets that occur more often than they should. Below is a list of commonly found errors that publishers and other preparers of NIMAS filesets should watch out for:
- ensure tables are coded as tables within the XML source file and are not presented as images; occasionally an image is made up of content presented visually as a table, but the vast majority of tables are, and should be, coded using <table> and its associated elements
- ensure that lists have been formatted correctly by checking the <list> element’s type attribute and the content of its <li> elements (for example, typing numbers into the items of a list with type="ol" would result in duplicate numbering)
- Inappropriate mark-up
- Using mark-up for styling and layout is tempting, but it is a major error and defeats the main purpose of an XML file, which is to separate content from its presentation. Do not use any style, format, or layout coding within an XML file, including style attributes on elements (common in HTML but not permitted in NIMAS source files), inappropriate use of the <br/> tag, inappropriate use of the entity, etc.
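Two of the errors above lend themselves to an automated first pass. This sketch (hypothetical helper names) flags items in `type="ol"` lists whose text already begins with a typed marker such as "1." or "b)", and lists every `style` attribute and `<br/>` element. The marker pattern is a heuristic, and a `<br/>` can be legitimate, so the output is a review list rather than a verdict.

```python
import re
import xml.etree.ElementTree as ET

# Hand-typed markers such as "1.", "2)", "a." or "(b)" at the start of a list item.
MANUAL_MARKER = re.compile(r"^\s*\(?(\d+|[A-Za-z])[.)]\s")


def local(tag: str) -> str:
    """Drop any "{namespace}" prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def duplicate_numbering(root: ET.Element) -> list:
    """Items of type="ol" lists whose text already starts with a typed number or letter."""
    flagged = []
    for lst in root.iter():
        if local(lst.tag) == "list" and lst.get("type") == "ol":
            for item in lst:
                text = "".join(item.itertext())
                if local(item.tag) == "li" and MANUAL_MARKER.match(text):
                    flagged.append(text.strip())
    return flagged


def presentational_markup(root: ET.Element) -> list:
    """Style attributes and <br/> elements, listed in document order for human review."""
    findings = []
    for elem in root.iter():
        name = local(elem.tag)
        if "style" in elem.attrib:
            findings.append(f"style attribute on <{name}>")
        if name == "br":
            findings.append("<br/> element")
    return findings
```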
3. Identifying levels’ content type
The <level> element (heavily used in NIMAS XML source files) does not carry explicit information about the type of content it contains. For example, one XML source file may contain units and chapters, another chapters only, and a third units, chapters, and sections. However, it is very useful for producers, transcribers, and end users to know where they are (conceptual level) when encountering segments (structural level) in a work. This information can be provided in two ways:
1. include class attributes and ids with <level> elements:
|XML:||Output (through XSLT/CSS/etc.):|
|<level class="chapter" id="02.02">Weather</level>||Chapter 2: Weather|
|<level class="chapter" id="02.03">Climate</level>||Chapter 3: Climate|
2. include content type within <level> content:
|<level2>Chapter 2: Weather</level2>||Chapter 2: Weather|
|<level2>Chapter 3: Climate</level2>||Chapter 3: Climate|
Note: Bear in mind that the above examples are meant to illustrate a specific idea. They do not contain full mark-up, such as a complex id structure, because that would not serve the purpose of this example.
4. Levels and class attributes
It will often be necessary to use class attributes with level elements (<level1>, etc.). Note that the more specialized class attributes are, the less useful an XML source file is; it is a mistake to use class attributes that are specific to a particular work. As an aside, do not use spaces in attribute values. Please compare the following:
<level4 class="subsection"> vs. <level4 class="Biographies">
While it would of course be possible to use a class attribute such as “Biographies,” it would count as an error because it would at least partly defeat the purpose of the XML, which is to create a source file useful in a variety of ways.
Using a consistent attribute list is helpful and efficient for anyone creating more than one XML work. The attribute “subsection” would be useful across many books, for example, where the attribute “Biographies” may only be useful in one book. XSL using “subsection” could be re-used, but work done to transform or render XML with one-off attributes would not be reusable.
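A shared class list can be enforced with a simple check. In the sketch below, `SHARED_LEVEL_CLASSES` is a hypothetical house vocabulary, not one defined by NIMAS or DAISY; the function flags level class values that contain spaces or fall outside that list, such as "Biographies".

```python
import xml.etree.ElementTree as ET

# Hypothetical shared vocabulary of generic class values, reused across works.
SHARED_LEVEL_CLASSES = {"unit", "chapter", "lesson", "section", "subsection"}


def level_class_problems(xml_text: str, allowed: set = SHARED_LEVEL_CLASSES) -> list:
    """Flag level class values that contain spaces or fall outside the shared list."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]
        value = elem.get("class")
        if not name.startswith("level") or value is None:
            continue
        if any(ch.isspace() for ch in value):
            problems.append(f"<{name}>: class {value!r} contains a space")
        elif value not in allowed:
            problems.append(f"<{name}>: class {value!r} is not in the shared list")
    return problems
```

Because the vocabulary is a plain set, the same check (and any XSL written against those values) can be reused unchanged from one work to the next.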
If desired, descriptive information can be embedded in a file by using part of the id attribute for this purpose instead. Id attributes are better suited to this because each id already belongs to a single element, so work-specific information there does not undermine a reusable class vocabulary. However, ids will almost certainly be used for other things besides identifying the kind of content an element holds, so the goal is to add information without excluding those other uses. See the following example, where “Bio” is a descriptive indicator and “sub1245” is the rest of the id attribute: <level4 class="subsection" id="sub1245Bio">. Note that id attributes should never be used for styling or format purposes.
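Because ids serve several purposes, they must stay unique within a file even when they carry a descriptive indicator. This sketch (hypothetical helpers) reports duplicate id values and finds ids ending in an indicator such as "Bio", following the `sub1245Bio` example above; the suffix convention itself is only an illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text: str) -> list:
    """Return id values used on more than one element, sorted."""
    root = ET.fromstring(xml_text)
    counts = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def ids_with_indicator(xml_text: str, indicator: str) -> list:
    """Return ids ending in a descriptive indicator such as "Bio", in document order."""
    root = ET.fromstring(xml_text)
    return [e.get("id") for e in root.iter() if (e.get("id") or "").endswith(indicator)]
```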
5. Content components/page elements
Content components or page elements that appear throughout a work should be marked up consistently so that their placement is uniform. For example, if footnotes, captions, and the like are not marked up as in-line elements, they should be marked up to appear in the same location each time they occur, such as at the ends of pages.
Content components should not be changed from one format to another during the mark-up process. For example, in-line text components should not be changed into tables, images, sidebars, etc.; sidebar content should not be changed into in-line text, images, etc.; and rearmatter should not be placed in frontmatter or bodymatter.
6. Page numbers
Page numbers should be marked up using the <pagenum> element at the beginning of each page, and all page numbers should be included, whether or not they appear in print. Pages without numbers should be marked up with page numbers, or a sequence substitute, even if these will not be shown or rendered. Placing the <pagenum> element at the beginning of each page is generally the most logical and efficient choice; it is especially beneficial for Braille transcription, and it does not prevent the numbers from being rendered elsewhere on the page or screen. Examples:
Unnumbered blackline master page: <pagenum page="special" class="unshown">II.04</pagenum>
Unnumbered flashcard: <pagenum page="special" class="unshown">112458-01</pagenum>
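Before submission, a script can list breaks in the numeric page sequence. The sketch below (hypothetical `pagination_gaps`) reads `<pagenum>` elements in document order and reports any numeric `normal` page that does not follow its predecessor by exactly one. It skips `page="special"` entries like those above and non-numeric labels, so it cannot detect, for example, a missing last page; pages still need a manual check against the print source.

```python
import xml.etree.ElementTree as ET


def pagination_gaps(xml_text: str) -> list:
    """Report numeric 'normal' page numbers that do not follow the previous one by exactly 1."""
    root = ET.fromstring(xml_text)
    problems = []
    previous = None
    for elem in root.iter():  # document order
        if elem.tag.rsplit("}", 1)[-1] != "pagenum":
            continue
        # a <pagenum> without a page attribute is treated as a normal page here
        if elem.get("page", "normal") != "normal":
            continue
        label = (elem.text or "").strip()
        if not label.isdigit():
            continue  # roman numerals and substitute labels are skipped in this sketch
        number = int(label)
        if previous is not None and number != previous + 1:
            problems.append(f"page {number} follows page {previous}")
        previous = number
    return problems
```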
Occasionally, source material has no page numbers of any kind. Instructional materials that may be created without printed page numbers include blackline masters, transparencies, and worksheets. Yet <pagenum> is still necessary to distinguish one physical page from another. In such cases it is acceptable to use another available sequential number in place of the absent page number, such as a unique product code, or unique lesson, unit, and item (or similar) information printed in a header or footer. These items indicate sequence in a way comparable to page numbers. Examples:
<pagenum>1136-8</pagenum>, <pagenum>1136-9</pagenum>, <pagenum>1136-10</pagenum>
<pagenum>4581-3A</pagenum>, <pagenum>4581-3B</pagenum>, <pagenum>4581-3C</pagenum>
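When substitute labels follow a pattern, the `<pagenum>` elements can be generated rather than typed. This sketch (hypothetical `pagenum_elements`) serializes one element per label and can add a `page` attribute; building the labels, as in the usage note below the code, is left to the producer because each source's numbering differs.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def pagenum_elements(labels: Iterable[str], page: Optional[str] = None) -> list:
    """Serialize one <pagenum> per substitute label, optionally with a page attribute."""
    out = []
    for label in labels:
        elem = ET.Element("pagenum", {"page": page} if page else {})
        elem.text = label  # ElementTree escapes any special characters
        out.append(ET.tostring(elem, encoding="unicode"))
    return out
```

For instance, `pagenum_elements(f"1136-{n}" for n in range(8, 11))` produces the first row of examples above, and the labels of the second row come from `[f"4581-3{c}" for c in "ABC"]`.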
7. Sidebars
Questions have been posed to the NIMAS Technical Assistance Center regarding <sidebar> content that is divided into sections and, in some cases, subtitled. It is inappropriate to use <div>s for this purpose. One way to code content structured in this way is illustrated as follows:
<sidebar render="required">
<hd>The Big Side</hd>
<hd class="sidebarsub01">Small Side</hd>
<p class="side02.01">text</p>
<p class="side02.02">text</p>
<hd class="sidebarsub02">Tiny Side</hd>
<p class="side03.01">text</p>
<p class="side03.02">text</p>
</sidebar>
The <sidebar> element is relatively versatile in that it may contain, for example, numerous <hd>s, a <blockquote>, a <list>, or even another <sidebar>. One potential drawback, however, is that if subheadings are not explicitly tied to their accompanying text, there is no way to style sections differently from each other using CSS. The <sidebar> element should not be used where more appropriate elements exist; for example, do not use <sidebar> for lists or notes that appear as in-line or block elements.
8. Multiple-page elements
When a table spans more than one page of a printed work, it is recommended to put each page number in its own <pagenum> before the start of the <table>; a table that spans three pages, for example, would have three <pagenum>s followed by an opening <table> tag. Producers are also advised to add a <prodnote> that explains the page span; a good location for it is after the <pagenum>(s) and before the <table>. Example:
<pagenum id="p008" page="normal">8</pagenum>
<pagenum id="p009" page="normal">9</pagenum>
<pagenum id="p010" page="normal">10</pagenum>
<prodnote id="002.001.T04" render="required">Please note that the following table spans three pages of the print source work.</prodnote>
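This preamble can be generated for any page span. In the sketch below (hypothetical `table_preamble`), the `p008`-style ids copy the example above and `note_id` is supplied by the caller; the output goes immediately before the opening `<table>` tag.

```python
import xml.etree.ElementTree as ET


def table_preamble(pages: list, note_id: str) -> str:
    """Build the <pagenum>s and <prodnote> that precede a table spanning several print pages."""
    parts = []
    for number in pages:
        # id pattern "p008" follows the example above; adjust to the fileset's own scheme
        pagenum = ET.Element("pagenum", {"id": f"p{number:03d}", "page": "normal"})
        pagenum.text = str(number)
        parts.append(pagenum)
    note = ET.Element("prodnote", {"id": note_id, "render": "required"})
    note.text = (f"Please note that the following table spans {len(pages)} "
                 "pages of the print source work.")
    parts.append(note)
    return "\n".join(ET.tostring(elem, encoding="unicode") for elem in parts)
```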
When a timeline or other image or text-and-image combination spans more than one page of a printed work, it is acceptable to produce an image file reduced to a size that permits it to be rendered on one screen or page and to create a long description (LD) containing all of the text in that component. Example:
Alt tag: Timeline of important events in Africa and the Americas
Long description (LD): This timeline of important events in Africa and the Americas is made up of a red line with blue markers indicating specific events at intervals. From left to right, the events are listed as follows:
750 B.C.: Kushites conquer Egypt
500 B.C.: Mayan civilization begins
200 A.D.: Ghana founded
700 A.D.: Shona settle in Zimbabwe
1240 A.D.: Kingdom of Mali established
1400 A.D.: Aztec Empire prospers
Note that multiple-page items may remain multiple-page, and in some cases should; however, it is more important to include all of the content of a multiple-page component, as required, than to preserve it as more than one page.
9. Tables of contents
A table of contents (TOC) should be created as part of the NIMAS 1.1 source file for each print work that contains one, using the <list> element. Please note that conversion tools can generate TOCs only if the source XML is marked up correctly.
An index or a TOC is the kind of content for which <lic> (list item component) would sometimes be used. However, since <lic> is not universally recognized, it is recommended that class attributes be used with <li> (list item) instead, as in the following example: <li class="entry">, <li class="page">
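Conversion tools can build a TOC only from correct structure. As an illustration of that dependency, not a description of any particular tool, this sketch walks nested `<level1>` through `<level6>` elements, takes each level's first `<h1>` through `<h6>` as the entry text, and produces nested `<list type="pl">` elements with `<li class="entry">`. Page numbers are omitted for brevity, and the "(untitled)" placeholder is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

NUMBERED_LEVEL = re.compile(r"level[1-6]$")
HEADING = re.compile(r"h[1-6]$")


def local(tag: str) -> str:
    """Drop any "{namespace}" prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def build_toc(container: ET.Element) -> Optional[ET.Element]:
    """Build a nested <list> of <li class="entry"> items from the <levelN> children of container."""
    toc = ET.Element("list", {"type": "pl"})
    for level in container:
        if not NUMBERED_LEVEL.match(local(level.tag)):
            continue
        heading = next((c for c in level if HEADING.match(local(c.tag))), None)
        entry = ET.SubElement(toc, "li", {"class": "entry"})
        entry.text = "".join(heading.itertext()).strip() if heading is not None else "(untitled)"
        sublist = build_toc(level)  # recurse into nested levels
        if sublist is not None:
            entry.append(sublist)
    return toc if len(toc) else None
```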
10. Write-on lines (WOLs) or fill-in-the-blank lines
The NIMAS Technical Assistance Center has received several inquiries regarding the proper way to indicate the presence of write-on lines (WOLs) or fill-in-the-blank lines in NIMAS source files. Currently there is no specific element for this in either the NIMAS 1.1 element set or DAISY2005, so it is acceptable to use a continuous series of underscore characters for this purpose. To match the level of granularity at which WOLs occur, it is recommended to code the characters as shown:
<span class="WOL">__________</span> or <span class="blankline">__________</span>
11. Text in images
Instructional content presented in images should also be provided in an alternate format. Text in images calls for careful analysis of the image(s) in question, because there is more than one kind of image with text, and each kind calls for different handling. A person who knows the content of the textbook is the best person to decide which kind of image with text is being handled and how to mark it up for accessibility.
The key to choosing whether to mark up text in images as alt and LD text or as body text is to determine whether the text is an integral part of the image (chart, map = alt and LD), whether the content is presented visually for variety or another non-instructional reason (some icons, repeated reminders = alt), and whether the text would stand alone if its image were not present (menu, scorecard = body text).
This last question, whether embedded image text could stand alone without its visual, is an important one. Image-dependent embedded text must remain part of its image (as a long description) in order to retain its meaning. Stand-alone embedded text, by contrast, is often far more accessible if converted into body text, because image-independent embedded text is effectively body text presented visually. The last two examples in the table below show one of each case.
These and similar factors should determine the specific mark-up of image text. A few examples of images with embedded text, and appropriate ways to mark them up, are shown below.
|image type||best practice||description||example alt tag text||example LD or body text|
|Maps||Provide an alt tag and a long description (LD)||Image of the state of Arizona with the capital indicated||Map of Arizona||This map of Arizona shows the state’s outline, with a long, straight line as its eastern border; a square southeastern corner; a diagonal line going up in a northwestern direction showing its southern border; a curved, irregular line showing its western border; and a straight line showing its northern border. The capital city of Phoenix, in the central southern half, is marked with a red star.|
|Icons||Icons with non-instructional text should be given an alt tag that provides the text||Image of a checkmark with text “Check Your Work!” embedded in it||alt="Check Your Work! icon" or alt="Checkmark icon labeled ‘Check Your Work!’"||n/a|
|Icons||Icons with instructional text should be given an alt tag, and instructional text should be pulled out into body content as a sidebar, paragraph, or other appropriate content component||Image of a computer with text “Use resources 12 and 14 from CD 1 for this assignment” embedded in it||alt="Computer icon"||<sidebar render="required">
<p id="001.02.ictxt">Use resources 12 and 14 from CD 1 for this assignment.</p>
</sidebar>|
|Photos, drawings, paintings, illustrations, etc.||Photos or illustrations with instructional text should be given an alt tag and an LD, and instructional text should be pulled out into body content as a sidebar, paragraph, or other appropriate content component||Image of a tomato plant with text labels||alt="Drawing of a tomato plant with labeled parts"||This drawing of a tomato plant shows a tall green stem with many six-pointed leaves and three round, red tomatoes. The roots of the plant are drawn in brown beneath the sandy soil. The parts of the tomato plant are labeled, with “roots” next to the plant’s roots, “stem” and “leaves” next to the green stem and leaves, and “fruit” next to the round tomatoes.|
|Photos, drawings, paintings, illustrations, etc.||(as above)||Image of a cartoon dog reading a poster with instructions||alt="Illustration of a dog reading an instructions poster"||<sidebar render="required">
<hd id="001.04.hd.imgtxt">Make Your Own Smoothie!</hd>
<p id="001.04.01.imgtxt">Make your own healthy and delicious smoothie for lunch or a snack.</p>
<p id="001.04.02.imgtxt">Put the following ingredients into a blender:</p>
<list id="001.04.list.imgtxt" type="ol">
<li>1 cup strawberries</li>
<li>½ cup vanilla yoghurt</li>
<li>2 tbsp protein powder</li>
<li>1 cup milk</li>
</list>
<p id="001.04.03.imgtxt">Put the lid on the blender and blend the ingredients together at medium speed for 30 to 40 seconds. Pour into a large glass and serve with a straw.</p>
</sidebar>|
12. Color profiles in images
The NIMAS Technical Assistance Center has received inquiries regarding the color profiles of images included in NIMAS filesets, and questions about whether such images can be viewed in various tools and applications. Confusion seems to arise between CMYK and RGB color profiles. CMYK is primarily used for print, while RGB is the profile more commonly used in online and computer environments. The NIMAS specifies that images must be in one of the following formats: SVG, PNG, or JPG. SVG and PNG images use RGB, and JPG images usually use RGB and can always be saved as RGB. Therefore, a source image with a CMYK color profile ought not to pose difficulty for a NIMAS fileset. For example, if a Photoshop image file with a CMYK profile is converted to a PNG-format image for a NIMAS fileset submission, it simply becomes RGB-based, since RGB is the color model PNG uses. It is also possible, as in the case of SVG-format images, for both CMYK and RGB information to be preserved in an image file, so that the desired profile can simply be selected. There are many options for converting images from one profile to another if necessary.
Selected list of media able to view CMYK profile images
Concern has arisen regarding the ability to view CMYK-based images in a computer environment. The following list should allay any fears that these images are not easily viewed.
- Corel Draw
- Microsoft Office Picture Manager
- Microsoft Publisher
- QuarkXPress
Converting images from one color profile to another
The following sequences should provide a general idea of how easy it is to change an image’s profile if necessary.
Select the Image pull-down menu
Select color profile
Select the Image pull-down menu
Select color profile
Select the File pull-down menu
Select Document Color Mode
Select color profile
Select everything in the document and go to Filter, Color, Convert to RGB
Open the Color palette menu (use the little black arrow on the top right corner) and select RGB*
*From TEOM’s The End of Magic web site (http://teom.coningham.net/ai-cs2/convert-a-document-from-cmyk-to-rgb).
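The applications above convert images using color profiles. Only to show the underlying idea, the sketch below applies the simple, non-color-managed formula commonly used for rough CMYK-to-RGB conversion; its results will differ from a profile-based conversion, so actual fileset images should be converted with a proper tool. The function name is illustrative.

```python
def cmyk_to_rgb(c: float, m: float, y: float, k: float) -> tuple:
    """Naive CMYK-to-RGB conversion without color management.

    c, m, y, k are ink fractions from 0.0 to 1.0; returns 0-255 RGB values.
    """
    for value in (c, m, y, k):
        if not 0.0 <= value <= 1.0:
            raise ValueError("CMYK components must be between 0 and 1")
    red = round(255 * (1 - c) * (1 - k))
    green = round(255 * (1 - m) * (1 - k))
    blue = round(255 * (1 - y) * (1 - k))
    return red, green, blue
```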
Additional information regarding color profiles and image formats
RGB vs. CMYK
This page contains an overview of the difference between RGB and CMYK.
Microsoft Standard CMYK Profile
This page provides a download of Microsoft’s CMYK profile and briefly describes its use in Microsoft products.
Windows Color System
This page explains the use of color profiles in Microsoft Windows and includes related links.
GIMPshop is “a modification of the free/open source GNU Image Manipulation Program (GIMP), intended to replicate the feel of Adobe Photoshop” and “is available for many different operating systems, including Mac OSX, Windows 98+, Linux, and Solaris” (from the web site).
Note: Color management kits for determining the differences between RGB and CMYK colors when used for print can be purchased from a variety of sources. As of this writing, prices for comparison kits range from $85.00 to $300.00. Color indexes may be purchased in book form for $15.00 to $400.00.
13. Lists
The <list> element is commonly used for a variety of content typically appearing in list format: tables of contents, bulleted lists, numbered lists, lettered lists, Q&A lists, indexes, etc. It is important to differentiate between list types. The NIMAS specification permits ordered lists (typically numbered 1, 2, 3…), unordered lists (typically bulleted), and preformatted lists (no default styling applied). Beyond this, additional differentiation is often necessary, as a large work may contain several variants of each list type. Class attributes provide a way to create differentiated yet repeating list types and should be used to mark up lists in works with many list formats. Correct nesting of lists is also a helpful mark-up tool.
14. Corrections to Filesets
If corrections are made to a published work after a NIMAS fileset for that work has been submitted to the NIMAC, those same corrections should be made to the NIMAS files and a new fileset submitted to replace the original. If changes to a work are minimal or cosmetic in nature, it is not necessary to resubmit a replacement fileset to the NIMAC.
15. Resources for more information
- Go to CAST’s NIMAS Exemplars page to see examples of NIMAS-conformant files that include appropriate package files within a NIMAS-conformant fileset.
- See also the NIMAS Technical Specification page, and the NIMAS Conversion Tool which converts NIMAS filesets into leveled XML or HTML outputs.
- The DAISY Structure Guidelines provides information on creating XML files valid to DAISY2005.
- Creating NIMAS Files contains practical information and details about creating NIMAS XML files.
- The NIMAC web site is the portal for the national repository of NIMAS filesets.