ARCHIVED: KB web services: KB document XML format

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

When you request a document from the Knowledge Base web services, it is returned in a specialized XML format. The response includes all the information available about the document: not only its content, but also metadata such as its title and four-letter ID.

On this page:

Basic layout and tags
The body section
The xtra section
The metadata section
Additional resources

Basic layout and tags

The tags listed below comprise the core of the format, in order of appearance. They have no attributes unless otherwise indicated.

document

document is the root tag of this format. Its two children are a kbml tag and a metadata tag.

kbml

The kbml tag provides the content from the document's raw KBML, in three children. The document's title is listed in the kbq tag. Its content is in the body tag, and xtras are listed in the xtra tag.

The kbml tag has the attribute version, a string that indicates the version of KBML used to write the document.

kbq

The kbq tag contains the title of the document.

body

The body tag provides the document's content. The markup is similar to HTML; the exact details are listed below in the body section.

xtra

The xtra tag lists terms associated with the document. Details are available in the xtra section.

metadata

The metadata tag provides all additional information about the document that is not available from the KBML itself. This tag has a number of children, documented in the metadata section.
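
As a rough illustration of the layout described above, the following Python sketch reads the core tags with the standard library's xml.etree.ElementTree module. The sample document is invented for this example (its ID, title, and version value are placeholders), the helper name summarize is hypothetical, and the sketch assumes the response uses no XML namespaces.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real response carries far more metadata.
SAMPLE = """<document>
  <kbml version="3.0">
    <kbq>What is an example document?</kbq>
    <body><p>Some content.</p></body>
    <xtra><term weight="2">sample</term></xtra>
  </kbml>
  <metadata>
    <docid>abcd</docid>
  </metadata>
</document>"""


def summarize(xml_text):
    """Return the title, KBML version, and document ID of a KB document."""
    root = ET.fromstring(xml_text)      # root tag: document
    kbml = root.find("kbml")            # content from the raw KBML
    metadata = root.find("metadata")    # information not in the KBML itself
    return {
        "title": kbml.findtext("kbq"),
        "kbml_version": kbml.get("version"),
        "docid": metadata.findtext("docid"),
    }
```

Calling summarize(SAMPLE) returns a dictionary with the keys title, kbml_version, and docid.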

The body section

Knowledge Base content is written in KBML, an SGML markup language similar to HTML. That KBML is cleaned up and given to you inside the body tag. You can generally follow a simple path to convert these contents into HTML if you wish.

If you're not converting documents to HTML, or if you just want technical details, note that many tags in KBML create a conceptual "block", which can exist only wholly inside another block. The document's content lives inside these blocks. This general structure is borrowed from HTML; for more, see the W3C HTML specification. If you lay out these blocks in a reasonable way and insert content appropriately, any document should have a clean appearance in your application.

Unless otherwise specified, blocks should be as wide as their parent, and as tall as needed for their content to fit. Adjacent blocks should be stacked so the bottom of the first block meets the top of the second.

The body tag may contain any number of block elements.

Block elements in KBML

Unless otherwise specified, all block elements have no attributes and may take any combination of block elements, inline elements, and data as children.

blockquote

The blockquote tag marks a large quote. This is generally rendered by adding margins to the left and right of the text.

boiler

Internally, the KB has a concept of boilers, or reusable content easily included in other documents. Such documents are marked with this tag. The src attribute indicates the internal file providing the boiler content; you do not need to use it.

The boiler tag may be empty or contain a single kbfrag element. This element is a plain block element, with a version attribute indicating what version of KBML the fragment uses.

Neither of these tags should render a new block. You should render the content they contain, but otherwise behave as though the tags are not there. The boiler tag is listed as a block element here because it can occur in places where you expect only a block element (e.g., as a direct child of the body tag).

dl

The dl tag marks a definition list. It may contain only dt tags, each marking a term being defined, and dd tags, each providing a definition, in any number. Both of these tags are plain block elements.

h3, h4, h5, h6

These tags all represent headers. The KB is designed for six levels of headers; the first two are reserved for your interface (for displaying titles and the like), while documents may use the rest, marked with h3 through h6. These are generally rendered with bold text, and a font size that decreases from h3 to h6. Header tags may only contain inline tags and data.

image

The image tag represents an image to be included in the document. Two attributes tell you where to find it: src provides a filename, and format an extension. You can then find the image by following the pattern below for a URL:

  https://media.kb.iu.edu/image/src.format

For example, if src is a012b and format is png, you would retrieve the image from the following location:

  https://media.kb.iu.edu/image/a012b.png

How you should render this image varies significantly based on the inline attribute.

If the inline attribute is empty or does not exist:
You should provide a simple mechanism for viewing the image, but you should not include it directly within the document. Often this is because the image is too large to reasonably fit within the text.

In this case, there may be a description subtag. If so, you can use its text in place of the image. Otherwise, you can use some default text that works best for your application.

The image and description tags may have other attributes, but you can safely ignore them. You should also ignore any other data in the image tag.

Consider this example of a KB image:

<image src="a012b" format="png" alt="Graph between time and size">
  <description>See the graph between <em>time and size</em>.</description>
</image>

To convert this to HTML, the KB developers suggest you do something like the following:

<a href="https://media.kb.iu.edu/image/a012b.png">See the graph between <em>time and size</em>.</a>

If the inline attribute has content:
You should render the image directly within the document, if possible. The alt attribute provides a text equivalent if you cannot do this, e.g., if you cannot fetch the image.

If the tag has an href attribute, the image itself should be a link to the URL contained therein.

The tag may contain data (which you should ignore because it is obsolete) and a description tag. The description tag contains inline tags and text that should be associated with the image; Knowledge Management staff recommend displaying it, centered, under the image itself. It has one optional attribute, width; if it exists, it provides a suggested maximum width for the description's block on the page, in CSS em units. For more, see the W3C CSS specification.

For example, suppose you encounter the following KBML image:

<image src="a012c" format="svg" alt="Graph between size and time" inline="inline" href="http://www.example.com/">
  <description width="5">Graph of this data</description>
</image>

If you're converting this into HTML, you could do something like this:

<div style="width: 5em;"><a href="http://www.example.com/"><img src="https://media.kb.iu.edu/image/a012c.svg" alt="Graph between size and time"></a><br />Graph of this data</div>
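
The two image cases above could be handled along the following lines. This is only a sketch: the function names and the fallback text "View image" are assumptions made for illustration, and the description's inline tags are copied through unchanged, which is adequate for tags such as em but not for KB-specific tags.

```python
import xml.etree.ElementTree as ET
from html import escape

MEDIA_BASE = "https://media.kb.iu.edu/image/"


def inner_html(elem):
    """Serialize an element's text and child tags, without its own tag."""
    parts = [escape(elem.text or "", quote=False)]
    for child in elem:
        # tostring() also emits the text that follows each child tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


def image_to_html(image):
    """Convert a KBML image element to HTML, following the inline attribute."""
    url = MEDIA_BASE + image.get("src") + "." + image.get("format")
    desc = image.find("description")
    if not image.get("inline"):
        # Empty or missing inline attribute: link to the image, do not embed it.
        text = inner_html(desc) if desc is not None else "View image"
        return f'<a href="{escape(url)}">{text}</a>'
    # Inline image: embed it, using alt as the text equivalent.
    alt = escape(image.get("alt", ""))
    html = f'<img src="{escape(url)}" alt="{alt}">'
    href = image.get("href")
    if href:
        html = f'<a href="{escape(href)}">{html}</a>'
    if desc is not None:
        width = desc.get("width")
        style = f' style="width: {escape(width)}em;"' if width else ""
        html = f"<div{style}>{html}<br />{inner_html(desc)}</div>"
    return html
```

Any other data inside the image tag is ignored here, because the function reads only the attributes and the description subtag.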

kbsecure

The kbsecure tag surrounds special content that may not be shown to all users. This filtering is done server-side, so there's no need for you to try to decide for yourself whether or not to display the content; if you received it, you should display it.

The kbsecure tag should not render a new block. You should render the content it contains, but otherwise behave as though the kbsecure tag is not there. It is listed as a block element here because it can occur in places where you only expect a block element, e.g., as a direct child of the body tag.
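
One possible way to treat kbsecure (and likewise boiler and kbfrag) as though the tag were not there is to look through it while walking the tree. The sketch below assumes the document has been parsed with Python's xml.etree.ElementTree; the function name visible_children is hypothetical, and the sketch yields child elements only, so loose text directly inside a transparent tag is not covered.

```python
import xml.etree.ElementTree as ET

# Tags that render no block of their own.
TRANSPARENT = {"boiler", "kbfrag", "kbsecure"}


def visible_children(elem):
    """Yield child elements, looking through tags that render no block."""
    for child in elem:
        if child.tag in TRANSPARENT:
            # Render what the tag contains, not the tag itself.
            yield from visible_children(child)
        else:
            yield child
```

A renderer would then iterate over visible_children(body) instead of over the body element directly.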

ol

This tag represents a list where the order of the items should be emphasized, usually by numbering them. The tag may have a type attribute which specifies what numbering system to use when visually rendering the document. Possible values, along with their meanings, are:

1: Digits (1, 2, 3... 10, 11, 12...)
a: Lowercase alphabet (a, b, c... aa, ab, ac...)
A: Uppercase alphabet (A, B, C... AA, AB, AC...)
i: Lowercase Roman numerals (i, ii, iii, iv, v, vi...)
I: Uppercase Roman numerals (I, II, III, IV, V, VI...)

The ol tag may contain only li tags, in any number. Each li tag marks a list item. The li tag is a plain block element; it may contain any other block elements, inline tags, or data.
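
If your application numbers list items itself, the type values above could be turned into labels as in this sketch. The function names are hypothetical, and falling back to digits when type is missing or unrecognized is an assumption; the format description does not state a default.

```python
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def alpha_label(n):
    """Letters for position n: a to z, then aa, ab, ac, and so on."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("a") + rem) + label
    return label


def roman_label(n):
    """Lowercase Roman numeral for a positive integer n."""
    label = ""
    for value, numeral in ROMAN:
        count, n = divmod(n, value)
        label += numeral * count
    return label


def item_label(index, list_type="1"):
    """Label for the index-th li (counting from 1) of an ol with this type."""
    if list_type == "a":
        return alpha_label(index)
    if list_type == "A":
        return alpha_label(index).upper()
    if list_type == "i":
        return roman_label(index)
    if list_type == "I":
        return roman_label(index).upper()
    return str(index)  # assumed default: digits
```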

p

The p tag represents a paragraph, and shouldn't require any special display.

table

The table tag and its associated tags mark up a table, row by row, cell by cell. It has three attributes:

border: If this attribute is 0, there should be no borders drawn between table elements and they should simply be arranged in a tabular fashion. Otherwise, table cells should have visible borders to divide them.
cellpadding: This attribute is a number indicating how much space should exist between the content of a table cell and its border (whether the border is visibly drawn or not). This is identical to the cellpadding attribute in HTML.
summary: This optional attribute is used by accessibility programs (e.g., screen readers). You should ignore it.

The table tag may contain, in any order, a caption tag, and any number of col and tr tags.

caption

This tag may only contain data, providing a simple caption for the table. It has one optional attribute, align, which indicates where the caption should be drawn relative to the rest of the table. Possible values are top, bottom, left, and right.

col

This tag has a single attribute, width. With it, this tag suggests a width for a column of the table, as does the corresponding tag in HTML.

tr

This tag provides a row of table data. Its children are cells of table data. It has one attribute, valign, which indicates where content in the cells should be drawn relative to the entire cell. Possible values are top, center, and bottom.

The table cells are marked with six different tags: c, l, r, ch, lh, and rh. The first letter of the tag name indicates how the cell's content should be aligned within the cell: c for centered, l for left-justified, and r for right-justified. If the first letter is followed by an h, the cell is a header cell, and you should distinguish it accordingly, usually by making text inside the cell bold. All of these tags are plain block elements; they may contain other block elements, inline tags, or data.

All the cell tags have several optional attributes. The bgcolor attribute provides the background color for the cell, just as it does for HTML's td tag; colspan and rowspan indicate how many columns and rows the cell occupies, respectively; and valign corresponds to the same attribute of the tr tag.
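
When converting to HTML, the six cell tags could be mapped to td and th as sketched below. The function name is hypothetical, and expressing alignment and background color as inline CSS (including translating the valign value center to the CSS keyword middle) is a design choice of this sketch, not something the format prescribes.

```python
from html import escape

ALIGN = {"c": "center", "l": "left", "r": "right"}
VALIGN = {"top": "top", "center": "middle", "bottom": "bottom"}  # CSS keywords


def cell_open_tag(tag, attrs=None):
    """Return an HTML opening tag for a KBML cell tag and its attributes."""
    attrs = attrs or {}
    if tag[:1] not in ALIGN or tag[1:] not in ("", "h"):
        raise ValueError(f"not a KBML cell tag: {tag!r}")
    name = "th" if tag.endswith("h") else "td"  # a trailing h marks a header cell
    styles = ["text-align: " + ALIGN[tag[0]]]
    if attrs.get("valign") in VALIGN:
        styles.append("vertical-align: " + VALIGN[attrs["valign"]])
    if "bgcolor" in attrs:
        styles.append("background-color: " + attrs["bgcolor"])
    parts = [name]
    for key in ("colspan", "rowspan"):
        if key in attrs:
            parts.append(f'{key}="{escape(attrs[key])}"')
    parts.append('style="' + escape("; ".join(styles)) + '"')
    return "<" + " ".join(parts) + ">"
```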

ul

The ul tag represents an unordered list of items. Normally, when rendered visually, individual items are marked with a bullet point. The ul tag may contain only li tags, in any number, each marking a list item. The li tag is a plain block element; it may contain any other block elements, inline tags, or data.

Inline elements in KBML

Inline tags do not represent blocks. Rather, they give some attribute to their content. For instance, they may signal that a particular sentence is important, or that it represents computer code. Inline tags can be stacked, indicating that all the given properties apply to the contained text.

Except as otherwise specified, inline tags have no attributes, and can have other inline tags or data as children.

address, big, cite, code, em, pre, small, strong, sub, sup, tt

These tags indicate display effects similar to the corresponding tags in HTML. For example, text inside big should be larger than the normal text size. Unlike in HTML, none of these tags has any attributes, and each may have only other inline elements as children.

a

This tag represents some sort of anchor. It corresponds to the a tag in HTML, and has the same functionality. If there is a name attribute, it represents an anchor in the document that other resources can reference. If there is an href attribute, the enclosed data should link to the URL provided therein. There may also be a target attribute, which suggests how the link should be opened for the reader.

br

This element represents a forced line break, as in HTML. It has no attributes and contains no data or subtags.

example

The example tag typically provides a command for a user to run and/or its output, as you might see in a shell environment. KB staff suggest you render this text in a monospace font, and do your best to render whitespace contained therein.

hr

The hr element represents a horizontal rule in the page, providing a visual break for the reader. It has no attributes and contains no data or subtags.

kba

kbh

Both of these tags represent links to other KB documents, and share a similar structure. The docid attribute specifies which document this link points to. The access attribute indicates whether or not you or your user would be able to view the document, based on the domains you passed into the web services call. Possible values are allowed and restricted.

If access is restricted, you may wish to provide some sort of placeholder text where the link would have been.

The kba tag has two additional, optional attributes, text and qline. Both are for internal use only and should be ignored. It then has three subtags, which contain only data:

title: This tag will only appear if access is allowed. It contains the title of the document being referenced, and should be used as the text of the link.
domain: This tag will occur one or more times; its data names a domain containing the referenced document.
visibility: This tag always occurs exactly once, and its data names the visibility of the referenced document. See the metadata section for more information about document visibility and related metadata.

The kbh tag has no additional attributes. It has subtags identical to kba, except there is no title subtag; rather, link text is provided as data inside the kbh itself.
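
A sketch of how kba and kbh links might be rendered follows. The URL pattern in example_url is purely hypothetical (this page does not say where a document with a given ID is served), as are the function names and the placeholder text.

```python
import xml.etree.ElementTree as ET
from html import escape


def example_url(docid):
    # Hypothetical URL pattern; substitute whatever your application uses.
    return "https://example.org/kb/" + docid


def link_to_html(elem, url_for=example_url, placeholder="[restricted document]"):
    """Render a kba or kbh element as an HTML link, or as placeholder text."""
    if elem.get("access") != "allowed":
        return escape(placeholder)
    if elem.tag == "kba":
        # kba: the title subtag supplies the link text.
        text = elem.findtext("title", default="")
    else:
        # kbh: the link text is the data inside the tag itself, so collect
        # the tag's own text and skip the domain and visibility subtags.
        text = (elem.text or "") + "".join(child.tail or "" for child in elem)
    href = escape(url_for(elem.get("docid")))
    return f'<a href="{href}">{escape(text.strip(), quote=False)}</a>'
```

Passing your own url_for function keeps the sketch independent of any particular site layout.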

mi

The mi tag (for "menu item") typically marks up text that appears selectable (i.e., clickable) in a user interface. The KB has traditionally rendered these in a bold, monospace font.

noheat

This tag is for internal use only; it prevents editor tools from automatically turning text into hyperlinks. You should render the content inside it, but otherwise ignore the tag.

The xtra section

An xtra is a word associated with a KB document as a hidden search term that does not appear in the document's content. The KB search engine indexes a document's xtras, allowing that document to be returned in particular searches without cluttering the document's live content. For example, a document about Microsoft Word might have "office" listed as an xtra, so the document will appear in a search for "microsoft office".

The xtra tag (a child of kbml) may contain data and any number of term tags. Data is only a side effect of malformed xtras in the document, and should be ignored. The term tag is defined as follows:

term: A term represents a single xtra associated with the document. The data contained inside the tag is the xtra. The term tag's single required attribute is weight; this is an integer greater than zero that indicates the relative importance of the word to the document. The higher the weight, the higher the document will show up in search results for that word.
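
For example, a document's xtras could be collected as in the sketch below, which assumes the xtra element has been parsed with Python's xml.etree.ElementTree. The function name is hypothetical. Stray data inside the xtra tag is skipped simply because only term children are examined.

```python
import xml.etree.ElementTree as ET


def parse_xtras(xtra):
    """Return (term, weight) pairs from an xtra element, heaviest first."""
    terms = [
        ((term.text or "").strip(), int(term.get("weight")))
        for term in xtra.findall("term")
    ]
    return sorted(terms, key=lambda pair: pair[1], reverse=True)
```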

The metadata section

Tags inside a document's metadata section provide information such as its author, date written, and other details not available in the main content. All tags, listed below in order of appearance, occur once unless otherwise specified:

docid

The docid is the four-letter document identifier.

owner

author

These tags name the person who has responsibility for the document's accuracy, and the person who wrote it, respectively. Both include an individual username as data, and have two required attributes, firstname and lastname, indicating the person's full name.

The author tag is optional; not all documents will have one.

birthdate

lastmodified

approved

These tags provide important dates about a document. The birthdate tag is the date the document was originally written; lastmodified indicates the date the document was last checked out of version control and modified by KM staff; and approved is the date the document was last reviewed by the appropriate authority.

All of these tags are empty. Their dates are represented in three required attributes: month, day, and year. While month and day are zero-padded two-digit integers, year is a positive integer.
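
The three date attributes map directly onto a calendar date. A minimal sketch, assuming the metadata element has been parsed with Python's xml.etree.ElementTree (the function names are hypothetical):

```python
import xml.etree.ElementTree as ET
from datetime import date

DATE_TAGS = ("birthdate", "lastmodified", "approved")


def tag_date(elem):
    """Convert a date tag's month, day, and year attributes to a date."""
    # int() accepts the zero-padded month and day values.
    return date(int(elem.get("year")), int(elem.get("month")), int(elem.get("day")))


def document_dates(metadata):
    """Map each date tag found in a metadata element to a datetime.date."""
    return {
        elem.tag: tag_date(elem)
        for elem in metadata
        if elem.tag in DATE_TAGS
    }
```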

size

This tag contains the size of the original KBML document, in bytes.

importance

This tag contains data indicating the relative importance of a document. This metadata is not well maintained, and KB staff discourage its use.

visibility

Each document's visibility determines how it is displayed in the main IU KB. For instance, documents with a visibility of nosearch can be viewed by the general public, but will not be displayed in search results. The visibility for the current document is stored as data in this tag. Current values are visible, invisible, archived, draft, and nosearch; other values may appear in the future.

volatility

This tag contains data indicating how actively a document is maintained. Two commonly used values are permanent (such documents are frozen and unlikely to change), and stable. Other values may appear, but are rare.

status

This tag contains information about the verification of a document's content. Two common values are approved, indicating that a document has been approved by an appropriate authority, and fine, which indicates that the document has been reviewed by an editor (perhaps the author), but not by an authoritative source. Other values may appear, but are rare.

resource

Some documents are written with assistance from people from other departments (e.g., someone with specialized expertise in a particular type of software or hardware). A resource tag may list a single username referring to someone who has special knowledge of the document. The resource tag may also list the content type: concept, task, or ref (for resource). This tag can appear any number of times or not at all.

domain

When a document is associated with a domain, that domain is listed in a domain tag. For more about domains, see ARCHIVED: KB web services glossary. This tag may occur multiple times.

reference

refby

These tags list other documents which are related to this one. The current document may suggest other documents for further reading; those are listed in reference tags. Other documents which suggest reading the current document are listed in refby tags. Each of these tags can appear any number of times, or not at all.

These tags have the same structure, and two required attributes: docid provides the document ID of the external document, and access indicates whether the current user has permission to view the document, based on the domains associated with the user and the document, the document's visibility, and other factors. It may have one of two values: allowed or restricted.

If access is allowed, the tag will have a title subtag, which provides the title of the document as it should be presented to the user.

The tag will then always have one or more domain subtags, each of which lists a domain associated with the document, and a visibility subtag providing the document's visibility. These are similar to the corresponding tags in the current document's metadata section; see above for more information.
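
The reference and refby tags could be gathered into plain dictionaries as sketched below, again assuming a metadata element parsed with Python's xml.etree.ElementTree. The function name and the dictionary keys are choices made for this example.

```python
import xml.etree.ElementTree as ET


def parse_related(metadata):
    """Collect reference and refby tags from a metadata element."""
    related = []
    for elem in metadata:
        if elem.tag not in ("reference", "refby"):
            continue
        related.append({
            "kind": elem.tag,
            "docid": elem.get("docid"),
            "access": elem.get("access"),
            # The title subtag is present only when access is allowed.
            "title": elem.findtext("title"),
            "domains": [d.text for d in elem.findall("domain")],
            "visibility": elem.findtext("visibility"),
        })
    return related
```

For a restricted document, the title entry is None, which a caller can use to decide whether to show a link.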

kbmeta

Some documents have additional freeform metadata information, usually suitable for inclusion inside an HTML meta tag. If so, that information will be provided in this tag. The tag is empty but has two required attributes: name is a string describing the kind of metadata, and content is the content of the metadata itself. This tag can appear any number of times or not at all.
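
Since this information is usually suitable for an HTML meta tag, a conversion might look like the following sketch. The function name is hypothetical, and the metadata element is assumed to have been parsed with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from html import escape


def kbmeta_to_html(metadata):
    """Return one HTML meta tag per kbmeta tag in a metadata element."""
    tags = []
    for meta in metadata.findall("kbmeta"):
        name = escape(meta.get("name"))
        content = escape(meta.get("content"))
        tags.append(f'<meta name="{name}" content="{content}">')
    return tags
```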

Additional resources

The Knowledge Management System team provides a number of resources which may help you work with this XML format: