ARCHIVED: KB web services: KB document XML format
When you request a file from the Knowledge Base web services, it is returned in a specialized XML format. The response includes all the information available about the document: its content, plus metadata such as its title and four-letter ID.
Basic layout and tags
The tags listed below comprise the core of the format, in order of appearance. They have no attributes unless otherwise indicated.
document
The document tag is the root tag of this format, and its two children are a kbml tag and a metadata tag.
kbml
The kbml tag provides the content from the document's raw KBML, in three children. The document's title is listed in the kbq tag. Its content is in the body tag, and xtras are listed in the xtra tag.
The kbml tag has the attribute version, a string that indicates the version of KBML used to write the document.
kbq
The kbq tag contains the title of the document.
body
The body tag provides the document's content. The markup is similar to HTML; the exact details are listed below in the body section.
xtra
The xtra tag lists terms associated with the document. Details are available in the xtra section.
metadata
The metadata tag provides all additional information about the document which is not available from the KBML itself. This tag has a number of children, documented in the metadata section.
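As an illustration of the layout described above, the following Python sketch reads the top-level structure with the standard library's xml.etree.ElementTree. The sample document, its version string, and its document ID are invented for the example; real responses carry far more metadata, and this is not an official client.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the version, title, and docid values are made up.
SAMPLE = """<document>
  <kbml version="3.0">
    <kbq>What is an example document?</kbq>
    <body><p>Some content.</p></body>
    <xtra><term weight="2">sample</term></xtra>
  </kbml>
  <metadata>
    <docid>abcd</docid>
  </metadata>
</document>"""


def summarize(xml_text):
    """Return the KBML version, title, and document ID of a KB document."""
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        raise ValueError("expected a <document> root, got <%s>" % root.tag)
    kbml = root.find("kbml")
    metadata = root.find("metadata")
    if kbml is None or metadata is None:
        raise ValueError("document is missing its kbml or metadata child")
    return {
        "version": kbml.get("version"),
        "title": kbml.findtext("kbq", default="").strip(),
        "docid": metadata.findtext("docid", default="").strip(),
    }


print(summarize(SAMPLE))
```

The function raises an error rather than guessing when either of the two expected children is absent.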
The body section
Knowledge Base content is written in KBML, an SGML markup language similar to HTML. That KBML is cleaned up and given to you inside the body tag. You can generally follow a simple path to convert these contents into HTML if you wish.
If you're not converting documents to HTML, or if you just want technical details, you need to understand that many tags in KBML create a conceptual "block", which can only exist wholly inside another block. The content then exists in these blocks. This general structure is borrowed from HTML; for more, see the W3C HTML specification. If you lay out these blocks in a reasonable way and appropriately insert content, any document should have a clean appearance in your application.
Unless otherwise specified, blocks should be as wide as their parent, and as tall as needed for their content to fit. Adjacent blocks should be stacked so the bottom of the first block meets the top of the second.
The body tag may contain any number of block elements.
Block elements in KBML
Unless otherwise specified, all block elements have no attributes and may take any combination of block elements, inline elements, and data as children.
blockquote
The blockquote tag marks a large quote. This is generally rendered by adding margins to the left and right of the text.
boiler
Internally, the KB has a concept of boilers, or reusable content easily included in other documents. Such documents are marked with this tag. The src attribute indicates the internal file providing the boiler content; you do not need to use it.
The boiler tag may be empty or contain a single kbfrag element. This element is a plain block element, with a version attribute indicating what version of KBML the fragment uses.
Neither of these tags should render a new block. You should render the content they contain, but otherwise behave as though the tags are not there. The boiler tag is listed as a block element here because it can occur in places where you expect only a block element (e.g., as a direct child of the body tag).
dl
The dl tag marks a definition list. It may only contain any number of the tags dt, which marks a term being defined, and dd, which provides a definition. Both of these tags are plain block elements.
h3, h4, h5, h6
These tags all represent headers. The KB is designed for six levels of headers; the first two are reserved for your interface (for displaying titles and the like), while documents may use the rest, marked with h3 through h6. These are generally rendered with bold text, and a font size that decreases with lower header levels. Header tags may only contain inline tags and data.
image
The image tag represents an image to be included in the document. Two attributes tell you where to find it: src provides a filename, and format an extension. You can then find the image by following the pattern below for a URL:
https://media.kb.iu.edu/image/src.format
For example, if src is a012b and format is png, you would retrieve the image from the following location:
https://media.kb.iu.edu/image/a012b.png
How you should render this image varies significantly based on the inline attribute.
- If the inline attribute is empty or does not exist:
You should provide a simple mechanism for viewing the image, but you should not include it directly within the document. Often this is because the image is too large to reasonably fit within the text.
In this case, there may be a description subtag. If so, you can use its text in place of the image. Otherwise, you can use some default text that works best for your application.
The image and description tags may have other attributes, but you can safely ignore them. You should also ignore any other data in the image tag.
Consider this example of a KB image:
<image src="a012b" format="png" alt="Graph between time and size">
<description>See the graph between <em>time and size</em>.</description>
</image>
To convert this to HTML, the KB developers suggest you do something like the following:
<a href="https://media.kb.iu.edu/image/a012b.png">See the graph between <em>time and size</em>.</a>
- If the inline attribute has content:
You should render the image directly within the document, if possible. The alt attribute provides a text equivalent if you cannot do this, e.g., if you cannot fetch the image.
If the tag has an href attribute, the image itself should be a link to the URL contained therein.
The tag may contain data (which you should ignore because it is obsolete) and a description tag. The description tag contains inline tags and text that should be associated with the image; Knowledge Management staff recommend displaying it, centered, under the image itself. It has one optional attribute, width; if it exists, it provides a suggested maximum width for the description's block on the page, in CSS em units. For more, see the W3C CSS specification.
For example, suppose you encounter the following KBML image:
<image src="a012c" format="svg" alt="Graph between size and time" inline="inline" href="http://www.example.com/">
<description width="5">Graph of this data</description>
</image>
If you're converting this into HTML, you could do something like this:
<div style="width: 5em;"><a href="http://www.example.com/"><img src="https://media.kb.iu.edu/image/a012c.svg" alt="Graph between size and time"></a><br />Graph of this data</div>
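The two image cases above could be handled along the following lines. This is a minimal Python sketch, not the KB developers' own converter: the function names and the default link text are assumptions, and tags inside description are passed through unchanged rather than converted to HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

MEDIA_BASE = "https://media.kb.iu.edu/image/"  # URL pattern from the text above


def inner_markup(elem):
    """Serialize an element's text and child tags, without its own tag."""
    parts = [escape(elem.text or "", quote=False)]
    for child in elem:
        # ET.tostring includes the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


def image_to_html(image, default_text="View image"):
    """Convert a KBML image element to HTML (default_text is an assumption)."""
    url = MEDIA_BASE + image.get("src") + "." + image.get("format")
    desc = image.find("description")
    if not image.get("inline"):
        # Missing or empty inline attribute: link to the image, do not embed it.
        label = inner_markup(desc) if desc is not None else escape(default_text)
        return '<a href="%s">%s</a>' % (escape(url), label)
    # Inline image: embed it, with alt as the text equivalent.
    html = '<img src="%s" alt="%s">' % (escape(url), escape(image.get("alt", "")))
    if image.get("href"):
        html = '<a href="%s">%s</a>' % (escape(image.get("href")), html)
    if desc is None:
        return html
    width = desc.get("width")
    style = ' style="width: %sem;"' % escape(width) if width else ""
    return "<div%s>%s<br />%s</div>" % (style, html, inner_markup(desc))
```

Any stray data directly inside the image tag is never read, which matches the advice to ignore it.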
kbsecure
The kbsecure tag surrounds special content that may not be shown to all users. This filtering is done server-side, so there's no need for you to try to decide for yourself whether or not to display the content; if you received it, you should display it.
The kbsecure tag should not render a new block. You should render the content it contains, but otherwise behave as though the kbsecure tag is not there. It is listed as a block element here because it can occur in places where you only expect a block element, e.g., as a direct child of the body tag.
ol
This tag represents a list where the order of the items should be emphasized, usually by numbering them. The tag may have a type attribute which specifies what numbering system to use when visually rendering the document. Possible values, along with their meanings, are:
1 - Digits (1, 2, 3... 10, 11, 12...)
a - Lowercase alphabet (a, b, c... aa, ab, ac...)
A - Uppercase alphabet (A, B, C... AA, AB, AC...)
i - Lowercase Roman numerals (i, ii, iii, iv, v, vi...)
I - Uppercase Roman numerals (I, II, III, IV, V, VI...)
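If your application has to generate the item labels itself, the five numbering systems above can be sketched in Python as follows. The function names are hypothetical, and falling back to digits when type is missing or unrecognized is an assumption borrowed from HTML's behavior, not something this document specifies.

```python
def alpha_label(n):
    """Bijective base-26 label: 1 -> a, 26 -> z, 27 -> aa, 28 -> ab."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("a") + rem) + label
    return label


def roman_label(n):
    """Lowercase Roman numeral for 1 <= n <= 3999."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    label = ""
    for value, symbol in numerals:
        while n >= value:
            label += symbol
            n -= value
    return label


def item_label(n, list_type="1"):
    """Label for the n-th item (1-based) of an ol with the given type."""
    if list_type == "a":
        return alpha_label(n)
    if list_type == "A":
        return alpha_label(n).upper()
    if list_type == "i":
        return roman_label(n)
    if list_type == "I":
        return roman_label(n).upper()
    # Assumption: digits for type="1", a missing type, or an unknown value.
    return str(n)
```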
The ol tag may only contain any number of li tags. The li tag marks a list item. It is a plain block element; it may contain any other block elements, inline tags, or data.
p
The p tag represents a paragraph, and shouldn't require any special display.
table
The table tag and its associated tags mark up a table, row by row, cell by cell. It has three attributes:
border
If this attribute is 0, there should be no borders drawn between table elements and they should simply be arranged in a tabular fashion. Otherwise, table cells should have visible borders to divide them.
cellpadding
This attribute is a number indicating how much space should exist between the content of a table cell and its border (whether the border is visibly drawn or not). This is identical to the cellpadding attribute in HTML.
summary
This optional attribute is used for accessibility programs (i.e., screen readers). You should ignore it.
The table tag may contain, in any order, a caption tag, and any number of col and tr tags.
caption
This tag may only contain data, providing a simple caption for the table. It has one optional attribute, align, which indicates where the caption should be drawn relative to the rest of the table. Possible values are top, bottom, left, and right.
col
This tag has a single attribute, width. With it, this tag suggests a width for a column of the table, as does the corresponding tag in HTML.
tr
This tag provides a row of table data. Its children are cells of table data. It has one attribute, valign, which indicates where content in the cells should be drawn relative to the entire cell. Possible values are top, center, and bottom.
The table cells are marked with six different tags: c, l, r, ch, lh, and rh. The first letter of the tag name indicates how the cell's content should be aligned within the cell: c for centered, l for left-justified, and r for right-justified. If the first letter is followed by an h, the cell is a header cell, and you should distinguish it accordingly, usually by making text inside the cell bold. All of these tags are plain block elements; they may contain other block elements, inline tags, or data.
All the cell tags have several optional attributes. The bgcolor attribute provides the background color for the cell, just as it does for HTML's td tag; colspan and rowspan indicate how many columns and rows the cell occupies, respectively; and valign corresponds to the same attribute of the tr tag.
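When converting tables to HTML, the six cell tags map naturally onto td and th. The Python sketch below shows one way to build the opening HTML tag; the function name is hypothetical, and the use of the legacy align, bgcolor, and valign attributes is a choice made for brevity, not a recommendation from this document. Note that HTML's valign uses middle where KBML uses center.

```python
from html import escape

# First letter of the KBML cell tag -> horizontal alignment.
ALIGN = {"c": "center", "l": "left", "r": "right"}
CELL_TAGS = {"c", "l", "r", "ch", "lh", "rh"}


def cell_open_tag(name, attrs=None):
    """Return the opening HTML tag for a KBML cell tag such as 'c' or 'lh'."""
    attrs = attrs or {}
    if name not in CELL_TAGS:
        raise ValueError("not a KBML cell tag: %r" % name)
    # A trailing h marks a header cell.
    html_name = "th" if name.endswith("h") else "td"
    out = {"align": ALIGN[name[0]]}
    for key in ("bgcolor", "colspan", "rowspan"):
        if key in attrs:
            out[key] = attrs[key]
    if "valign" in attrs:
        # KBML says "center"; the HTML attribute value is "middle".
        out["valign"] = "middle" if attrs["valign"] == "center" else attrs["valign"]
    rendered = "".join(' %s="%s"' % (k, escape(v)) for k, v in out.items())
    return "<%s%s>" % (html_name, rendered)
```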
ul
The ul tag represents an unordered list of items. Normally, when rendered visually, individual items are marked with a bullet point. The ul tag may only contain any number of li tags, marking list items. The li tag is a plain block element; it may contain any other block elements, inline tags, or data.
Inline elements in KBML
Inline tags do not represent blocks. Rather, they give some attribute to their content. For instance, they may signal that a particular sentence is important, or that it represents computer code. Inline tags can be stacked, indicating that all the given properties apply to the contained text.
Except as otherwise specified, inline tags have no attributes, and can have other inline tags or data as children.
address, big, cite, code, em, pre, small, strong, sub, sup, tt
These tags indicate display effects similar to the corresponding tags in HTML. For example, text inside big should be larger than the normal text size. Unlike in HTML, none of these tags have any attributes, and they may only have other inline elements as children.
a
This tag represents some sort of anchor. It corresponds to the a tag in HTML, and has the same functionality. If there is a name attribute, it represents an anchor in the document that other resources can reference. If there is an href attribute, the enclosed data should link to the URL provided therein. There may also be a target attribute, which suggests how the link should be opened for the reader.
br
This element represents a forced line break, as in HTML. It has no attributes and contains no data or subtags.
example
The example tag typically provides a command for a user to run and/or its output, as you might see in a shell environment. KB staff suggest you render this text in a monospace font, and do your best to render whitespace contained therein.
hr
The hr element represents a horizontal rule in the page, providing a visual break for the reader. It has no attributes and contains no data or subtags.
kba
kbh
Both these tags represent links to other KB documents, and share a similar structure. The docid attribute specifies which document this link points to. The access attribute indicates whether or not you or your user would be able to view the document, based on the domains you passed into the web services call. Possible values are allowed and restricted.
If access is restricted, you may wish to provide some sort of placeholder text where the link would have been.
The kba tag has two additional, optional attributes, text and qline. Both are for internal use only and should be ignored. It then has three subtags, which contain only data:
title
This tag will only appear if access is allowed. It contains the title of the document being referenced, and should be used as the text of the link.
domain
This tag will occur one or more times; its data names a domain containing the referenced document.
visibility
This tag always occurs exactly once, and its data names the visibility of the referenced document. See the metadata section for more information about document visibility and related metadata.
The kbh tag has no additional attributes. It has subtags identical to kba, except there is no title subtag; rather, link text is provided as data inside the kbh itself.
mi
The mi tag (for "menu item") typically marks up text that appears selectable (i.e., clickable) in a user interface. The KB has traditionally rendered these in a bold, monospace font.
noheat
This tag is for internal use only; it prevents editor tools from automatically turning text into hyperlinks. You should render the content inside it, but otherwise ignore the tag.
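The difference between kba and kbh links described above comes down to where the link text lives. The Python sketch below pulls the document ID and link text out of either tag. The function name and placeholder string are hypothetical, and it assumes kbh link text is plain data with no inline tags. How a docid becomes a URL is left to your application, since this document does not define one.

```python
import xml.etree.ElementTree as ET


def link_info(elem, placeholder="[restricted document]"):
    """Return (docid, link text, allowed) for a kba or kbh element."""
    docid = elem.get("docid")
    if elem.get("access") != "allowed":
        # Restricted: substitute placeholder text for the link.
        return docid, placeholder, False
    if elem.tag == "kba":
        # kba: the title subtag supplies the link text.
        text = elem.findtext("title", default="")
    else:
        # kbh: the link text is the data directly inside the tag, which may
        # sit before or after the domain and visibility subtags.
        text = "".join([elem.text or ""] + [child.tail or "" for child in elem])
    return docid, text.strip(), True
```

For example, a kba element with access="allowed" and a title subtag of "About widgets" yields that title as its link text, while a restricted link yields the placeholder.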
The xtra section
An xtra is a word associated with a KB document as a hidden search term that does not appear in the document's content. The KB search engine indexes a document's xtras, allowing that document to be returned in particular searches without cluttering the document's live content. For example, a document about Microsoft Word might have "office" listed as an xtra, so the document will appear in a search for "microsoft office".
The xtra tag (a child of kbml) may contain data and any number of term tags. Data is only a side effect of malformed xtras in the document, and should be ignored.
The term tag is defined as follows:
term
A term represents a single xtra associated with the document. The data contained inside the tag is the xtra. The term tag's single required attribute is weight; this is an integer greater than zero that indicates the relative importance of the word to the document. The higher the weight, the higher the document will show up in search results for that word.
The metadata section
Tags inside a document's metadata section provide information such as its author, date written, and other details not available in the main content. All tags, listed below in order of appearance, occur once unless otherwise specified:
docid
The docid is the four-letter document identifier.
owner
author
These tags name the person who has responsibility for the document's accuracy, and the person who wrote it, respectively. Both include an individual username as data, and have two required attributes, firstname and lastname, indicating the person's full name.
The author tag is optional; not all documents will have one.
birthdate
lastmodified
approved
These tags provide important dates about a document. The birthdate tag is the date the document was originally written; lastmodified indicates the date the document was last checked out of version control and modified by KM staff; and approved is the date the document was last reviewed by the appropriate authority.
All of these tags are empty. Their dates are represented in three required attributes: month, day, and year. While month and day are zero-padded two-digit integers, year is a positive integer.
size
This tag contains the size of the original KBML document, in bytes.
importance
This tag contains data indicating the relative importance of a document. This metadata is not well maintained, and KB staff discourage its use.
visibility
Each document's visibility determines how it is displayed in the main IU KB. For instance, documents with a visibility of nosearch can be viewed by the general public, but will not be displayed in search results. The visibility for the current document is stored as data in this tag. Current values are visible, invisible, archived, draft, and nosearch; other values may appear in the future.
volatility
This tag contains data indicating how actively a document is maintained. Two commonly used values are permanent (such documents are frozen and unlikely to change), and stable. Other values may appear, but are rare.
status
This tag contains information about the verification of a document's content. Two common values are approved, indicating that a document has been approved by an appropriate authority, and fine, which indicates that the document has been reviewed by an editor (perhaps the author), but not by an authoritative source. Other values may appear, but are rare.
resource
Some documents are written with assistance from people from other departments (e.g., someone with specialized expertise in a particular type of software or hardware). A resource tag may list a single username referring to someone who has special knowledge of the document. The resource tag may also list the content type: concept, task, or ref (for resource). This tag can appear any number of times or not at all.
domain
When a document is associated with a domain, that domain is listed in a domain tag. For more about domains, see ARCHIVED: KB web services glossary. This tag may occur multiple times.
reference
refby
These tags list other documents which are related to this one. The current document may suggest other documents for further reading; those are listed in reference tags. Other documents which suggest reading the current document are listed in refby tags. Each of these tags can appear any number of times, or not at all.
These tags have the same structure, and two required attributes: docid provides the document ID of the external document, and access indicates whether the current user has permission to view the document, based on the domains associated with the user and the document, the document's visibility, and other factors. It may have one of two values: allowed or restricted.
If access is allowed, the tag will have a title subtag, which provides the title of the document as it should be presented to the user.
The tag will then always have one or more domain subtags, each of which lists a domain associated with the document, and a visibility subtag providing the document's visibility. These are similar to the corresponding tags in the current document's metadata section; see above for more information.
kbmeta
Some documents have additional freeform metadata information, usually suitable for inclusion inside an HTML meta tag. If so, that information will be provided in this tag. The tag is empty but has two required attributes: name is a string describing the kind of metadata, and content is the content of the metadata itself. This tag can appear any number of times or not at all.
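To illustrate how the date tags and the reference/refby tags in this section might be consumed, here is a short Python sketch. The function names and all sample values (IDs, titles, domain names) are hypothetical; the structure follows the descriptions above.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; only a few of the documented tags appear.
SAMPLE_METADATA = """<metadata>
  <docid>abcd</docid>
  <birthdate month="03" day="09" year="2011"/>
  <reference docid="efgh" access="allowed">
    <title>Related topic</title><domain>all</domain><visibility>visible</visibility>
  </reference>
  <reference docid="ijkl" access="restricted">
    <domain>staff</domain><visibility>visible</visibility>
  </reference>
</metadata>"""


def tag_date(elem):
    """Build a date from the month, day, and year attributes of a date tag."""
    return datetime.date(
        int(elem.get("year")), int(elem.get("month")), int(elem.get("day"))
    )


def related_documents(metadata, tag="reference"):
    """List the reference (or refby) entries found in a metadata element."""
    results = []
    for ref in metadata.findall(tag):
        results.append({
            "docid": ref.get("docid"),
            "allowed": ref.get("access") == "allowed",
            # The title subtag is only present when access is allowed.
            "title": ref.findtext("title"),
            "domains": [d.text for d in ref.findall("domain")],
            "visibility": ref.findtext("visibility"),
        })
    return results


metadata = ET.fromstring(SAMPLE_METADATA)
print(tag_date(metadata.find("birthdate")))
print(related_documents(metadata))
```

Passing tag="refby" to related_documents reads the incoming references instead, since both tags share one structure.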
Additional resources
The Knowledge Management System team provides a number of resources which may help you work with this XML format:
This is document aqvr in the Knowledge Base.
Last modified on 2023-02-08 14:27:31.