Introduction to the TEI Header

Kevin S. Hawkins

Understanding the header (metadata block) of a Text Encoding Initiative (TEI) document as it relates to library cataloging practice. Suggestions for improving this tutorial are greatly appreciated.

Every TEI document (an XML document conforming to the TEI Guidelines) contains a block of metadata commonly called the header, followed by the encoded document itself, often called the body. The header of a TEI document is always encoded using a single XML element, <teiHeader>, which contains child elements for the various parts of the description. The body, on the other hand, may be made up of a number of different elements used to create a digital surrogate of a document. The most commonly used of these is <text>, which, confusingly, contains a <body> child element.

But keep in mind that "body" often refers not just to this specific element but to everything that's not the header.
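To make the header/body split concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document and the helper name split_header_and_body are illustrative assumptions, not part of the TEI Guidelines. The sketch treats everything after <teiHeader> as the "body" in the broad sense, and separately locates the <body> element proper inside <text>.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# A deliberately tiny, hypothetical TEI document: one <teiHeader>
# followed by <text>, which in turn contains the <body> element.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample title</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born-digital; no source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Hello, world.</p>
    </body>
  </text>
</TEI>"""

def split_header_and_body(root):
    """Return (header, rest): the single <teiHeader> and every other child of <TEI>."""
    header = root.find("tei:teiHeader", NS)
    rest = [child for child in root if child is not header]
    return header, rest

root = ET.fromstring(SAMPLE)
header, rest = split_header_and_body(root)
print(header.tag)                                     # {http://www.tei-c.org/ns/1.0}teiHeader
print([el.tag.split("}")[1] for el in rest])          # ['text']
print(root.find("tei:text/tei:body", NS) is not None)  # True
```

Note that the broad "body" here is a list of elements, because a TEI document may contain more than one element after the header.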

This document will use "metadata" to refer only to data included in the header of a TEI document, as the term is commonly used in the TEI Guidelines. (By contrast, according to some definitions of metadata, all tags added to a document are a form of metadata.)

The TEI Guidelines are primarily concerned with representing source documents in XML, that is, creating a TEI-encoded surrogate of an existing physical or digital document, possibly enhanced with additional interpretation provided by the encoder. For this reason, the Guidelines can describe both metadata about the source document and metadata about its TEI surrogate, and in many cases they clearly distinguish which of the two is to be described in a given part of the header.
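As an illustration of that distinction, the following sketch (Python, standard library only) parses a hypothetical header in which <titleStmt> describes the TEI surrogate while <sourceDesc> cites the printed book it was transcribed from. The titles, names, date, and the describe helper are all made up for this example. Real headers may cite the source with <biblStruct> or <msDesc> rather than <bibl>, so a production script would need to handle those as well.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical header: <titleStmt> describes the TEI file itself,
# while <sourceDesc> cites the (invented) printed source.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>A Sample Novel: an electronic edition</title>
      <respStmt><resp>Encoded by</resp><name>A. Encoder</name></respStmt>
    </titleStmt>
    <publicationStmt><p>Distributed by a hypothetical digital library.</p></publicationStmt>
    <sourceDesc>
      <bibl><author>A. Novelist</author> <title>A Sample Novel</title> <date>1850</date></bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>"""

def describe(header):
    """Separate metadata about the TEI surrogate from metadata about its source."""
    surrogate_title = header.findtext("tei:fileDesc/tei:titleStmt/tei:title", namespaces=NS)
    source = header.find("tei:fileDesc/tei:sourceDesc/tei:bibl", NS)
    return {
        "surrogate_title": surrogate_title,
        "source_title": source.findtext("tei:title", namespaces=NS),
        "source_date": source.findtext("tei:date", namespaces=NS),
    }

print(describe(ET.fromstring(HEADER)))
```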

The TEI Guidelines do not refer to the Group 1 entities of the FRBR model (work, expression, manifestation, and item), and while they support certain types of links within documents, they do not provide a global linking mechanism that readily supports arbitrary "linked data" applications.

The following table shows the main components of a <teiHeader> in the order in which they must appear in the document (and their position in the XML hierarchy), with a brief description of each.

| Element | Parent | Purpose | Mandatory? | Repeatable? |
|---|---|---|---|---|
| <fileDesc> | <teiHeader> | bibliographic metadata | mandatory | non-repeatable |
| <titleStmt> | <fileDesc> | main title and possibly subtitle, alternate titles, etc., plus information about those responsible for the intellectual content of the TEI document (any of which may in fact be the same as for the source document) | mandatory | non-repeatable |
| <editionStmt> | <fileDesc> | edition number or other description of the edition of the text presented by the TEI document | optional | non-repeatable |
| <extent> | <fileDesc> | size of the TEI document in any unit of interest (bytes, words, paragraphs, etc.) | optional | non-repeatable |
| <publicationStmt> | <fileDesc> | information about the publication and distribution of the TEI document | mandatory | non-repeatable |
| <seriesStmt> | <fileDesc> | information about the series of which the TEI document is a part | optional | non-repeatable |
| <notesStmt> | <fileDesc> | bibliographic notes about the TEI document (information that doesn't fit elsewhere within <fileDesc>) | optional | non-repeatable |
| <sourceDesc> | <fileDesc> | description of the source from which the TEI document was created | mandatory | repeatable |
| <encodingDesc> | <teiHeader> | description of encoding practices in the TEI document | optional | repeatable |
| <profileDesc> | <teiHeader> | access points and free-text descriptions relating to the subject matter of the text, its linguistic characteristics, or both (depending on the context, "text" may refer to the text as found in the TEI document or in the source from which it was created) | optional | repeatable |
| <xenoData> | <teiHeader> | a wrapper for metadata about the TEI document from non-TEI schemes such as MARCXML or MODS | optional | repeatable |
| <revisionDesc> | <teiHeader> | structured record of changes to the TEI document | optional | non-repeatable |
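The order and cardinality of the <fileDesc> children in the table can be checked with a small sequential matcher. This is a hypothetical sketch that encodes only the constraints as stated in the table; it ignores the content of each child (so an empty <titleStmt/> passes) and is no substitute for validating against the TEI schema itself. The names FILEDESC_MODEL and check_filedesc are invented for this example.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}")[-1]

# (element, minimum, maximum) in the order the table gives for <fileDesc>;
# None means "no upper limit". Transcribed from the table, not from the schema.
FILEDESC_MODEL = [
    ("titleStmt", 1, 1),
    ("editionStmt", 0, 1),
    ("extent", 0, 1),
    ("publicationStmt", 1, 1),
    ("seriesStmt", 0, 1),
    ("notesStmt", 0, 1),
    ("sourceDesc", 1, None),
]

def check_filedesc(file_desc):
    """Return a list of problems; an empty list means the children fit the model."""
    names = [local_name(child.tag) for child in file_desc]
    problems = []
    i = 0
    for name, lo, hi in FILEDESC_MODEL:
        count = 0
        # Consume consecutive occurrences of this element.
        while i < len(names) and names[i] == name:
            count += 1
            i += 1
        if count < lo:
            problems.append(f"missing <{name}>")
        if hi is not None and count > hi:
            problems.append(f"too many <{name}> ({count})")
    # Anything left over is out of order or not part of the model.
    if i < len(names):
        problems.append(f"unexpected or out-of-order <{names[i]}>")
    return problems

TEI_NS = "http://www.tei-c.org/ns/1.0"
good = ET.fromstring(
    f'<fileDesc xmlns="{TEI_NS}">'
    "<titleStmt/><publicationStmt/><sourceDesc/><sourceDesc/></fileDesc>"
)
bad = ET.fromstring(f'<fileDesc xmlns="{TEI_NS}"><titleStmt/><sourceDesc/></fileDesc>')
print(check_filedesc(good))  # []
print(check_filedesc(bad))   # ['missing <publicationStmt>']
```

A single left-to-right pass is enough here because the children of <fileDesc> form a fixed sequence; the repeated <sourceDesc> in the first example is accepted because that row of the table allows more than one.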

The children of <fileDesc> are modeled on the "areas" of bibliographic description defined by the International Standard Bibliographic Description (ISBD), with some differences in practice described in section 2.7 of the TEI Guidelines.

See section 4.1.6 of the Best Practices for TEI in Libraries for mappings between TEI header elements and MARC fields for the following situations:

The latter mapping has been implemented in a command-line conversion tool called Thutmose II.