Introduction to the TEI Header

Understanding the header (metadata block) of a Text Encoding Initiative (TEI) document as it relates to library cataloging practice. Suggestions for improving this tutorial are greatly appreciated.

Every TEI document (an XML document conforming to the TEI Guidelines) contains a block of metadata commonly called the header, followed by the encoded document itself, often called the body. The header of the TEI document is always encoded using a single XML element, <teiHeader>, which contains child elements for various parts of the description. The body, on the other hand, may be made up of a number of different elements used to create a digital surrogate of a document. The most commonly used of these is <text>, which—confusingly—contains a <body> child element.
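
The header-then-text layout described above can be checked with a few lines of Python. This is only a sketch: the sample document, its placeholder titles, and the helper names `local_name` and `top_level_parts` are made up for illustration and are not part of the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative skeleton only; the title and text are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample title</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Encoded text goes here.</p></body>
  </text>
</TEI>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.split("}", 1)[-1]


def top_level_parts(xml_string):
    """Return the names of the root element's children, in document order."""
    root = ET.fromstring(xml_string)
    return [local_name(child) for child in root]


root = ET.fromstring(SAMPLE)
# <body> is a child of <text>, not a sibling of <teiHeader>.
body = root.find(f"{{{TEI_NS}}}text/{{{TEI_NS}}}body")

print(top_level_parts(SAMPLE))  # ['teiHeader', 'text']
print(body is not None)         # True
```

The namespace URI is the standard TEI namespace; everything else in the sample is placeholder content.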

This document will use "metadata" to refer only to data included in the header of a TEI document, as the term is commonly used in the TEI Guidelines. (By contrast, according to some definitions of metadata, all tags added to a document are a form of metadata.)

The TEI Guidelines are primarily concerned with representing source documents in XML: that is, creating a TEI-encoded surrogate of an existing physical or digital document, possibly enhanced with additional interpretation provided by the encoder. For this reason, the Guidelines provide for both metadata about the source document and metadata about the TEI surrogate of it, and in many cases they clearly distinguish which of the two is to be described in a given part of the header.

The TEI Guidelines do not reference the FRBR model's Group 1 Entities, and while the TEI Guidelines support certain types of links within documents, they do not provide a global linking mechanism that easily supports arbitrary "linked data" applications.

The following table shows the main components of a <teiHeader> in the order in which they must appear in the document (and their position in the XML hierarchy), with a brief description of each.

Element name	Purpose	Mandatory?	Can have more than one?
<fileDesc>	bibliographic metadata	mandatory	non-repeatable
├	<titleStmt>	main title and possibly subtitle, alternate titles, etc., plus information about those responsible for the intellectual content of the TEI document (any of which may in fact be the same as for the source document)	mandatory	non-repeatable
├	<editionStmt>	edition number or other description of the edition of the text presented by the TEI document	optional	non-repeatable
├	<extent>	size of the TEI document in any unit of interest (bytes, words, paragraphs, etc.)	optional	non-repeatable
├	<publicationStmt>	information about the publication and distribution of the TEI document	mandatory	non-repeatable
├	<seriesStmt>	information about the series of which the TEI document is a part	optional	non-repeatable
├	<notesStmt>	bibliographic notes (information that doesn’t fit elsewhere within <fileDesc>) about the TEI document	optional	non-repeatable
└	<sourceDesc>	description of the source from which the TEI document was created	mandatory	repeatable
<encodingDesc>	description of encoding practices in the TEI document	optional	repeatable
<profileDesc>	access points and free text descriptions relating to the subject matter of the text, its linguistic characteristics, or both. Note that "text" could refer to the text as found in the TEI document or the source from which it was created, depending on the context.	optional	repeatable
<xenoData>	a wrapper for metadata about the TEI document from non-TEI schemes such as MARCXML or MODS	optional	repeatable
<revisionDesc>	structured record of changes to the TEI document	optional	non-repeatable
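
As a rough illustration of the table, the sketch below checks a header's <fileDesc> for the three mandatory children and for the child order listed above. The names `FILEDESC_CHILDREN` and `check_file_desc` and both sample headers are hypothetical, and the check is far weaker than validating against a real TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Child order and mandatory flags copied from the table above.
FILEDESC_CHILDREN = [
    ("titleStmt", True),
    ("editionStmt", False),
    ("extent", False),
    ("publicationStmt", True),
    ("seriesStmt", False),
    ("notesStmt", False),
    ("sourceDesc", True),
]


def check_file_desc(header_xml):
    """Return a list of problems found in the <fileDesc> of a <teiHeader>."""
    root = ET.fromstring(header_xml)
    file_desc = root.find(f"{{{TEI_NS}}}fileDesc")
    if file_desc is None:
        return ["missing mandatory <fileDesc>"]
    # Local names of the children, without the namespace prefix.
    names = [child.tag.split("}", 1)[-1] for child in file_desc]
    problems = []
    for name, mandatory in FILEDESC_CHILDREN:
        if mandatory and name not in names:
            problems.append(f"missing mandatory <{name}>")
    # Known children must appear in the same relative order as the table.
    order = [name for name, _ in FILEDESC_CHILDREN]
    positions = [order.index(n) for n in names if n in order]
    if positions != sorted(positions):
        problems.append("children of <fileDesc> are out of order")
    return problems


# Placeholder headers for illustration only.
GOOD = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Sample title</title></titleStmt>
    <publicationStmt><p>Unpublished sample.</p></publicationStmt>
    <sourceDesc><p>Born digital.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""

BAD = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <sourceDesc><p>Born digital.</p></sourceDesc>
    <titleStmt><title>Sample title</title></titleStmt>
  </fileDesc>
</teiHeader>"""

print(check_file_desc(GOOD))  # []
print(check_file_desc(BAD))
```

The second call reports both the missing <publicationStmt> and the misplaced <sourceDesc>.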

The children of <fileDesc> are based on ISBD "areas" of bibliographic description but with some differences in practice described in section 2.7 of the TEI Guidelines.

See section 4.1.6 of the Best Practices for TEI in Libraries for mappings between TEI header elements and MARC fields for the following situations:

creating a MARC record for the TEI document

creating a TEI header based on the MARC record for the source document

The latter mapping has been implemented in a command-line conversion tool called Thutmose II.