Victorian Women Writers Project

Introduction to the Text Encoding Initiative Guidelines


(Thanks to David Seaman of the University of Virginia for permission to copy from his "The Electronic Text Center Introduction to TEI and Guide to Document Preparation".)


The tags and procedures used by the Victorian Women Writers Project are part of the Text Encoding Initiative Guidelines (TEI), an application of SGML (Standard Generalized Markup Language) for encoding humanities texts. The complete TEI Guidelines are available on the World Wide Web. As far as possible, the VWWP follows TEI Lite, a subset of the TEI tags produced by Michael Sperberg-McQueen and Lou Burnard.

The latest TEI Guidelines (also known as "P3") are chiefly concerned with the structural elements of a text: volumes, chapters, sections, acts, scenes, and so on. Rather than defining separate tags for <poem>, <drama>, <canto>, etc., the TEI uses generic <div> </div> tag pairs to mark the structural divisions of a text, whatever their kind.

All of the texts we prepare share the same basic set of large-scale divisions. Each text as a whole is enclosed in a pair of tags, <TEI.2> and </TEI.2>, which mark it as conforming to the Text Encoding Initiative rules. The <TEI.2> tag pair encloses two major sections, a <teiHeader> and a <text>. The <teiHeader> section records information about the print source, about the creator of the electronic version, about changes we have made, and so on.

<TEI.2>
<teiHeader>
[Source and processing information goes here]
</teiHeader>
<text>
[All of the material that is part of the text goes here]
</text>
</TEI.2>
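The top-level skeleton above can also be checked mechanically. Here is a minimal sketch, not part of the VWWP's own procedures. It assumes the file is also well-formed XML, meaning every element is explicitly closed as in these examples. SGML as used by TEI P3 allows some end tags to be omitted, and this parser would reject such files. The sketch uses Python's standard xml.etree.ElementTree, and the helper name check_top_level is hypothetical.

```python
import xml.etree.ElementTree as ET

SKELETON = """<TEI.2>
<teiHeader>[Source and processing information goes here]</teiHeader>
<text>[All of the material that is part of the text goes here]</text>
</TEI.2>"""


def check_top_level(xml_string: str) -> bool:
    """Hypothetical helper: True if the root is <TEI.2> and its children
    are exactly a <teiHeader> followed by a <text>."""
    root = ET.fromstring(xml_string)
    child_tags = [child.tag for child in root]
    return root.tag == "TEI.2" and child_tags == ["teiHeader", "text"]


print(check_top_level(SKELETON))  # True
```

A file missing its <teiHeader>, or with the two sections in the wrong order, would return False.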

Within the <text> boundaries, the work is divided into its major sections. Every text has a <body>, in which the main part of the text is found. Among other things, this arrangement allows one to search for items only in the <body> of the text, filtering out the text in the <teiHeader>.

<TEI.2>
<teiHeader>
[Source and processing information goes here]
</teiHeader>
<text>
<body>
[text goes here]
</body>
</text>
</TEI.2>
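The point about searching only the <body> can be illustrated with a short sketch. It assumes well-formed XML and Python's standard xml.etree.ElementTree. The sample document and the helper names body_text and body_contains are made up for illustration. In the sample, the header mentions an "edition" and the body does not, so a search restricted to the body will not find that word.

```python
import xml.etree.ElementTree as ET

# Illustrative document: "edition" appears only in the header.
DOC = """<TEI.2>
<teiHeader>[Source: first edition]</teiHeader>
<text>
<body><div0>[A poem about the sea]</div0></body>
</text>
</TEI.2>"""


def body_text(xml_string: str) -> str:
    """Concatenate all text inside <text>/<body>, ignoring the <teiHeader>."""
    root = ET.fromstring(xml_string)
    body = root.find("text/body")
    if body is None:
        return ""
    # itertext() yields text from <body> and from every element nested in it
    return "".join(body.itertext())


def body_contains(xml_string: str, word: str) -> bool:
    """Case-insensitive search restricted to the <body>."""
    return word.lower() in body_text(xml_string).lower()


print(body_contains(DOC, "sea"))      # True
print(body_contains(DOC, "edition"))  # False: the word occurs only in the header
```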

In addition to <teiHeader> and <body>, some texts may also have two other main sections: <front> and <back>. The former typically encloses prefatory text such as an introduction or table of contents; the latter typically marks off appendices or indices.

<TEI.2>
<teiHeader>
[Source and processing information goes here]
</teiHeader>
<text>
<front>
[preface, etc. goes here]
</front>
<body>
[main body of the text goes here]
</body>
<back>
[appendices, etc. goes here]
</back>
</text>
</TEI.2>

Note: some texts require a <group> rather than a <body>. The <group> tag encloses a composite text, such as an anthology: a sequence of distinct texts that are regarded as a unit for some purpose. Each member of the group is tagged as its own <text> with its own <body>, so that the encoding preserves the sense of a collection of distinct works.

<TEI.2>
<teiHeader>
[Source and processing information goes here]
</teiHeader>
<text>
<group>
<text>
<body>
[first set of texts goes here]
</body>
</text>
<text>
<body>
[second set of texts goes here]
</body>
</text>
</group>
</text>
</TEI.2>

In the body of the text, we use numbered <div>s, where the number reflects each division's level in the hierarchy of the work. In our usage, the largest structural division is tagged <div0>, divisions within it are tagged <div1>, divisions within those <div2>, and so on. For example, in a volume of poetry each poem will usually be a top-level structural division, marked <div0>:

<TEI.2>
<teiHeader>
[Source and processing information goes here]
</teiHeader>
<text>
<body>
<div0>[first poem] </div0>
<div0>[second poem] </div0>
</body>
</text>
</TEI.2>
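The numbering rule for divisions can be checked mechanically: a numbered division inside another should be exactly one level deeper. The sketch below checks only that nesting rule. It does not check which level a text starts at. It assumes well-formed XML and Python's standard xml.etree.ElementTree and re modules. The helper names div_level and check_div_nesting are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED_DIV = re.compile(r"div(\d)$")


def div_level(tag: str):
    """Return N for a <divN> tag, or None for any other tag."""
    match = NUMBERED_DIV.match(tag)
    return int(match.group(1)) if match else None


def check_div_nesting(xml_string: str) -> list[str]:
    """List every numbered div that is not exactly one level deeper than
    the nearest numbered div enclosing it."""
    root = ET.fromstring(xml_string)
    problems = []

    def walk(element, enclosing_level):
        level = div_level(element.tag)
        if level is not None and enclosing_level is not None and level != enclosing_level + 1:
            problems.append(f"<{element.tag}> inside <div{enclosing_level}>")
        # Non-div elements (such as <l> or <p>) pass the enclosing level through.
        inner_level = level if level is not None else enclosing_level
        for child in element:
            walk(child, inner_level)

    walk(root, None)
    return problems


GOOD = "<body><div0><div1>[section]</div1></div0><div0>[poem]</div0></body>"
BAD = "<body><div0><div2>[section]</div2></div0></body>"
print(check_div_nesting(GOOD))  # []
print(check_div_nesting(BAD))   # ['<div2> inside <div0>']
```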


Attributes

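Tags can carry further information through "attributes", which are name-value pairs written inside the opening tag, such as n="5" in <pb n="5">. Some attributes are global and can be used with any tag. The global attributes include:

id: a unique identifier for the element within the document
n: a number or other label for the element, which need not be unique
lang: the language of the element's content
rend: how the element was rendered or presented in the source text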


Empty Tags

Tags usually come in pairs: an opening tag and an end tag. In some cases, such as with <l>, the end tag may be omitted, but the element still contains text or nests other tags. Some tags, however, are simply markers: they have no end tag and are not part of the textual hierarchy. The most common empty tag is the page break, <pb>. This tag marks the spot where a new page begins and has no corresponding end tag. It also takes an attribute, n, for the page number, e.g. <pb n="5">.
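Because <pb> is a marker rather than a container, finding the text on a given page means walking the document in reading order and noting the most recent page break. The following sketch assumes XML rather than SGML syntax. In XML an empty tag must be written self-closing, as <pb n="5"/>. The sketch uses Python's standard xml.etree.ElementTree. The sample poem and the helper name text_by_page are hypothetical. The sample places a <pb> inside a <div0> to show that a page break can fall anywhere without disturbing the hierarchy.

```python
import xml.etree.ElementTree as ET

# In XML the empty tag is self-closing: <pb n="5"/> rather than <pb n="5">.
POEM = """<body>
<pb n="5"/>
<div0>
<l>[first line]</l>
<pb n="6"/>
<l>[second line]</l>
</div0>
</body>"""


def text_by_page(xml_string: str) -> dict:
    """Map each page number (the n attribute of the preceding <pb>) to the
    text that falls on that page."""
    root = ET.fromstring(xml_string)
    pages = {}
    current = None  # no <pb> seen yet

    def add(text):
        if text and text.strip():
            pages.setdefault(current, []).append(text.strip())

    def walk(element):
        nonlocal current
        if element.tag == "pb":
            current = element.get("n")  # a marker only: no content of its own
        else:
            add(element.text)
        for child in element:
            walk(child)
            add(child.tail)  # text after a child's end tag stays on the current page

    walk(root)
    return {page: " ".join(chunks) for page, chunks in pages.items()}


print(text_by_page(POEM))  # {'5': '[first line]', '6': '[second line]'}
```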




Last updated: 14 July 1998
URL: http://www.indiana.edu/~letrs/vwwp/vwwp-tei.html
Comments: letrs@indiana.edu