att.referring

att.referring provides attributes for specifying the beginning and end of a linguistic or textual segment, either by numeric (character or byte) offsets or by identifying the edge elements via their IDs. [1.3.1 Attribute Classes]
Module tei: The TEI Infrastructure
Members span
Attributes
referringMode specifies whether the edges of the span or segment are identified by numeric offsets or by pointers (URIs or fragment identifiers). The default value of this attribute is pointer, so it may be omitted when edges are identified by pointing; in all other cases it must be present.
Status Optional
Datatype teidata.enumerated
Legal values are:
pointer
edges are identified by pointing at elements (e.g. w, c or seg) that can be pointed at by means of a URI (which most often means that they have xml:id attributes defined). [Default]
id
edges are identified by pointing at existing elements (e.g. w, c or seg) that carry xml:id attributes; this value ensures ID validation and is, in this respect, analogous to the IDREF datatype of DTDs.
character
edges are identified by numeric offsets (non-negative integers); note that a separate project-specific convention regulates the value of the starting index (0 or 1).
icp
"icp" stands for "inter-character point". Edges are identified by numeric offsets (non-negative integers) identifying inter-character points. For example, in the string "point", inter-character points are marked by vertical lines: |p|o|i|n|t| (by convention, there is an icp before the first character and after the last one). Note that a separate project-specific convention regulates the value of the starting index (0 or 1); ISO LAF advocates 0 as the start index.
byte
edges are identified by numeric offsets (non-negative integers) addressing bytes. Note that this low-level indexing type ignores XML or any other structuring.
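The difference between the numeric modes can be illustrated with a short Python sketch. It assumes a start index of 0 and, for the character mode, that both from and to name characters that belong to the span; the descriptions above do not say whether the end character is included, so that reading is an assumption. The function names are hypothetical and not part of TEI.

```python
def icp_slice(text, start, end):
    """Text lying between two inter-character points (start index 0)."""
    return text[start:end]


def character_slice(text, first, last):
    """Text from character `first` to character `last`.

    ASSUMPTION: both edge characters are included in the span.
    """
    return text[first:last + 1]


word = "point"

# |p|o|i|n|t| : one point before each character, plus one after the last.
icp_count = len(word) + 1

print(icp_slice(word, 1, 4))        # oin
print(character_slice(word, 1, 3))  # oin

# Byte offsets diverge from character offsets as soon as a character
# needs more than one byte. The encoding (UTF-8) is an assumption here.
text = "f\u00fcr dich"
char_offset = text.index("dich")
byte_offset = len(text[:char_offset].encode("utf-8"))
print(char_offset, byte_offset)     # 4 5
```

The same three characters are addressed as 1 to 4 in icp mode and as 1 to 3 in character mode, which is why the mode has to be stated explicitly whenever it is not pointer.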
from specifies the starting point of a sequence of characters or bytes, or of elements that can be pointed at with a URI.
Status Optional
Datatype teidata.referring
to specifies the end-point of a sequence of characters or bytes, or of elements that can be pointed at with a URI.
Status Optional
Datatype teidata.referring
Schematron

<sch:rule context="*[local-name() = ('span')][not(@referringMode) and @from and @to]">
<sch:assert test="@from castable as xsd:anyURI">The default form of @from is a URI</sch:assert>
<sch:assert test="@to castable as xsd:anyURI">The default form of @to is a URI</sch:assert>
</sch:rule>
Schematron

<sch:rule context="*[@referringMode eq 'pointer']">
<sch:assert test="@from castable as xsd:anyURI">When @referringMode is 'pointer',
@from must be a URI</sch:assert>
<sch:assert test="@to castable as xsd:anyURI">When @referringMode is 'pointer', @to
must be a URI</sch:assert>
</sch:rule>
<sch:rule context="*[@referringMode eq 'id']">
<sch:assert test="id(substring(@from,2))">When @referringMode is 'id',
@from must be pointing at an existing local target</sch:assert>
<sch:assert test="id(substring(@to,2))">When @referringMode is 'id',
@to must be pointing at an existing local target</sch:assert>
</sch:rule>
<sch:rule context="*[@referringMode = ('character','icp','byte')]">
<sch:assert test="@from castable as xsd:nonNegativeInteger">When @referringMode is
'<sch:value-of select="@referringMode"/>', @from must be a non-negative integer</sch:assert>
<sch:assert test="@to castable as xsd:nonNegativeInteger">When @referringMode is
'<sch:value-of select="@referringMode"/>', @to must be a non-negative integer</sch:assert>
</sch:rule>
Schematron

<sch:rule context="*[local-name() = ('span')][@to or @from]">
<sch:report test="contains(normalize-space(@to),' ') or contains(normalize-space(@from),' ')">The attributes @to and @from on <sch:name/> may each contain only a single
value</sch:report>
</sch:rule>
Schematron

<sch:rule context="*[local-name() = ('span')][@to]">
<sch:report test="@to and not(@from)">If @to is supplied on <sch:name/>, @from must
be supplied as well</sch:report>
</sch:rule>
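For readers less familiar with Schematron, the intent of some of the rules above can be paraphrased in Python. This is only a rough sketch under stated assumptions: attributes are represented as a plain dictionary, the function name check_referring and the known_ids parameter are hypothetical, and the URI castability tests of the pointer mode are not reproduced. It is not a substitute for running the Schematron rules.

```python
import re

NUMERIC_MODES = {"character", "icp", "byte"}


def check_referring(attrs, known_ids=frozenset()):
    """Return a list of problems found in from/to on one element.

    attrs: dict of attribute name -> string value (hypothetical
    representation chosen for this sketch).
    known_ids: xml:id values present in the document, used for 'id' mode.
    """
    problems = []
    mode = attrs.get("referringMode", "pointer")  # pointer is the default

    # If to is supplied, from must be supplied as well.
    if "to" in attrs and "from" not in attrs:
        problems.append("to requires from")

    for name in ("from", "to"):
        value = attrs.get(name)
        if value is None:
            continue
        if len(value.split()) > 1:
            problems.append(f"{name} must contain a single value")
        elif mode in NUMERIC_MODES and not re.fullmatch(r"[0-9]+", value.strip()):
            problems.append(f"{name} must be a non-negative integer")
        elif mode == "id" and value.strip()[1:] not in known_ids:
            # Drop the leading '#', as substring(@from, 2) does above.
            problems.append(f"{name} must point at an existing local target")
    return problems


print(check_referring({"referringMode": "icp", "from": "0", "to": "3"}))
print(check_referring({"referringMode": "icp", "from": "-1", "to": "3"}))
```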
Example

The example below comes from a part of the CoMParS (Collection of Multi-lingual Parallel Sequences) project and presents a fragment of a monolingual subcorpus of German.

The individual sequences (in this case, a sentence) are listed in the text part of the corpus, while the linguistic analysis is performed in the <standOff> part, which contains, among other things, segmentation information. CoMParS adheres to ISO LAF principles and uses inter-character points with the indexing starting at 0.

<text xml:lang="de">
 <body>
  <ab xml:id="deu-ab1" n="1">Ich habe mich im Winter in dich verliebt.</ab>
 </body>
</text>
<!-- 'I'c'h' 'h'a'b'e' 'm'i'c'h' 'i'm' 'W'i'n't'e'r' 'i'n' 'd'i'c'h' 'v'e'r'l'i'e'b't'.'
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 -->
<standOff>
 <listAnnotation n="1" corresp="#deu-ab1"
  type="sequence">

  <listAnnotation type="segmentation">
   <seg from="0" to="3" xml:id="deu-ab1tok1">Ich</seg>
   <seg from="4" to="8" xml:id="deu-ab1tok2">habe</seg>
   <seg from="9" to="13" xml:id="deu-ab1tok3">mich</seg>
   <seg from="14" to="16" xml:id="deu-ab1tok4">im</seg>
   <seg from="17" to="23" xml:id="deu-ab1tok5">Winter</seg>
   <seg from="24" to="26" xml:id="deu-ab1tok6">in</seg>
   <seg from="27" to="31" xml:id="deu-ab1tok7">dich</seg>
   <seg from="32" to="40" xml:id="deu-ab1tok8">verliebt</seg>
   <seg from="40" to="41" xml:id="deu-ab1tok9">.</seg>
  </listAnnotation>
 </listAnnotation>
</standOff>

The segmentation information above is subsequently used by all the other (numerous) annotation layers.
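With inter-character points counted from 0, each from/to pair behaves like a Python slice of the sentence, so the segmentation can be checked mechanically. The following is a minimal sketch; the offsets and tokens are copied by hand from the seg elements above rather than parsed from the XML, and the variable names are illustrative only.

```python
sentence = "Ich habe mich im Winter in dich verliebt."

# (from, to, token) triples as given by the seg elements.
segments = [
    (0, 3, "Ich"),
    (4, 8, "habe"),
    (9, 13, "mich"),
    (14, 16, "im"),
    (17, 23, "Winter"),
    (24, 26, "in"),
    (27, 31, "dich"),
    (32, 40, "verliebt"),
    (40, 41, "."),
]

# With icp offsets and start index 0, sentence[from:to] must equal the token.
mismatches = [
    (start, end, token)
    for start, end, token in segments
    if sentence[start:end] != token
]

print(len(sentence))  # 41 characters, hence inter-character points 0..41
print(mismatches)     # []
```

Note that adjacent tokens without intervening whitespace share a point: "verliebt" ends at 40 and the full stop begins at 40.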

The CoMParS ODD contains the following statements, which (a) add seg to the att.referring class and (b) change the default value of referringMode to icp. The first part will hopefully be accepted by the Council together with this ticket or afterwards, while the second part is forced by the lack of a mechanism that would make attribute classes aware of the elements that belong to them. This mechanism is on the Council's to-do list.
<elementSpec ident="seg" module="linking"
 mode="change">

 <classes mode="change">
  <memberOf key="att.referring"/>
 </classes>
</elementSpec>
<classSpec ident="att.referring"
 mode="change" type="atts" module="tei">

 <constraintSpec scheme="schematron"
  ident="default_mode" mode="replace">

  <constraint>
   <sch:rule context="*[local-name() = ('span','seg')][not(@referringMode) and @from and @to]">
    <sch:assert test="@from castable as xsd:nonNegativeInteger">The
         default form of @from is a non-negative integer</sch:assert>
    <sch:assert test="@to castable as xsd:nonNegativeInteger">The
         default form of @to is a non-negative integer</sch:assert>
   </sch:rule>
  </constraint>
 </constraintSpec>
 <attList>
  <attDef ident="referringMode" usage="opt"
   mode="change">

   <defaultVal>icp</defaultVal>
  </attDef>
 </attList>
</classSpec>
Note

When referringMode selects numeric offsets (character, icp or byte), two options are possible for the start index: some systems assume that indexing starts with 0, others that the initial index value is 1. This decision is not reflected by referringMode but should be documented in the header, together with other project-specific encoding decisions. Linguistic analysis in the ISO LAF (Linguistic Annotation Framework, ISO 24612:2012) assumes inter-character points (represented here by the value icp) and indices starting at 0.
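Because the start index is a project-level convention, software that consumes offsets from several sources may need to normalize them first. A small sketch of that arithmetic follows; the helper name to_zero_based is hypothetical, and the project's start index is assumed to be known from its header documentation.

```python
def to_zero_based(offset, start_index):
    """Convert an offset counted from start_index (0 or 1) to one counted from 0."""
    if start_index not in (0, 1):
        raise ValueError("start index must be 0 or 1")
    if offset < start_index:
        raise ValueError("offset lies before the start index")
    return offset - start_index


text = "Ich habe"

# The token "Ich" as an icp span under each convention.
zero_based = (0, 3)
one_based = (1, 4)

normalized = tuple(to_zero_based(o, start_index=1) for o in one_based)
print(normalized)                            # (0, 3)
print(text[normalized[0]:normalized[1]])     # Ich
```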