Formex Version 4

Character encoding


Formex version 4 is based on Unicode 6.0. The encoding follows the UTF-8 specification. It covers the official languages of the European Union as well as languages used in the European Economic Area and in some candidate Member States. The latter are included purely for technical reasons; their inclusion has no political significance.

The following chapters give an overview of the characters used in a specific context or for a given language. If other characters have to be included in a document for particular purposes, the corresponding Unicode character has to be used.

Character entities may only be used under these circumstances:

  1. numeric values corresponding to the Unicode code point, such as '&#xA0;' for 'no-break space';
  2. special characters having a specific function within an XML instance: '&lt;', '&gt;', '&quot;', '&amp;' and '&apos;'.

All other character entities are forbidden. As Formex 4 is based on XML Schema, it is not possible to define an entity; any character entity other than those mentioned above will lead to a parsing error.

The creation of text entities is only possible in the context of fragment handling.
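
The entity rule above can be approximated with a simple pre-check before a document is parsed. The following Python sketch is illustrative only and is not part of the Formex specification: the function name `find_forbidden_entities` is hypothetical, the name pattern is a simplification of the XML name syntax, and the scan works on raw text, so it does not distinguish CDATA sections or comments from ordinary content.

```python
import re

# The five predefined XML entities that remain permitted.
ALLOWED_NAMED = {"lt", "gt", "quot", "amp", "apos"}

# Matches decimal (&#160;), hexadecimal (&#xA0;) and named (&name;) references.
# The named branch is a simplified, ASCII-only approximation of an XML name.
ENTITY_RE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_][A-Za-z0-9._-]*);")


def find_forbidden_entities(text):
    """Return entity references that are neither numeric nor predefined."""
    forbidden = []
    for match in ENTITY_RE.finditer(text):
        name = match.group(1)
        if name.startswith("#"):
            continue  # numeric character reference: permitted
        if name not in ALLOWED_NAMED:
            forbidden.append(match.group(0))
    return forbidden


sample = "a&#xA0;b &lt; c &nbsp; d &eacute;"
print(find_forbidden_entities(sample))
```

A conforming XML parser would reject the undeclared entities anyway; a check of this kind merely reports them all at once rather than stopping at the first error.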

Information on the use of special characters can be found in the "Interinstitutional style guide".

Character ranges

0020-007E, 00A0-00A3, 00A5-00A7, 00A9, 00AB-00AC, 00AE, 00B0-00B1, 00B5, 00B7, 00BB-00FF, 0100-0107, 010A-0113, 0116-011B, 011E-0123, 0126-0127, 012A-012B, 012E-0133, 0136-0137, 0139-0148, 014A-014D, 0150-015B, 015E-0165, 016A-016B, 016E-0173, 0178-017E, 0192, 01C4-01CC, 0218-021B, 0374, 037E, 0386-038A, 038C, 038E-03A1, 03A3-03CE, 0401-0405, 0407-042A, 042C, 042E-044A, 044C, 044E-044F, 0451-0455, 0457-045F, 0490-0491, 2010, 2013-2014, 2018-2022, 2026, 2030, 2032-2034, 2039-203A, 203E, 20AC, 2116, 2122, 2153-2154, 215B-215E, 2190-2199, 21D0-21D9, 21E6-21E9, 2213, 221A, 221E, 222B, 2243, 2245, 2248, 2260-2267, 2276-2277, 227C-227D, 22D5, 22DE-22DF, 2300, 2326, 2329-232B, 25A0, 25C7, 2605-2606, 2640, 2642, F106-F10A, F1B1, FB01-FB02.

Nota bene: The characters 00B5 and 00B7 are included for legacy reasons only; they should not be used when creating new instances.

Nota bene: The character 2026 (horizontal ellipsis) must also be used as a placeholder for a dotted helpline. In this case, it should be used only once, regardless of the number of dots actually needed.
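
The character ranges and the note on legacy characters lend themselves to a mechanical check. The Python sketch below is a minimal illustration under stated assumptions, not an official validator: the helper names `parse_ranges` and `check_text` are hypothetical, and the list is applied literally, so tab and line-break characters (which lie outside the listed ranges) are reported as well and may need to be stripped by the caller first.

```python
# Allowed code points, copied from the list above (hexadecimal, inclusive).
ALLOWED_RANGES = """
0020-007E, 00A0-00A3, 00A5-00A7, 00A9, 00AB-00AC, 00AE, 00B0-00B1, 00B5, 00B7,
00BB-00FF, 0100-0107, 010A-0113, 0116-011B, 011E-0123, 0126-0127, 012A-012B,
012E-0133, 0136-0137, 0139-0148, 014A-014D, 0150-015B, 015E-0165, 016A-016B,
016E-0173, 0178-017E, 0192, 01C4-01CC, 0218-021B, 0374, 037E, 0386-038A, 038C,
038E-03A1, 03A3-03CE, 0401-0405, 0407-042A, 042C, 042E-044A, 044C, 044E-044F,
0451-0455, 0457-045F, 0490-0491, 2010, 2013-2014, 2018-2022, 2026, 2030,
2032-2034, 2039-203A, 203E, 20AC, 2116, 2122, 2153-2154, 215B-215E, 2190-2199,
21D0-21D9, 21E6-21E9, 2213, 221A, 221E, 222B, 2243, 2245, 2248, 2260-2267,
2276-2277, 227C-227D, 22D5, 22DE-22DF, 2300, 2326, 2329-232B, 25A0, 25C7,
2605-2606, 2640, 2642, F106-F10A, F1B1, FB01-FB02
"""

# Characters kept for legacy reasons only (see the first nota bene).
LEGACY = {0x00B5, 0x00B7}


def parse_ranges(spec):
    """Turn '0020-007E, 00A9, ...' into a list of (first, last) integer pairs."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("-")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


RANGES = parse_ranges(ALLOWED_RANGES)


def check_text(text):
    """Return (disallowed, legacy): characters outside the ranges, and legacy ones."""
    disallowed, legacy = [], []
    for ch in text:
        cp = ord(ch)
        if not any(first <= cp <= last for first, last in RANGES):
            disallowed.append(ch)
        elif cp in LEGACY:
            legacy.append(ch)
    return disallowed, legacy


disallowed, legacy = check_text("Prix: 10 \u20ac \u2026 \u00b5 \u0100 \u0108")
print([f"U+{ord(c):04X}" for c in disallowed], [f"U+{ord(c):04X}" for c in legacy])
```

In this sample, U+0108 falls in the gap between 0100-0107 and 010A-0113 and is therefore reported as disallowed, while U+00B5 is accepted but flagged as legacy.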

Formex 4 - Physical Specifications - Catalog
Contact: OP A1.002 "Formats, Linguistic Informatics and Metadata"
Version: 5.59 (20170418)