Formex Version 4

Preliminary remarks

Standardised markup languages such as SGML have been on the market for nearly 20 years. Their importance is underlined by applications such as HTML, which has become the essential tool for internet publications.

In 1985, the Office for Official Publications started producing SGML instances for the documents published in the different OJ series. The specifications are known as Formex (Formalised exchange of electronic documents). The current version 3 entered into force on 1 April 1999.

For various reasons, SGML and SGML-based grammars have not had the expected success. The most important reason is that developing suitable, user-friendly tools is very hard. This is why XML was invented: it is much stricter, which makes tools easier to develop. Indeed, most of the basic tools were already available when the standard was adopted.

XML is only the starting point for the development of dependent standards that make it possible to transform and, in particular, to present instances (XSLT, XSL-FO). Another important effort in this context was the development of a new standard for the specification of grammars: DTDs are replaced by XML Schemas, which also make it possible to define the contents of elements and attributes.

For these reasons, the Office for Official Publications decided to migrate the Formex specifications from SGML to XML. Version 4 is presented as an XML Schema grammar. At the same time, the character set definition abandons the approach based on ISO 2022 and moves to Unicode (UTF-8). Formex version 4 can therefore be summarised as follows:

Formex 4 = XML + XML Schema + Unicode
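
As a rough illustration of what the "XML + Unicode" part of this formula means for a consuming tool, the following minimal Python sketch parses a UTF-8 encoded instance with the standard library only. The element names DOC and P and the function read_paragraphs are hypothetical and chosen for illustration; they are not taken from the Formex 4 schema. Validation against an XML Schema is not covered here, since the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: DOC and P are illustrative element names,
# not taken from the Formex 4 schema.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<DOC><P>Ärger, œuvre, Ελληνικά</P></DOC>"
).encode("utf-8")


def read_paragraphs(data):
    """Parse a UTF-8 encoded XML instance and return the text of each P element."""
    # ET.fromstring raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(data)
    return [p.text or "" for p in root.iter("P")]


if __name__ == "__main__":
    for text in read_paragraphs(SAMPLE):
        print(text)
```

Because the instance is plain UTF-8, characters from several scripts can sit in the same element without the code-switching sequences that an ISO 2022 based approach would need.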

The migration to XML also gave us the opportunity to review the existing specifications. Difficulties were identified and removed in cooperation with users. As a result, most of the models were streamlined and became much simpler: Formex version 4 consists of only about 260 tags, compared with about 1200 in Formex version 3.

The specifications of Formex version 3 are available in French, English and German. Because they are used exclusively in an IT context, where English is more or less the lingua franca, the specifications of Formex version 4 will be available in English only.

The specifications consist of two parts:

  1. the physical specifications, which contain information on the exchange of data, the construction of filenames and, in particular, the character set; and
  2. the grammar for the markup based on XML Schema.

Numerous illustrations and examples are included.

A helpdesk will be maintained to support the introduction and use of the new specifications. In any case, the "Formats" cell will always be available to answer questions or give advice.

Luxembourg, May 2004

Formex 4 - Physical Specifications - Catalog
Contact: OP A1.002 "Formats, Linguistic Informatics and Metadata"
Version: 5.58 (20161101)