Name

exemplar — An XML parser generator

Synopsis

exemplar [ --debug ] [ --debug-level debug_level ] [ --exclude {CDSect,CharRef,Comment,PI,doctypedecl} ] [ --help ] [ --include {entities} ] --input input [ --input-type {dtd} ] [ --output output ] [ --output-api {binding,sax1,sax2} ] --output-language {dtd,java,xslt} [ --output-package output_package ] [ --verbose ] [ --version ] [ --vocabulary vocabulary ]

Description

Exemplar is a tool for processing XML vocabulary descriptions (currently just DTDs) into useful code.

Exemplar was originally designed to produce the smallest possible XML parser for a given vocabulary. The idea was that by restricting the vocabulary of XML one could "cut corners" within the parser. This was found to be only partially true. To generate smaller parsers, one only needs to know what features of XML the vocabulary makes use of. For example, whether or not a vocabulary defines attributes for some elements is much more important than knowing the exact content models of the elements in the vocabulary. Exemplar can create SAX 1.0 parsers that are approximately 5-10KB.

Exemplar fully parses vocabulary descriptions and creates a model of the vocabulary. The current goals are to use the model of the vocabulary to perform tasks other than creating small parsers, such as creating faster parsers or auto-generating the code needed to make full use of a parsed XML document.

Options

The following options may be passed to Exemplar on the command line.

--debug

Turn on all internal assertion checking and logging.

--debug-level

Set the level of debugging. Higher numbers produce more copious output.

The default value for this option is "1"

--exclude

Prevent the generated parser from recognising the tokens in the comma-separated list provided. Note that not all language/API combinations respect this exclusion list.

The following values can be used with this option:

CDSect

Exclude character data (CDATA) sections. CDATA sections begin with the string "<![CDATA[" and end with "]]>".

CharRef

Exclude character references. Character references look like "&#32;" (the space character) or "&#65;" (the character 'A').

Comment

Exclude comments. XML comments begin with "<!--" and end with "-->".

PI

Exclude processing instructions. Processing instructions begin with "<?name" (where name may not be "xml" in any combination of upper and lower case) and end with "?>".

doctypedecl

Exclude document type declarations. Document type declarations begin with "<!DOCTYPE" and end with ">".
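As an illustration of the comma-separated list that --exclude expects, the following sketch joins several of the tokens above into a single value. The helper name join_by_comma and the EXCLUDES variable are illustrative only, and the JAR and file paths are the placeholder paths used in the Examples section. The command is printed rather than executed so that it can be inspected first.

```shell
# Join all arguments with commas to form the value for --exclude.
# The subshell keeps the change to IFS from leaking into the caller.
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

EXCLUDES=$(join_by_comma Comment PI doctypedecl)

# Print the resulting command line instead of running it.
echo java -jar /path/to/Exemplar.jar \
    --input /path/to/foo.dtd \
    --output /my/output/directory \
    --output-language java --output-api sax1 \
    --exclude "$EXCLUDES"
```

Whether the exclusions take effect depends on the language/API combination, as noted above.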

--help

Display a description of all the valid options accepted by the program.

--include

Add features to the generated parser by passing a comma-separated list of features here.

The following values can be used with this option:

entities

Include all the entities defined in the input and cause them to be expanded during the parse.

--input

The file from which to read the specification of the XML vocabulary. If the filename is "-", the input is read from standard input.

This option is mandatory.
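A minimal sketch of reading the DTD from standard input, assuming the JAR location used in the Examples section. The function name generate_from_stdin and the EXEMPLAR_JAR variable are hypothetical and exist only for this illustration.

```shell
# Placeholder JAR location, as assumed in the Examples section.
EXEMPLAR_JAR=/path/to/Exemplar.jar

# Read a DTD on standard input and request a Java SAX2 parser.
# $1 is the directory to which the output should be written.
generate_from_stdin() {
    java -jar "$EXEMPLAR_JAR" \
        --input - \
        --output "$1" \
        --output-language java \
        --output-api sax2
}

# Usage:
#   generate_from_stdin /my/output/directory < /path/to/foo.dtd
#   gzip -dc /path/to/foo.dtd.gz | generate_from_stdin /my/output/directory
```

This form is convenient when the DTD is produced by another command and never exists as a plain file on disk.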

--input-type

Set the input type.

The following values can be used with this option:

dtd

The input file is a DTD.

The default value for this option is "dtd"

--output

Set the directory to which output will be written.

--output-api

Some output modules differ from others only in the API that the generated parser provides. Use this option to select the API you want.

The following values can be used with this option:

binding

A data binding API. (for use with "java")

sax1

The SAX1 API (for use with "java")

sax2

The SAX2 API (for use with "java")

--output-language

The language you wish the generated parser to be provided in.

The following values can be used with this option:

dtd

The XML DTD language.

java

The Java language. (requires an API to be selected)

xslt

The XSLT language.

This option is mandatory.
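The relationship between --output-language and --output-api described above can be checked in a wrapper script before Exemplar is invoked. The following is a sketch only: the function name check_language_api is hypothetical, and rejecting an API for the dtd and xslt languages is a policy of this wrapper, not documented behaviour of Exemplar itself.

```shell
# Return 0 if the language/API pair matches the option descriptions, 1 otherwise.
# $1: value for --output-language
# $2: value for --output-api (may be empty)
check_language_api() {
    case "$1" in
        java)
            case "$2" in
                binding|sax1|sax2) return 0 ;;
                *)
                    echo "java requires --output-api binding, sax1 or sax2" >&2
                    return 1 ;;
            esac ;;
        dtd|xslt)
            if [ -n "$2" ]; then
                echo "--output-api $2 is documented for use with java, not $1" >&2
                return 1
            fi
            return 0 ;;
        *)
            echo "unknown output language: $1" >&2
            return 1 ;;
    esac
}

# Usage:
#   check_language_api java sax1 && echo "ok to run exemplar"
```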

--output-package

For output languages which have the notion of a package (e.g. Java) or module (e.g. Perl), specify the package/module in which the output code will be placed.

--verbose

Increase the amount of information output while the program is running.

--version

Display the version of the program and then exit.

--vocabulary

A name for the vocabulary of XML that the generated parser will parse.

The default value for this option is "XML"

Return Value

0: EXIT_SUCCESS

The program exited successfully.

1: EXIT_FAIL_L10N

The program failed during the localisation phase.

2: EXIT_FAIL_ARGS

The program failed while processing the command line arguments.

3: EXIT_FAIL_INPUT

The program failed while parsing the input.

4: EXIT_FAIL_CODE_GEN

The program failed while generating code.
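A script that calls Exemplar can translate the exit status into the symbolic names listed in this section. The function name exemplar_exit_name is an assumption made for this sketch; only the numeric values and symbolic names come from this manual.

```shell
# Map an Exemplar exit status to the symbolic name given in this section.
exemplar_exit_name() {
    case "$1" in
        0) echo "EXIT_SUCCESS" ;;
        1) echo "EXIT_FAIL_L10N" ;;
        2) echo "EXIT_FAIL_ARGS" ;;
        3) echo "EXIT_FAIL_INPUT" ;;
        4) echo "EXIT_FAIL_CODE_GEN" ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Usage (JAR path as assumed in the Examples section):
#   java -jar /path/to/Exemplar.jar --input /path/to/foo.dtd --output-language dtd
#   echo "exemplar finished with $(exemplar_exit_name $?)"
```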

Diagnostics

The following is a list of all possible messages that exemplar can produce. For each entry there is a symbolic name which corresponds to the variable in the Message class which holds this diagnostic message. The text portion after the symbolic name is the diagnostic message currently associated with the symbolic name. These messages may vary with different environmental settings (locales for example).

ANT_LOCALISATION_ERROR

Ant localisation error.

ASSERTION_MESSAGE

%s which was called from %s

ATTEMPTING_TO_PARSE

Attempting to parse %s as a %s

COPYRIGHT

exemplar 0.1.1 Copyright (c) 2003-2007 Conor McDermottroe. All rights reserved. See LICENSE and README files for licensing information.

DEBUG_CLASS_AND_METHOD

[%s.%s()]

DTDLEXER_ILLEGAL_CHARACTER

Illegal character

DTDLEXER_INPUT_NOT_FOUND

DTD file "%s" not found.

DTDLEXER_INVALID_CONDITIONAL_SECTION

Invalid conditional section type.

DTDLEXER_INVALID_TEXTDECL

Invalid text declaration

DTDLEXER_UNKNOWN_CHARACTER

Unknown character

DTDPARSER_THREW_EXCEPTION

DTD parsing threw an exception.

DTDPEDECLTABLE

PEDeclTable: %d immediate value parameter entity declarations, %d URI-referenced parameter entity declarations.

DTDPEEXCEPTION

%s [near %s]

DTDPE_GARBAGE_AFTER_SYSTEMLITERAL

Garbage after SystemLiteral.

DTDPE_INVALID_PEDECL

Invalid parameter entity declaration.

DTDPE_INVALID_PUBIDLITERAL

Invalid PubidLiteral

DTDPE_INVALID_SYSTEMLITERAL

Invalid SystemLiteral

DTDPE_UNDECLARED_PE

Reference to undeclared parameter entity: %s

DTDPE_UNRESOLVED_PE_REF

Unresolved parameter entity reference.

EXCEPTION_NO_MESSAGE

<No Message>

FILE_WRITE_FAILED

Failed to write to file "%s"

FILE_WRITE_IO_EXCEPTION

IOException thrown when writing to file "%s"

GENERIC_SECURITY_EXCEPTION

SecurityException thrown when trying to access %s in %s.

GEN_NO_SUCH_DIRECTORY

No such directory: %s

IGNORING_ERROR

Ignoring error.

IGNORING_EXCEPTION

Ignoring exception.

LANGUAGE_DTD

The XML DTD language.

LANGUAGE_JAVA

The Java language.

LANGUAGE_XSLT

The XSLT language.

LEXEREXCEPTION

%s [%s]

LOCALISATION_ERROR

Localisation error: %s

MALFORMED_CHAR_REF

Malformed character reference: %s

MANDATORY_OPTIONS_NOT_SET

Some mandatory options were not set.

MESSAGE_EXTRA_BUNDLE_ENTRY

No field for %s in %s.

MESSAGE_MISSING

Localisation error, message missing: %s

MISSING_MANDATORY_OPTION

Mandatory option "%s" was not set.

OPTIONS_INITIALISING

Initialising options.

OPTIONS_NO_SUCH_OPTION

No such option "%s"

OPTION_DEFAULT

The default value for this option is "%s"

OPTION_ENUM_ARGS_HEADER

The following values can be used with this option:

OPTION_IS_MANDATORY

This option is mandatory.

OPTION_LANGUAGE_OF_API

(for use with "%s")

OPTION_LANGUAGE_REQUIRES_API

(requires an API to be selected)

SOURCE_GENERATOR_DOCTYPE_NULL

doctype may not be null.

SOURCE_GENERATOR_LANGUAGE_NULL

language may not be null.

SOURCE_GENERATOR_THREW_EXCEPTION

The underlying parser generator threw an exception.

UI_PROGRESS_DONE

Done.

UI_PROGRESS_FAILED_TO_CREATE_OUTPUT

Failed to create output files.

UI_PROGRESS_FINISHED_TIME

Finished in 12,345.679 seconds.

UI_PROGRESS_GENERATING_PARSER

Generating parser...

UI_PROGRESS_INPUT_PARSE_FAILED

Failed to correctly parse the input file.

UI_PROGRESS_OPTIONS

Processing options...

UI_PROGRESS_PARSING_INPUT

Parsing input file %s

UNREACHABLE_CODE_REACHED

Unreachable code reached.

UNSUPPORTED_INPUT_TYPE

Unsupported input type.

UNTERMINATED_REF

Unterminated character reference.

XMLDOCTYPE_ORPHAN_ATTLIST

Attribute list with no associated element: %s

XMLDOCTYPE_UNSUPPORTED_FEATURE

Unsupported feature.

XMLDOCTYPE_XMLOBJECT_IN_MARKUPDECLS

Forbidden XMLObject in the list of markup declarations: %s

XMLOBJECT_NOT_CONFIGURED

XMLObject not configured.

XMLPARSER_LOAD_CODE_FRAGMENT_FAILED

Failed to load code fragment.
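Many of the messages above contain %s and %d placeholders. Assuming these follow printf conventions (%s for a string, %d for an integer), which this manual does not state explicitly, the final text of a message can be previewed as follows. The helper name format_message and the argument values are illustrative; the values Exemplar actually substitutes may differ.

```shell
# Fill the placeholders of a message template, printf-style.
# $1 is the template; the remaining arguments replace the placeholders in order.
format_message() {
    fmt=$1
    shift
    printf "$fmt\n" "$@"
}

format_message 'Mandatory option "%s" was not set.' input
format_message 'Attempting to parse %s as a %s' /path/to/foo.dtd dtd
format_message 'No such directory: %s' /my/output/directory
```

Knowing the fixed part of each template makes it easier to search log output for a given diagnostic.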

Examples

In all of the following examples, the JAR file containing Exemplar is assumed to be at /path/to/Exemplar.jar.

Java SAX 1 parser (from the CLI)

To generate a SAX 1.0 parser in the Java language for a vocabulary called Foo, described by the DTD in /path/to/foo.dtd, and to place the generated source code in /my/output/directory, run:

java -jar /path/to/Exemplar.jar --input /path/to/foo.dtd --output /my/output/directory --output-api sax1 --output-language java --vocabulary Foo

Java SAX 1 parser (from Ant)

To generate a SAX 1.0 parser in the Java language for a vocabulary called Foo, described by the DTD in /path/to/foo.dtd, and to place the generated source code in /my/output/directory, add the following to an Ant build file:

<taskdef   name="exemplar"
           classname="com.mcdermottroe.exemplar.ui.ant.Task"
           classpath="/path/to/Exemplar.jar"
/>
<exemplar  input="/path/to/foo.dtd"
           output="/my/output/directory"
           output_api="sax1"
           output_language="java"
           vocabulary="Foo"
/>

Bugs

Known Bugs

All known bugs are tracked in the Exemplar Issue Tracker. Check there for the latest list of known bugs.

What to do if you find a bug

If you find a bug, try to reproduce it. The easier it is to reproduce, the easier it is to fix. File a bug report via The Exemplar Issue Tracker with as much information as possible. The following is the minimum I recommend.

  1. The JVM you're using.

  2. Where you got Exemplar and how you built it (if you did).

  3. The output of Exemplar when you run it with the --help flag.

  4. The input that you need to use to reproduce the bug.

  5. The output (if any) when the bug has been tickled.

  6. The output (if different) when you run Exemplar with the --debug flag.
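Most of the information in the list above can be gathered in one pass. The following is a sketch under the assumption that the JAR path is passed as the first argument; the function name collect_bug_report and the section headings it prints are illustrative only. Items 2 and 4 (where Exemplar came from and the input itself) still need to be supplied by hand.

```shell
# Collect the JVM version, the --help output, and the output of a failing
# run with and without --debug.
# $1: path to the Exemplar JAR; remaining arguments: the options that trigger the bug.
# The function body runs in a subshell so that its variables stay local.
collect_bug_report() (
    jar=$1
    shift
    echo "== JVM =="
    java -version 2>&1
    echo "== exemplar --help =="
    java -jar "$jar" --help 2>&1
    echo "== output =="
    java -jar "$jar" "$@" 2>&1
    echo "== output with --debug =="
    java -jar "$jar" --debug "$@" 2>&1
)

# Usage:
#   collect_bug_report /path/to/Exemplar.jar --input /path/to/foo.dtd \
#       --output-language dtd > bug-report.txt
```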

Bugs are usually processed according to the following order of precedence.

  1. Bugs that are submitted with code to fix the error and JUnit tests to ensure that the bug won't sneak back into the code in the future.

  2. Bugs that are submitted with code to fix the error.

  3. Bugs that are submitted with JUnit code to reproduce the error.

  4. Bugs that are submitted with instructions on how to reproduce the error.

  5. Bugs that can be reproduced.

  6. Bugs that are intermittent and hard to pin down.

  7. Bugs that have fixes supplied under a license incompatible with that of Exemplar.

See Also

CUP(1)

[XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler. 6th October 2000. Extensible Markup Language (XML) 1.0. Second Edition. W3C. http://www.w3.org/TR/2000/REC-xml-20001006 .

History

Exemplar began life as a university project submitted during the spring of 2004 by Conor McDermottroe as part of the final examinations for a BA (Mod.) Computer Science at Trinity College, Dublin. The curious may request a copy of the software and the accompanying report from:


Department of Computer Science,
O'Reilly Institute,
Trinity College,
Dublin 2,
Ireland.

The project then lay dormant for several months before being restarted with the intention of releasing it as an Open Source project. It went through some extensive refactoring to make it easier and more pleasant to modify. Of course, this should probably have been done while it was a university project, but unfortunately it was not possible to declare that it would be delivered "when it's ready". :-)

Since the 6th of October 2006, Exemplar has been an Open Source project available to all interested parties. A brief description of all known versions of the project is laid out below.

Version Numbering

Exemplar's version numbers are of the form A.B or A.B.C, where A is a major number signifying a major milestone or a fundamental change in the behaviour of Exemplar, B is a release number identifying a release intended for use by end-users, and C is a milestone number marking internal progress. Hence 0.1 is the first release version of the software and 0.1.0 is the same piece of software with any modifications made after the release but before the first milestone. In general, using an A.B.C version of the program is not recommended unless you wish to help debug it.
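The distinction between release (A.B) and milestone (A.B.C) versions can be checked mechanically, for example before deciding whether to deploy a build. The following is a minimal sketch; the function name version_kind and the labels it prints are assumptions made for this illustration.

```shell
# Classify a version string according to the A.B / A.B.C scheme.
# Prints "release" for A.B, "milestone" for A.B.C and "invalid" otherwise.
version_kind() {
    case "$1" in
        ""|*[!0-9.]*|.*|*.|*..*) echo "invalid" ;;   # empty, non-numeric or misplaced dots
        *.*.*.*)                 echo "invalid" ;;   # too many components
        *.*.*)                   echo "milestone" ;; # A.B.C
        *.*)                     echo "release" ;;   # A.B
        *)                       echo "invalid" ;;   # a single number
    esac
}

version_kind 0.1      # release
version_kind 0.1.1    # milestone
```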

University Project

The initial version from which the software was derived.

0.0.1 (2006-02-19)

Extensive refactoring work which changed almost every part of the code. The purpose of the work was to allow easier modification of the project by third parties.

0.0.2 (2006-04-18)

A large "clean-up" of the source code to repair any damage caused by the refactoring and to remove as many hacks from the code base as possible.

0.0.3 (2006-08-06)

Completion of the skeleton of the testing framework which was started near the milestone of 0.0.1. This covers basic JUnit testing, some Checkstyle checks and the beginning of a framework for testing the program by running it with all option permutations.

0.0.4 (2006-09-08)

Finishing work on the initial documentation. This included many JavaDoc improvements, the completion of this manual and the beginning of the Developer's Handbook.

0.0.5 (2006-10-05)

All the bugs which were showstoppers for the 0.1 release were fixed.

0.0.6 (2006-10-06)

Finished work on the website, mailing lists and issue tracker.

0.1 (2006-10-06)

Initial public release.

Copyright And License

The following copyright statement and license applies to this documentation only. For the license for the program itself, see the LICENSE and README files which accompany the distribution.

Copyright (c) 2005, 2006, 2007 Conor McDermottroe. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the author nor the names of any contributors to the software may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.