exemplar — An XML parser generator
exemplar [--debug] [--debug-level debug_level]
         [--exclude {CDSect,CharRef,Comment,PI,doctypedecl}] [--help]
         [--include {entities}] --input input [--input-type {dtd}]
         [--output output] [--output-api {binding,sax1,sax2}]
         --output-language {dtd,java,xslt}
         [--output-package output_package] [--verbose] [--version]
         [--vocabulary vocabulary]
Exemplar is a tool for processing XML vocabulary descriptions (currently just DTDs) into useful code.
Exemplar was originally designed to produce the smallest possible XML parser for a given vocabulary. The idea was that by restricting the vocabulary of XML one could "cut corners" within the parser. This was found to be only partially true. To generate smaller parsers, one only needs to know what features of XML the vocabulary makes use of. For example, whether or not a vocabulary defines attributes for some elements is much more important than knowing the exact content models of the elements in the vocabulary. Exemplar can create SAX 1.0 parsers that are approximately 5-10KB.
Exemplar fully parses vocabulary descriptions and creates a model of the vocabulary. The current goals are to use the model of the vocabulary to perform tasks other than creating small parsers, such as creating faster parsers or auto-generating the code needed to make full use of a parsed XML document.
The following options may be passed to Exemplar on the command line.
--debug-level debug_level
Turn up the level of debugging; higher numbers produce more copious output.
The default value for this option is "1".
--exclude {CDSect,CharRef,Comment,PI,doctypedecl}
Prevent the generated parser from recognising the tokens in the comma-separated list provided. Note that not all language/API combinations respect this exclusion list.
The following values can be used with this option:
CDSect: Exclude character data (CDATA) sections. CDATA sections begin with the string "<![CDATA[" and end with "]]>".
CharRef: Exclude character references. Character references look like "&#x20;" (the space character) or "&#x41;" (the character 'A').
Comment: Exclude comments. XML comments begin with "<!--" and end with "-->".
PI: Exclude processing instructions. Processing instructions begin with "<?name" (where name may not be "XML" in any combination of upper and lower case) and end with "?>".
doctypedecl: Exclude document type declarations. Document type declarations begin with "<!DOCTYPE" and end with ">".
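The delimiters listed above are enough to tell the five excludable constructs apart. The following is a minimal Java sketch of that idea, not code taken from Exemplar: the class MarkupKind and its classify method are hypothetical names, and the sketch only inspects the opening delimiter of a string rather than parsing it.

```java
/** Hypothetical helper, not part of Exemplar: names a construct by its opening delimiter. */
final class MarkupKind {
    private MarkupKind() {}

    /** Returns the matching --exclude value for the start of {@code s}, or "other". */
    static String classify(String s) {
        if (s.startsWith("<![CDATA[")) return "CDSect";
        if (s.startsWith("<!--")) return "Comment";
        if (s.startsWith("<!DOCTYPE")) return "doctypedecl";
        if (s.startsWith("<?")) {
            // The target name runs up to the first whitespace character or '?'.
            String target = s.substring(2).split("[\\s?]", 2)[0];
            // A target of "xml" in any letter case is not a processing instruction.
            return target.equalsIgnoreCase("xml") ? "other" : "PI";
        }
        if (s.startsWith("&#")) return "CharRef";
        return "other";
    }
}
```

For instance, MarkupKind.classify("<!-- note -->") yields "Comment", so a parser generated with --exclude Comment would be one that does not recognise that token.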
--include {entities}
Add features to the generated parser by passing a comma-separated list of features here.
The following values can be used with this option:
entities: Include all the entities defined in the input and cause them to be expanded during the parse.
--input input
The file from which to read the specification of the XML vocabulary. If the filename is "-", the input is read from standard input.
This option is mandatory.
--input-type {dtd}
Set the input type.
The following values can be used with this option:
dtd: The input file is a DTD.
The default value for this option is "dtd".
--output-api {binding,sax1,sax2}
Some output modules differ from others only in the API the generated parser provides. Use this option to select the API you want.
The following values can be used with this option:
binding: A data binding API (for use with "java").
sax1: The SAX1 API (for use with "java").
sax2: The SAX2 API (for use with "java").
--output-language {dtd,java,xslt}
The language you wish the generated parser to be provided in.
The following values can be used with this option:
dtd: The XML DTD language.
java: The Java language (requires an API to be selected).
xslt: The XSLT language.
This option is mandatory.
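The constraints stated in the option descriptions (--input and --output-language are mandatory, and the "java" language requires an API) can be summarised as a small check. The following Java sketch is illustrative only and is not Exemplar's own argument handling: the class OptionCheck, the map keys and the message strings are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Exemplar: checks the documented option constraints. */
final class OptionCheck {
    private OptionCheck() {}

    /** Takes option names (without the leading "--") mapped to values; returns any problems found. */
    static List<String> problems(Map<String, String> opts) {
        List<String> problems = new ArrayList<>();
        if (!opts.containsKey("input")) {
            problems.add("--input is mandatory");
        }
        String language = opts.get("output-language");
        if (language == null) {
            problems.add("--output-language is mandatory");
        } else if (language.equals("java") && !opts.containsKey("output-api")) {
            // The "java" language requires an API to be selected.
            problems.add("--output-language java requires --output-api");
        }
        return problems;
    }
}
```

An empty list means the combination satisfies the constraints described above; it says nothing about whether the values themselves are accepted by Exemplar.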
The following is a list of all possible messages that exemplar can produce. For each entry there is a symbolic name which corresponds to the variable in the Message class which holds this diagnostic message. The text portion after the symbolic name is the diagnostic message currently associated with the symbolic name. These messages may vary with different environmental settings (locales, for example).
exemplar 0.1.1 Copyright (c) 2003-2007 Conor McDermottroe. All rights reserved. See LICENSE and README files for licensing information.
In all of the following examples the JAR file containing Exemplar is assumed to be at /path/to/Exemplar.jar.
To generate a SAX 1.0 parser in the Java language for a vocabulary called Foo, described by the DTD in the file /path/to/foo.dtd, and to place the generated source code in /my/output/directory, run the following command:
java -jar /path/to/Exemplar.jar --input /path/to/foo.dtd \
    --output /my/output/directory --output-api SAX1 \
    --output-language Java --vocabulary Foo
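Client code for a SAX 1.0 parser supplies a document handler that receives the parse events. The following is a minimal Java sketch of such a handler using the standard org.xml.sax.HandlerBase class; the class name ElementCounter is hypothetical, and the name and constructor of the parser class that Exemplar generates for Foo are not stated in this document, so the parser itself is left out.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

/** Hypothetical SAX 1.0 handler that counts the start tags it is told about. */
@SuppressWarnings("deprecation") // the SAX 1.0 types are deprecated in favour of SAX2
final class ElementCounter extends HandlerBase {
    private int count;

    /** Called by a SAX 1.0 parser for every start tag. */
    @Override
    public void startElement(String name, AttributeList attributes) {
        count++;
    }

    /** Number of start tags seen so far. */
    int count() {
        return count;
    }
}
```

Assuming the generated parser implements the standard org.xml.sax.Parser interface, the handler would be registered with setDocumentHandler before parse is called.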
To generate a SAX 1.0 parser in the Java language for a vocabulary called Foo, described by the DTD in the file /path/to/foo.dtd, and to place the generated source code in /my/output/directory, using the Exemplar task in an Ant build file:
<taskdef name="exemplar"
         classname="com.mcdermottroe.exemplar.ui.ant.Task"
/>
<exemplar input="/path/to/foo.dtd"
          output="/my/output/directory"
          output_api="SAX1"
          output_language="Java"
          vocabulary="Foo"
/>
All known bugs are tracked in The Exemplar Issue Tracker; check there for the latest list of known bugs.
If you find a bug, try to reproduce it. The easier it is to reproduce, the easier it is to fix. File a bug report via The Exemplar Issue Tracker with as much information as possible. The following is the minimum I recommend.
The JVM you're using.
Where you got Exemplar and how you built it (if you did).
The output of Exemplar when you run it with the --help flag.
The input that you need to use to reproduce the bug.
The output (if any) when the bug has been tickled.
The output (if different) when you run Exemplar with the --debug flag.
Bugs are usually processed according to the following order of precedence.
Bugs that are submitted with code to fix the error and JUnit tests to ensure that the bug won't sneak back into the code in the future.
Bugs that are submitted with code to fix the error.
Bugs that are submitted with JUnit code to reproduce the error.
Bugs that are submitted with instructions on how to reproduce the error.
Bugs that can be reproduced.
Bugs that are intermittent and hard to pin down.
Bugs that have fixes supplied under a license incompatible with that of Exemplar.
CUP(1)
[XML] 6th October 2000. Extensible Markup Language (XML) 1.0. Second Edition. W3C. http://www.w3.org/TR/2000/REC-xml-20001006 .
Exemplar began life as a university project submitted during the spring of 2004 by Conor McDermottroe as part of the final examinations for a BA (Mod.) Computer Science at Trinity College, Dublin. The curious may request a copy of the software and the accompanying report from:
Department of Computer Science,
O'Reilly Institute,
Trinity College,
Dublin 2,
Ireland.
The project then lay dormant for several months before being restarted with the intention of releasing it as an Open Source project. It went through some extensive refactoring to make it easier and more pleasant to modify. Of course, this should probably have been done while it was a university project, but unfortunately it was not possible to declare that it would be delivered "when it's ready". :-)
As of the 6th of October 2006, Exemplar has been an Open Source project available for all interested parties. A brief description of all known versions of the project is laid out below.
Exemplar's version numbers are of the form A.B or A.B.C where A is a major number signifying a major milestone or a fundamental change in the behaviour of Exemplar, B is a release intended for use by end-users and C is a milestone number marking internal progress. Hence 0.1 is the first release version of the software and 0.1.0 is the same piece of software with any modifications made after the release but before the first milestone. In general it is not recommended to use an A.B.C version of the program unless you wish to help debug it.
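The numbering scheme can be illustrated with a short Java sketch that splits a version string into its parts and reports whether it is an end-user release. This is not code from Exemplar; the class ExemplarVersion and its members are hypothetical names chosen for the example.

```java
/** Hypothetical helper, not part of Exemplar: models an A.B or A.B.C version number. */
final class ExemplarVersion {
    final int major;          // A: major milestone or fundamental change
    final int release;        // B: release intended for end-users
    final Integer milestone;  // C: internal milestone, or null for the form A.B

    private ExemplarVersion(int major, int release, Integer milestone) {
        this.major = major;
        this.release = release;
        this.milestone = milestone;
    }

    /** Parses "A.B" or "A.B.C"; throws IllegalArgumentException for anything else. */
    static ExemplarVersion parse(String text) {
        // The -1 limit keeps trailing empty parts, so "0.1." is rejected below.
        String[] parts = text.split("\\.", -1);
        if (parts.length != 2 && parts.length != 3) {
            throw new IllegalArgumentException("Expected A.B or A.B.C: " + text);
        }
        int major = Integer.parseInt(parts[0]);
        int release = Integer.parseInt(parts[1]);
        Integer milestone = parts.length == 3 ? Integer.valueOf(parts[2]) : null;
        return new ExemplarVersion(major, release, milestone);
    }

    /** True for the A.B form, which is the form intended for end-users. */
    boolean isEndUserRelease() {
        return milestone == null;
    }
}
```

Under this sketch, "0.1" is an end-user release while "0.1.0" and "0.1.1" are milestone versions.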
Extensive refactoring work which changed almost every part of the code. The purpose of the work was to allow easier modification of the project by third parties.
A large "clean-up" of the source code to repair any damage caused by the refactoring and to remove as many hacks from the code base as possible.
Completion of the skeleton of the testing framework which was started near the milestone of 0.0.1. This covers basic JUnit testing, some Checkstyle checks and the beginning of a framework for testing the program by running it with all option permutations.
The following copyright statement and license apply to this documentation only. For the license for the program itself, see the LICENSE and README files which accompany the distribution.
Copyright (c) 2005, 2006, 2007 Conor McDermottroe. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the author nor the names of any contributors to the software may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.