Announcing the Open Source Sweble Wikitext Parser v1.0

We are happy to announce the general availability of the first public release of the Sweble Wikitext parser, available from http://sweble.org.

The Sweble Wikitext parser

  • can parse all complex Wikitext, including tables and templates
  • produces a real abstract syntax tree (AST); a DOM will follow soon
  • is open source, made available under the Apache Software License 2.0
  • is written in Java and uses only permissively licensed libraries

You can find all relevant information and code at http://sweble.org, including demos. In particular, the CrystalBall demo lets you query a Wikipedia snapshot using XQuery. (The underlying storage mechanism is not particularly fast, so you may have to wait a little when load is high.)

The Sweble Wikitext parser is intended to be a complete parser for Wikitext. That said, plenty of work remains to be done. Wikitext, as implemented by the MediaWiki engine, has ties to many components that aren’t strictly part of the language, most notably the parser functions, of which we have implemented only a subset.

At this stage, we are hoping for your help. You can help us by

  • playing with the CrystalBall demo and telling us about wiki pages that look particularly bad or faulty
  • simply using the parser in your projects and telling us what works and what doesn’t (bug reports!)
  • getting involved in the open source project by contributing code, documentation, and good humor

If you have questions, please don’t hesitate to use the sweble.org facilities or send email to the main implementor, Hannes Dohrn.

Brought to you by the Open Source Research Group at the University of Erlangen,