HD-Diff released as part of Sweble 2.0

HD-Diff is a tree-based algorithm to compute the differences between two documents. The algorithm was presented in a paper at the DocEng 2014 conference.

Unlike other tree-based differencing algorithms, HD-Diff can look into text nodes, split them when necessary, and produce a very fine-grained edit script. It is especially suited for tree-based text documents (e.g. office documents or WOM v3-based wiki articles), in which changes often happen to the text inside text nodes and not just to the overall document structure.
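To illustrate why looking into text nodes matters, here is a minimal Java sketch. It is not the HD-Diff algorithm. It runs a generic word-level longest-common-subsequence (LCS) diff over the text of a single, hypothetical text node. When one word changes, the result is a KEEP/DELETE/INSERT script that touches only that word, rather than a replacement of the whole node. The names `TextNodeDiffSketch`, `Op`, `Edit` and `diffWords` were made up for this example and are not part of the hddiff API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (NOT HD-Diff): a word-level LCS diff of the text
 * inside one text node, reporting fine-grained per-word edits instead of
 * treating the node as an atomic unit.
 */
public class TextNodeDiffSketch {

    /** Hypothetical edit kinds used only in this sketch. */
    enum Op { KEEP, DELETE, INSERT }

    /** One edit operation applied to a single word. */
    record Edit(Op op, String word) {
        @Override
        public String toString() {
            return switch (op) {
                case KEEP -> word;
                case DELETE -> "-" + word;
                case INSERT -> "+" + word;
            };
        }
    }

    /** Splits text into whitespace-separated words; blank text yields no words. */
    private static String[] words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Computes a word-level edit script turning oldText into newText. */
    static List<Edit> diffWords(String oldText, String newText) {
        String[] a = words(oldText);
        String[] b = words(newText);
        int n = a.length, m = b.length;
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start to emit the edit script.
        List<Edit> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a[i].equals(b[j])) {
                edits.add(new Edit(Op.KEEP, a[i]));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                edits.add(new Edit(Op.DELETE, a[i++]));
            } else {
                edits.add(new Edit(Op.INSERT, b[j++]));
            }
        }
        // Any leftover words were removed from or added to the node's text.
        while (i < n) edits.add(new Edit(Op.DELETE, a[i++]));
        while (j < m) edits.add(new Edit(Op.INSERT, b[j++]));
        return edits;
    }

    public static void main(String[] args) {
        List<Edit> script = diffWords(
                "The quick brown fox jumps",
                "The quick red fox jumps");
        System.out.println(script);
    }
}
```

Running `main` prints `[The, quick, -brown, +red, fox, jumps]`. A differ that treats text nodes as atomic leaves could only report that the entire text node was updated.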

The reference implementation of the generic HD-Diff algorithm is available in the module hddiff, which is part of the Sweble 2.0 release. An adapter for WOM v3 documents is provided in the module hddiff-wom-adapter.

Additional information on the hddiff project can be found on GitHub, on our HD-Diff project page, and in our paper.