Final Thesis: Tree-based Edit Analysis of Wiki Articles

Abstract: With the aim of improving editing support, this thesis examines frequent patterns in the English Wikipedia’s edit history. These patterns are assumed to resemble the edit transformations that Wikipedia authors had in mind when changing articles, and they might help facilitate text refactorings. The transformations are expected to have a complex structure, with multiple relations between elements of Wikipedia articles and edit operations, which cannot be modeled trivially. This thesis tackles the problem by encoding the information hidden in the edit history as graphs: edit operations and elements of Wikipedia articles are represented as graph nodes, and relations as links between these nodes. The resulting edit script graphs are mined for frequent subgraphs in order to retrieve interesting frequent patterns, themselves in the form of graphs. As results, we list, visualize, and analyze the discovered frequent patterns and create pattern clusters, which resemble real-world text transformations at different levels of abstraction.
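
Illustration: The graph encoding and the notion of a frequent pattern described in the abstract can be made concrete with a small sketch. It is not the thesis's implementation. All node labels, relation names, and function names below are hypothetical, and the sketch counts only one-edge patterns, which is the simplest special case of frequent subgraph mining; larger patterns would require a dedicated subgraph miner.

```python
from collections import Counter

# An edit script graph is sketched here as a dict with
#   "nodes": node id -> label (an edit operation or an article element)
#   "edges": list of (source id, relation, target id) triples
# All labels and relation names are hypothetical, chosen only for illustration.
def make_graph(nodes, edges):
    return {"nodes": nodes, "edges": edges}

def edge_patterns(graph):
    """Return the set of label-level one-edge patterns occurring in a graph."""
    nodes = graph["nodes"]
    return {(nodes[s], rel, nodes[t]) for s, rel, t in graph["edges"]}

def frequent_edge_patterns(graphs, min_support):
    """Count in how many graphs each pattern occurs and keep the frequent ones."""
    support = Counter()
    for g in graphs:
        # edge_patterns returns a set, so each graph counts at most once per pattern
        support.update(edge_patterns(g))
    return {p: c for p, c in support.items() if c >= min_support}

g1 = make_graph(
    {1: "insert", 2: "link", 3: "paragraph"},
    [(1, "affects", 2), (2, "child_of", 3)],
)
g2 = make_graph(
    {1: "insert", 2: "link", 3: "delete", 4: "text"},
    [(1, "affects", 2), (3, "affects", 4)],
)
g3 = make_graph(
    {1: "delete", 2: "text", 3: "paragraph"},
    [(1, "affects", 2), (2, "child_of", 3)],
)

EDIT_SCRIPT_GRAPHS = [g1, g2, g3]
FREQUENT = frequent_edge_patterns(EDIT_SCRIPT_GRAPHS, min_support=2)
```

With a minimum support of 2, only the patterns "insert affects link" and "delete affects text" remain, since each occurs in two of the three toy graphs.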

Keywords: Wikipedia, Wiki, Author Behavior, Edit Pattern, Diff, Mining

PDFs: Master Thesis, Work Description

Reference: Roland Vallery. Tree-based Edit Analysis of Wiki Articles. Master Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg: 2016.