Final Thesis: License Confusion on GitHub

Abstract: Today's software ecosystems rely heavily on the reuse of, and object-level linking to, existing code. A wide variety of software licenses permit this. Because these licenses impose conflicting restrictions, some of them cannot be used together in the same project. This work approximates the number of such license conflicts in four major programming language ecosystems (Python, C++, Java, and JavaScript) on the code-sharing platform GitHub, using information from previous studies of code duplication on the platform. To accomplish this, we offer an analysis of the compatibility of open source software licenses and describe different methods of obtaining the necessary data set. We furthermore adapt, extend, and evaluate existing license and text recognition methods to recognize licenses automatically in a highly scalable fashion. We find that between 0.1 and 2.8 percent of projects are affected by license conflicts. However, the differences in these numbers between the language ecosystems, together with the nature of the analysis method used in the previous study that produced the data on which we base our study, indicate that the actual number of conflicts is much higher.

Keywords: Open source licensing, license conflicts, GitHub

PDFs: Bachelor Thesis, Work Description

Reference: Yannik Schmidt. License Confusion on GitHub. Bachelor Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg: 2018.