Final Thesis: Measuring Patch-Flow at Google

Abstract: In industry, software development is highly collaborative work that involves many contributing teams. Yet there is no established way to quantify the collaboration between organizational units within a software-developing company. Information about this collaboration is latent in software repositories, but it has not yet been defined.

We mined Google’s software repository and identified all commits assigned to projects of organizational units to which the patch author does not belong. We call this phenomenon of collaboration across organizational borders patch-flow. This work introduces a graph-based metric to quantify patch-flow. We developed a tool that crawls Google’s repository and collected the patches of 2,500 Google developers for the years 2007, 2009, 2011, and 2013. Because historical information about developers’ organizational unit membership was missing, we propose a clustering approach that assigns all developers to organizational units (orgunits). Because the Google-internal data has not yet been released, we crawled and analyzed the Chromium project instead.
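
The identification step described above can be illustrated with a minimal sketch. It is not the tool or the metric developed in the thesis, whose exact definition is not given in this abstract. It assumes commits are available as (author, project) pairs and that hypothetical lookup tables map authors and projects to orgunits; the function name and the toy data are invented for illustration.

```python
from collections import Counter


def patch_flow_edges(commits, author_unit, project_unit):
    """Count commits that cross organizational-unit boundaries.

    Returns a Counter that maps (author's unit, project's unit) to the
    number of commits flowing in that direction, i.e. the weighted edges
    of a directed graph between orgunits. This is an assumed reading of
    "graph-based", not the thesis's definition.
    """
    edges = Counter()
    for author, project in commits:
        src = author_unit.get(author)
        dst = project_unit.get(project)
        if src is None or dst is None:
            continue  # commit cannot be attributed to an orgunit; skip it
        if src != dst:
            edges[(src, dst)] += 1  # patch crosses a unit boundary
    return edges


# Hypothetical toy data, not taken from Google or Chromium.
commits = [
    ("alice", "browser"),
    ("alice", "net"),
    ("bob", "browser"),
    ("carol", "net"),
    ("alice", "net"),
]
author_unit = {"alice": "ui", "bob": "ui", "carol": "network"}
project_unit = {"browser": "ui", "net": "network"}

edges = patch_flow_edges(commits, author_unit, project_unit)
```

In this toy data only alice's two commits to the "net" project cross a boundary, so the graph has a single edge from "ui" to "network" with weight 2.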

Using the Chromium data, we were able to apply the patch-flow metric and quantify collaboration across organizational unit boundaries, although this data source is suitable only to a limited extent. The clustering approach still has to be validated.

Keywords: collaboration, mining software repositories, Google, orgunit, patch, flow

PDFs: Master Thesis, Work Description

Reference: Michael Dorner. Measuring Patch-Flow at Google. Master Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg: 2015.