Visualizing version control data


How might a software development team use version control data to augment decision making?

Background information

Version control systems are tools that enable large teams of software developers to collaborate on a single project and track changes to it without interfering with each other’s work. The data these systems generate is not used to its full potential. Developers mostly use it reactively, to diagnose and fix bugs after they are detected; few products use it to predict and prevent bugs before they occur.


Literature review

First, I conducted a comprehensive literature review to learn how version control data is currently used for predictive analytics. That’s how I learned about code churn, an extremely useful metric that is underutilized in current industry solutions.

Code churn:

  • A metric that represents the number of lines of code added to, modified in, or deleted from a source code repository

  • Easy to measure using data generated by version control systems

  • Measures system volatility and is a useful predictor of software bugs
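As a rough illustration of how easily churn can be measured, the sketch below sums added and deleted lines per file from text in the shape that `git log --numstat --format=` prints (added count, deleted count, and path, separated by tabs). The function name `churn_per_file` and the sample input are hypothetical, and this is one simple definition of churn rather than the only one. Git reports a modified line as one deletion plus one addition, so modifications are counted under both.

```python
from collections import defaultdict


def churn_per_file(numstat_text):
    """Sum added + deleted lines per file from `git log --numstat --format=` output."""
    churn = defaultdict(int)
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return dict(churn)


# Hypothetical sample input: two commits touching src/app.py, one binary file.
sample = "3\t1\tsrc/app.py\n10\t0\tREADME.md\n\n2\t2\tsrc/app.py\n-\t-\tlogo.png\n"
print(churn_per_file(sample))  # {'src/app.py': 8, 'README.md': 10}
```

Files with the highest totals are the most volatile parts of the codebase over the period covered by the log.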

Early designs

Based on the findings of my literature review, I sketched out ideas for how to best visualize code churn and narrowed down my list of potential solutions to the following two graphs: 


  • Advantage: high information density at the attribute level

  • Trade-off: does not visualize network structure


  • Advantage: visualizes both network structure and key attributes of each node

  • Trade-off: does not visualize changes over time

After sketching, I built out the designs using real data sourced from public GitHub repositories. Using the command line, I forked my own copy of a public repository and piped the version history into an Excel file. From there, I manually encoded the visualizations using Sketch.
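A comparable export can be scripted. The sketch below is an assumption about how such a step might look, not the exact commands used for this project: it reads `git log` output and writes one row per commit to a CSV file, which Excel can open. The helper names `read_history` and `history_to_csv`, the chosen fields, and the sample commit are all hypothetical.

```python
import csv
import io
import subprocess

FIELDS = ["commit", "date", "author", "subject"]
# %x1f is an ASCII unit separator, so commas inside commit subjects stay intact.
LOG_FORMAT = "%h%x1f%ad%x1f%an%x1f%s"


def read_history(repo_path):
    """Run git log in repo_path and return its raw output (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--date=short", f"--format={LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def history_to_csv(log_text, out_file):
    """Write a header plus one CSV row per commit line; return the row count."""
    writer = csv.writer(out_file)
    writer.writerow(FIELDS)
    count = 0
    for line in log_text.splitlines():
        parts = line.split("\x1f")
        if len(parts) == len(FIELDS):
            writer.writerow(parts)
            count += 1
    return count


# Canned, made-up log line so the sketch runs without a repository.
sample = "a1b2c3d\x1f2020-01-15\x1fAda\x1fFix parser, add tests\n"
buffer = io.StringIO()
rows_written = history_to_csv(sample, buffer)
print(rows_written)
print(buffer.getvalue())
```

Against a real clone, the same function could be called as `history_to_csv(read_history("."), f)` with `f` opened via `open("history.csv", "w", newline="")`.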


Next steps

I’ve identified two areas where future research is needed:

Design validation

Research question:

  • Which visualization can users most rapidly and accurately read? 

Research method:

  • Remote, unmoderated user experience evaluation using survey software

  • Within-subjects design: each participant reads a graph and answers three questions, repeating until all graphs have been read

  • To create a sense of urgency, participants must answer each question before a countdown timer runs out

  • Metrics to capture: task completion, time on task, and task accuracy
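The three metrics above could be computed per graph from exported survey responses along these lines. This is a minimal sketch under stated assumptions: the record fields (`graph`, `answer`, `correct`, `seconds`) are hypothetical, a timed-out question is recorded with `answer` set to `None`, and accuracy and time on task are computed over completed answers only. The sample records are placeholders for illustration, not study results.

```python
from statistics import mean


def summarize(responses):
    """Return completion rate, accuracy, and mean time on task for each graph."""
    by_graph = {}
    for r in responses:
        by_graph.setdefault(r["graph"], []).append(r)

    summary = {}
    for graph, rows in by_graph.items():
        # A question counts as completed if the participant answered in time.
        completed = [r for r in rows if r["answer"] is not None]
        n_correct = sum(r["answer"] == r["correct"] for r in completed)
        summary[graph] = {
            "completion_rate": len(completed) / len(rows),
            "accuracy": n_correct / len(completed) if completed else 0.0,
            "mean_time_s": mean(r["seconds"] for r in completed) if completed else None,
        }
    return summary


# Placeholder records (made up for illustration only).
responses = [
    {"graph": "A", "answer": "x", "correct": "x", "seconds": 10.0},
    {"graph": "A", "answer": "y", "correct": "x", "seconds": 14.0},
    {"graph": "A", "answer": None, "correct": "x", "seconds": 20.0},  # timed out
    {"graph": "B", "answer": "x", "correct": "x", "seconds": 8.0},
    {"graph": "B", "answer": "z", "correct": "z", "seconds": 12.0},
]
summary = summarize(responses)
for graph, metrics in summary.items():
    print(graph, metrics)
```

Because the design is within-subjects, each participant contributes rows for every graph, so the per-graph summaries can be compared directly.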

Formative user research

Research question:

  • How do users currently understand code churn? What kinds of information are most important to users when monitoring code churn?

Research method:

  • One-on-one user interviews with software developers and project managers