GitHub: Moneyball meets Open Source?
One of the fundamental problems of hiring good developers is telling the good ones from the okay ones from the total duffers. The industry as a whole has developed some crude indicators, including batteries of interviews, extensive technical tests and gut instinct.
However, the most compelling indicator of programming capability remains the programmer’s opus. For years this was not accessible except at a gross level: essentially, you could look at the commercial products they had contributed to. This didn’t really cut the mustard, because on a large project it was easy to fudge the question of what an individual’s specific contribution was. We are all familiar with the lurkers on large projects. Worse, we know both intuitively and from past experience that 20% of the people generate 80% of the bugs.
Open Source helped enormously, and if you cared to take the time you could examine an individual’s body of work. However, not everyone contributed to Open Source projects, because the friction of engagement was relatively high and the slow promotion from viewer to patch submitter to committer was a total turn-off for most people.
GitHub changes all this.
GitHub combines the ridiculously easy forking, branching and merging capabilities of Git with a hosted SaaS solution for sharing code in a social context. The result has been that since 2008 Git and GitHub have become the de facto standard for Open Source projects. Despite a number of holdouts (notably the Apache Software Foundation), the rest of the world has embraced the utility of social coding.
What does this mean? For the first time in the history of the sector we have an enormous body of active code that is accessible via a public API. More significantly, that API is not just for the source code; it’s designed for the metadata around the source code: the users, the events, the issues and the organisations. What this offers us is a set of raw data that we can analyse not only to understand who is good, who is bad and who is indifferent, but also to reverse engineer the key indicators that let us tell the strong programmers apart from the lurkers and bozos.
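To make the “public API” point concrete, here is a minimal sketch, assuming only the Python standard library and GitHub’s unauthenticated REST endpoint `GET /users/{username}/events/public`. It pulls one page of a user’s recent public events and tallies them by event type (`PushEvent`, `PullRequestEvent`, `IssuesEvent` and so on), which is about the crudest activity profile you could build. The function names and the `User-Agent` string are my own illustrative choices; unauthenticated calls are heavily rate limited, and the events feed only covers a limited window of recent activity, so treat this as a starting point rather than a scouting system.

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"


def fetch_public_events(username: str, per_page: int = 100) -> list[dict]:
    """Fetch one page of a user's recent public events (unauthenticated sketch)."""
    url = f"{API_ROOT}/users/{username}/events/public?per_page={per_page}"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            # GitHub's API expects a User-Agent; this value is an arbitrary example.
            "User-Agent": "moneyball-sketch",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarise_activity(events: list[dict]) -> Counter:
    """Count events by their 'type' field, e.g. PushEvent or IssuesEvent."""
    return Counter(event["type"] for event in events)
```

A typical call would be `summarise_activity(fetch_public_events("octocat")).most_common(5)`. Keeping the fetch and the analysis separate means the scoring logic can be tested on saved or synthetic data without touching the network.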
Michael Lewis’s Moneyball taught us that in the absence of a solid methodology people go with gut, and those gut instincts are almost always skewed or outright wrong. Daniel Kahneman confirmed that analysis in “Thinking, Fast and Slow”, where one of his key points is that a simple heuristic, applied consistently, will usually beat the opinions of experts. The power of Moneyball was that it showed how good historical data can give an indication of future performance, and that performance may have little to do with the indicators we have traditionally relied on.
So we have the raw data, constantly updated by GitHub. We know that the major league of software is open source, so if you are not committing on GitHub, are you even in the show? Finally, we know that with Andreessen Horowitz’s (A16Z) $100 million investment there is going to be some additional commercialisation. It seems an obvious play to use the vast array of analysable data to monetise the community on GitHub in a variety of ways.
So can GitHub become the Moneyball of software? Can we develop meaningful heuristics that identify not just ninja rockstars but also good team players, and players with the equivalent of a consistent “on-base percentage”?
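As a thought experiment, here is a sketch of what an “on-base percentage” for developers might look like. Baseball’s OBP is (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies). The developer analogue below is purely my own hypothetical: the share of someone’s pull requests that end up merged, smoothed with pseudo-counts so that a single lucky merge doesn’t outrank a long, steady record. The `PullRequestOutcome` record, the `merge_rate` function and the prior values (5 merged out of 10) are all illustrative assumptions, not an established metric.

```python
from dataclasses import dataclass


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Baseball's on-base percentage, for reference."""
    on_base = hits + walks + hit_by_pitch
    plate_appearances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return on_base / plate_appearances


@dataclass
class PullRequestOutcome:
    # Hypothetical record: one pull request and whether it was eventually merged.
    repo: str
    merged: bool


def merge_rate(prs, prior_merged=5.0, prior_total=10.0):
    """Hypothetical 'developer OBP': smoothed share of pull requests merged.

    The pseudo-counts pull small samples towards an assumed 50% baseline,
    so consistency over many attempts matters more than one good result.
    """
    merged = sum(1 for pr in prs if pr.merged)
    return (merged + prior_merged) / (len(prs) + prior_total)
```

With these defaults, one merged pull request out of one scores 6/11 (about 0.55), while 45 merged out of 50 scores 50/60 (about 0.83), so the consistent contributor ranks higher. The real work, of course, would be validating any such heuristic against actual outcomes, just as the Oakland A’s did with OBP.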
I think the data is in GitHub; we just have to mine it.