[Teammetrics-discuss] Git Statistics

Sukhbir Singh sukhbir.in at gmail.com
Mon Jun 27 12:12:48 UTC 2011


Hi,

Along with the deb package, I have started work on parsing Git repositories.

Here are some points that are up for discussion:

1. The configuration for the repositories is stored in
/etc/teammetrics. An example:

    [project-name]
    url = <url-to-git-repository>

I don't think there is anything more to add here, but suggestions are welcome.
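
For illustration, here is a minimal sketch of how such a file could be
read, assuming it is INI-style and can be handled by Python 3's
configparser. The function name read_repositories, the sample project
name and the URL are all made up for the example; the actual file name
under /etc/teammetrics is not decided here.

```python
import configparser

# Hypothetical sample; in practice the text would come from a file
# under /etc/teammetrics.
SAMPLE = """\
[teammetrics]
url = git://example.org/teammetrics.git
"""


def read_repositories(text):
    """Return a {project-name: url} mapping from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["url"] for name in parser.sections()}


repositories = read_repositories(SAMPLE)
```

Each section name becomes the project name, so adding a repository
only means adding another section with a url key.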

2. There is no way to get the logs without cloning the repository. So
first we have to clone the repository, get the logs and then extract
the required information.
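
The cloning step could look roughly like the sketch below. The
function name clone_repository and its 'run' parameter are
hypothetical, and using a bare clone is my assumption: 'git log' only
needs the history, not a checked-out working tree.

```python
import subprocess


def clone_repository(url, dest, run=subprocess.check_call):
    """Clone `url` into `dest` and return `dest`.

    `run` is injectable so the command can be inspected in tests
    without touching the network.
    """
    # --bare skips the working tree; the history is all 'git log' needs.
    run(["git", "clone", "--bare", url, dest])
    return dest
```

By default check_call raises CalledProcessError if the clone fails, so
a broken URL in the configuration is noticed before we try to read
any logs.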

Here is how we get the logs:

We use 'subprocess' to call 'git log'. Here is the output for the
author Sukhbir Singh in the 'teammetrics' repository when I call:

    git log --author="Sukhbir Singh" --oneline --pretty=tformat: --numstat

    1       1       liststat.py


    1       1       liststat.py


    11      3       liststat.py
    83      83      updatenames.py

The first column is the number of lines inserted, the second is the
number of lines deleted, and the last is the name of the file changed.
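
To make the parsing step concrete, here is a minimal sketch in
Python 3. The function names git_numstat and parse_numstat are made up
for illustration and are not existing teammetrics code. The sketch
assumes that git separates the three fields with tabs and prints '-'
in both count columns for binary files, which are skipped here.

```python
import subprocess


def git_numstat(repo_path, author):
    """Run the 'git log' command shown above and return its output."""
    cmd = ["git", "log", "--author=" + author, "--oneline",
           "--pretty=tformat:", "--numstat"]
    return subprocess.check_output(cmd, cwd=repo_path, text=True)


def parse_numstat(output):
    """Yield (insertions, deletions, filename) for each numstat line."""
    for line in output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, filename = parts
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts available
        yield int(added), int(deleted), filename
```

Something like list(parse_numstat(git_numstat(path, "Sukhbir Singh")))
would then give one tuple per changed file per commit.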

Now comes the important part. Remember I said that we should include
the lines committed as a metric? Here is what I propose: for each
author, we get the total number of insertions and the total number of
deletions, and use the difference between the two as the metric. So if
someone adds many lines and deletes very few, they are contributing
more, right? There is one problem with this approach: sometimes when
making edits, we delete more than we add, and the difference goes
negative. Should we bother about this? If we do, a single number is
not a reliable metric, and we need to keep insertions and deletions as
two distinct metrics. Let's decide this.
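
As a worked example of the proposal, the sketch below adds up the rows
from the sample 'git log' output shown earlier. The function name
summarize and the dictionary keys are only illustrative.

```python
# Rows taken from the sample 'git log --numstat' output:
# (insertions, deletions, filename)
ROWS = [
    (1, 1, "liststat.py"),
    (1, 1, "liststat.py"),
    (11, 3, "liststat.py"),
    (83, 83, "updatenames.py"),
]


def summarize(rows):
    """Return total insertions, total deletions and their difference."""
    insertions = sum(added for added, _, _ in rows)
    deletions = sum(deleted for _, deleted, _ in rows)
    return {"insertions": insertions,
            "deletions": deletions,
            "net": insertions - deletions}


summary = summarize(ROWS)
# summary == {"insertions": 96, "deletions": 88, "net": 8}
```

Note that the 83/83 change to updatenames.py adds nothing to the
difference even though it touched many lines. Keeping all three
numbers leaves both options open: the difference as a single metric,
or insertions and deletions as separate ones.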

3. After we have the data, we push it into a table called gitstat,
and then we are done.
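
A rough sketch of this last step follows. It uses Python's built-in
sqlite3 module purely as a stand-in so that the example runs anywhere;
the real database may well differ. The function name store_stats and
the column names (project, author, insertions, deletions) are only a
guess at what gitstat could hold, not a decided schema.

```python
import sqlite3


def store_stats(conn, project, author, insertions, deletions):
    """Insert one row of per-author totals into gitstat."""
    # Hypothetical schema, created on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gitstat ("
        "project TEXT, author TEXT, insertions INTEGER, deletions INTEGER)"
    )
    conn.execute(
        "INSERT INTO gitstat VALUES (?, ?, ?, ?)",
        (project, author, insertions, deletions),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, illustration only
# Totals from the sample 'git log' output: 96 insertions, 88 deletions.
store_stats(conn, "teammetrics", "Sukhbir Singh", 96, 88)
```

Storing insertions and deletions in separate columns means the
difference can still be computed at query time, whichever metric we
settle on.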

--
Sukhbir.


