MeeGo Network Finland
8 Apr
For the last month or so, most of my time has gone into organizing MeeGo Summit FI. At the same time I have been preparing my PhD research about the MeeGo ecosystem. I know! Some members of the MeeGo community do not like the word (ecosystem) for various reasons. Some are satisfied with the term community and don’t see any need to call the MeeGo project an ecosystem. I’m not going to discuss this rather semantic issue here. Instead I will focus on more fun and entertaining issues.
A few days ago I stumbled upon ‘gource’ in one of my PhD seminars. Gource is an amazing command-line tool for visualizing the commit history of git-based code projects. Gource currently also supports SVN and Mercurial (http://code.google.com/p/gource/w/list).
While videos generated with gource might at first seem like useless, fancy and time-consuming experiments with video codecs, that is not the whole truth. Gource can be useful in at least two ways. Firstly, it lets you see, in an easy-to-understand way, which areas of the project are active. Secondly, it shows whether there is a community around the whole project or just around parts of it. I’ll discuss that in more detail later. I wanted to test gource and see what can be done with it. This gource experiment became more fun than I expected. Just for fun, like Linus once said on the cover of his book. Well, he has said quite a lot but that one fits here. Let’s get dirty!
I had to do some preparations on my Ubuntu 10.10. The applications needed are git, ffmpeg with x264 libs and of course gource. More information can be found here. Once the preparations have been done, we can get down to business. There are three steps to take.
Git is “a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency” (source). Ok, that’s enough about git. Obviously we need the source from which the logs will be generated. The logs will be given to gource as input. I’ve selected one MeeGo project, tracker, as the target this time. So I just cloned the repo with ‘git clone’ followed by the repository address.
Git does some fancy stuff and finally you should have your own clone of the project, this time tracker. Now go to the project folder (cd tracker).
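Then you need to create logs from the source. In the example I have generated a small log for the last 5 days. Simply run the command (in the project folder):
git log --pretty=format:user:%aN%n%at --reverse --raw --encoding=UTF-8 --no-renames --since="5 days ago" > git.log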
The last “> git.log” redirects the output to a file. Voilà! You can check what the log looks like with the command ‘less git.log’. Ok, now we have the logs… now what?
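If you want to repeat the log step for several time windows, it can be wrapped in a small script. The following is only a sketch in Python (my own parser was perl); the function names `build_git_log_command` and `write_git_log` are made up for illustration, and the git options are the ones quoted later in this post.

```python
import subprocess


def build_git_log_command(since="5 days ago"):
    """Return the git log argument list used for gource input."""
    return [
        "git", "log",
        "--pretty=format:user:%aN%n%at",  # "user:<author>" line, then a unix timestamp line
        "--reverse",                      # oldest commit first
        "--raw",                          # one raw diff line per changed file
        "--encoding=UTF-8",
        "--no-renames",
        f"--since={since}",
    ]


def write_git_log(repo_dir, out_path="git.log", since="5 days ago"):
    """Run git log inside repo_dir and write its output to out_path.

    out_path is relative to the current working directory, not to repo_dir.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(build_git_log_command(since), cwd=repo_dir,
                       stdout=out, check=True)
```

Calling `write_git_log("tracker")` would then be the scripted equivalent of running the command by hand in the project folder.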
Next you use the log file as input for gource. I will not go into every detail of every option in the example below, but let’s take a look at some of them. Ok, here’s the command:
Here are a few pointers about the options to get you started:
More details can be found on the gource website or from the command line (run ‘gource -H’). I did a lot of experiments to get an idea of what can be done and how. I encourage you to do the same. It’s fun, yet time-consuming, since the rendering might take a while.
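For orientation, here is a rough Python sketch of the usual way to turn a log into a video: gource writes PPM frames to standard output and ffmpeg encodes them with x264. This is not the exact command line I used. The resolution, frame rate and seconds-per-day values are assumptions picked for illustration, and the helper names are hypothetical.

```python
import subprocess


def build_gource_command(log_path, fps=60, seconds_per_day=1.0):
    """gource reads the log and writes raw PPM frames to stdout."""
    return [
        "gource",
        "-1280x720",                          # window/video size (assumed value)
        "--seconds-per-day", str(seconds_per_day),
        "--stop-at-end",                      # quit when the log runs out
        "--output-framerate", str(fps),
        "--output-ppm-stream", "-",           # "-" means stdout
        log_path,
    ]


def build_ffmpeg_command(out_path="output.mp4", fps=60):
    """ffmpeg reads PPM frames from stdin and encodes them with libx264."""
    return [
        "ffmpeg", "-y",
        "-r", str(fps),
        "-f", "image2pipe", "-vcodec", "ppm", "-i", "-",
        "-vcodec", "libx264", "-pix_fmt", "yuv420p",
        out_path,
    ]


def render(log_path, out_path="output.mp4", fps=60):
    """Pipe gource into ffmpeg; return True if both exited cleanly."""
    gource = subprocess.Popen(build_gource_command(log_path, fps),
                              stdout=subprocess.PIPE)
    ffmpeg = subprocess.Popen(build_ffmpeg_command(out_path, fps),
                              stdin=gource.stdout)
    gource.stdout.close()  # so gource notices if ffmpeg exits early
    ffmpeg_status = ffmpeg.wait()
    gource_status = gource.wait()
    return gource_status == 0 and ffmpeg_status == 0
```

Keeping the frame rate identical on both sides of the pipe matters; otherwise the video plays faster or slower than what gource showed on screen.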
After a little post-production, ‘output.mp4’ looks like the video below, which I have named “Life of an application”:
The above is a pretty normal use case for Gource. It was made for visualizing source code trees and commits. I just started wondering what IRC logs/discussions would look like if I fed them into gource… I know, the whole idea sounds a little weird, but I just had to see it myself. Obviously I would need to ‘emulate’ the git log format, and to do that I would need to parse ‘similar’ logs out of the IRC logs. Some creativity would be needed, though.
A quick look at the logs generated from git got me started. Let’s take an example from a git log which was generated with the formatting options: --pretty=format:user:%aN%n%at --reverse --raw --encoding=UTF-8 --no-renames --since="5 days ago". That outputs the following:
user:Yin Kangkai
1270704848
:100644 100644 aec16a6... e3fddb2... M patches/pch_dma.patch
:100644 100644 2b2bc37... f9ea573... M patches/pch_usbdev.patch
The first line is obviously the user name. The second line is a unix timestamp. The third and fourth lines are a bit more tricky. I have no clue what the first two strings (“:100644 100644”) are in the third/fourth line, so I just thought: “might as well copy that to my logs directly. Let’s see what happens”. The third and fourth strings vary all the time. Ok, so they must be some sort of identifiers (I guessed). The letter ‘M’ might be ‘Modified’. Other letters seen in git logs were ‘A’ for Added(?) and ‘D’ for Deleted(?). So far so good. The last string is the folder and filename. Ok, let’s see what format the IRC logs could be parsed into :)
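For the curious: in git’s raw output the first two strings are the file modes before and after the change, and the next two are abbreviated object hashes before and after. Here is a small Python sketch of how such a log could be read back into a structure. `Commit` and `parse_git_log` are names invented for this example, and it assumes the exact “user:” / timestamp / raw-line layout described above.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    user: str
    timestamp: int
    changes: list = field(default_factory=list)  # (status, path) tuples


def parse_git_log(text):
    """Parse 'user:' / timestamp / raw diff lines into Commit objects."""
    commits = []
    current = None
    expect_timestamp = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("user:"):
            current = Commit(user=line[len("user:"):], timestamp=0)
            commits.append(current)
            expect_timestamp = True
        elif expect_timestamp:
            current.timestamp = int(line)
            expect_timestamp = False
        elif line.startswith(":") and current is not None:
            # :<old mode> <new mode> <old hash> <new hash> <status> <path>
            parts = line.split(None, 5)
            if len(parts) == 6:
                current.changes.append((parts[4], parts[5]))
    return commits
```

Fed the sample above, this gives one commit by Yin Kangkai with two ‘M’ changes under patches/.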
Obviously I can get the IRC handles from each message. I can generate timestamps for each IRC message, since the log file tells me the date (at the beginning) and the time of each line in human-readable format. I just have to convert the date and time into a timestamp. I ignored the varying mumbo-jumbo part (“aec16a6… e3fddb2…”). Instead I just copied the same values to every line, thinking I might generate some random content for those strings later. For the next field I decided to put ‘M’ for modified, since putting ‘A’ for added would have grown the tree in the middle. The tree in the middle is the channel (#meego) where everyone ‘commits’. By that logic I placed the channel name in the ‘folder’ and appended the message after it. This gives me the following:
user:MrPingouin
1296603060
:100644 100644 aec16a6... e3fddb2... M meego/test
user:Venemo
1296603120
:100644 100644 aec16a6... e3fddb2... M meego/test
user:MrPingouin
1296603120
:100644 100644 aec16a6... e3fddb2... M meego/test
user:CosmoHill
1296603240
:100644 100644 aec16a6... e3fddb2... M meego/test
... and many more
Ok, I did that with some butt-ugly perl scripting. Let’s try… Bummer! That’s no good, it does not work. Gource gives me: “gource: no commits found”. Obviously something is still missing. I should have read the Gource manual (who reads manuals? I should have), because gource accepts a custom log format. Besides, at this point I found confirmation for my guesses about the ‘M’s, ‘A’s and ‘D’s. They are, as I suspected, modified, added and deleted. Anyway, the custom log format is one pipe-delimited line per event: timestamp|username|type|file|colour
That looks sane and simple. I don’t need the color stuff and it’s optional for gource too, so I skipped that. After I adjusted my miserable parser, it generated:
1296603060|MrPingouin|M|meego/line
1296603120|Venemo|M|meego/line
1296603120|MrPingouin|M|meego/line
1296603240|CosmoHill|M|meego/line
... and many more
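My perl parser is not worth showing, but the conversion can be sketched in a few lines of Python. Note the assumptions: the IRC line layout “[HH:MM] <nick> message” is hypothetical (your logger may differ), the times are treated as UTC, and the function name `irc_line_to_custom` is made up for this example.

```python
import re
from datetime import date, datetime, timezone

# Assumed IRC log line layout: "[HH:MM] <nick> message"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\]\s+<([^>]+)>")


def irc_line_to_custom(line, log_date, channel):
    """Turn one IRC message line into a gource custom log line.

    Returns None for lines that are not messages (joins, parts, ...).
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    # IRC nicks may contain "|", which is the field separator here.
    nick = match.group(3).replace("|", "_")
    moment = datetime(log_date.year, log_date.month, log_date.day,
                      hour, minute, tzinfo=timezone.utc)
    return f"{int(moment.timestamp())}|{nick}|M|{channel}/line"
```

Every message becomes a ‘modification’ of the same file under the channel folder, which is what keeps the channel as a single node in the middle of the picture.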
Let’s test again. Yay! Now gource accepted my logs and the result is… well, at least something. Not much, but it shows people posting messages (commits) to ‘#meego’ at the center and fading away after a while. Here are two versions of it: raw ‘footage’ with a black background and one with different time scaling (a bit weird). First the ‘raw’ version (in a new window, YouTube):
Artistic version with different time scaling and a few days combined (in new window youtube).
While that looked nice, I wasn’t satisfied and I still had room for more ideas. What if I combined two or three channel logs from the same time period and used that as gource input? That might be more fun. I took a look at the logs of both the #meego and #meego-dev channels. I took one log file from each, from the same time period of course. The problem is that the #meego-dev channel logs are mostly full of just joins and departures. Nevertheless, I copied my perl script to another file and manipulated it (again ugly, but it works) to combine those two logs. Here is a snippet of the result:
1298031840|raghum|M|meego.01-18-2011/line
1298059680|Stskeeps|M|meego.01-18-2011/line
1298031480|sanjeev1|M|meego-dev.01-18-2011/line
1298057220|lcuk|M|meego.01-18-2011/line
...and many more
As you might notice, I used the ‘folder’ (log file name) to separate the channels from each other. Then I used the combined log file as input for gource. The output does not look ‘good’, and it is much the same as with one channel, so let’s forget that. If the two channels had had even a little more discussion, it would have looked great. In addition, gource placed the items, #meego and #meego-dev, quite close to each other, so it was hard to see where the discussion took place.
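Combining the channel logs boils down to merging the custom log lines and sorting them by timestamp, since as far as I understand gource expects the entries in chronological order. A minimal Python sketch (the name `merge_custom_logs` is hypothetical, and this is not my perl script):

```python
def merge_custom_logs(*logs):
    """Merge several gource custom logs into one list sorted by timestamp.

    Each log is an iterable of "timestamp|user|type|path" lines.
    """
    entries = []
    for log in logs:
        for line in log:
            line = line.strip()
            if not line:
                continue
            timestamp = int(line.split("|", 1)[0])
            entries.append((timestamp, line))
    # sort() is stable, so lines with equal timestamps keep their input order
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]
```

With the snippet above as input, the #meego-dev line from sanjeev1 would move to the front because it has the smallest timestamp.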
One more thing is missing and that really bugs (or itches) me. I did not find a way to include direct messages from one person to another while still staying on the channel. A direct message here means a user putting the handle of another person at the beginning of the message line. In other words, how to make users ‘commit’ to other users in gource. Parsing those connections from IRC logs is pretty trivial. One way to get that would of course be making a patch to gource: a patch that enables targeting other ‘committers’. If you come up with a solution, let me know :) I also ran some tests with longer time periods, like 5-10 days, by just parsing the log files into one. Now that IRC has been ‘discovered’ without making patches to gource, it’s time to move on. What else could be visualized with this tool?
Mailing lists of course! They are often trees, which have leaves. Once again I adjusted my miserable parser to do another thing. After a few tests, the parser read one month (December 2010) from the MeeGo archives and emulated git logs. In this case I decided to go for logs similar to what git produces, instead of using custom logs like before. The reason is that the custom log offers fewer opportunities to build leaves.
Anyway, here’s a sample:
user:LarryMathews
1291173184
:000000 100755 000000... 9117318... A how_to_upgrad/002396.html
user:AndreaGrandi
1291186135
:000000 100755 000000... 9118613... A Dublin_MeeGo_Weekend/002397.html
user:DaveNeary
1291191407
:000000 100755 000000... 9119140... A Dublin_MeeGo_Weekend/002398.html
The first line is obviously a modified name taken from the email sender field. The second line is a timestamp, which was created from the email date and time. The third line is a rather creative combination. It includes from left to right:
Once again I used the parser output as input for gource. The result was somewhat satisfactory (in a new window, YouTube):
It still lacks some of the leaves, since my *cough* miserable *cough* parser is not capable of linking leaves correctly. It should make the references using the third string in the third line like in git logs… but it does not. Instead it just uses a static string. Nevertheless, an example was created.
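Entries like the sample shown earlier can be produced roughly as follows. This is a Python sketch, not my actual parser, and it rests on assumptions read off the sample: the sender name has its spaces removed, the subject becomes the folder, the archive file name becomes the file, and the fourth string looks like digits 3 to 9 of the timestamp. The function name `email_to_git_entry` is made up.

```python
import re


def email_to_git_entry(sender, timestamp, subject, message_file):
    """Format one mailing list message as a git-style log entry."""
    user = re.sub(r"\s+", "", sender)                        # "Dave Neary" -> "DaveNeary"
    folder = re.sub(r"\W+", "_", subject.strip()).strip("_")  # subject as folder name
    pseudo_hash = str(timestamp)[2:9]                        # assumption inferred from the sample
    # Real git separates the status letter and the path with a tab.
    return (
        f"user:{user}\n"
        f"{timestamp}\n"
        f":000000 100755 000000... {pseudo_hash}... A\t{folder}/{message_file}\n"
    )
```

Using the static “000000...” as the third string is exactly the limitation mentioned above: every message looks like a brand new file, so replies are not linked to the message they answer.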
Another thing that I would have liked to do is visualizing the bugtracker :) I have the SQL query which would pull the needed data out, but not yet the access to run SQL queries against bugs.meego.com. But that can be arranged, so I’ve heard. Better leave that for now and focus on something else…
Is any of this of use? Most likely not, at least not in general. But it was fun and educational for me! And that is what counts in my world. I did test out what IRC visualization with gource can produce and what is lacking. But seriously, code visualizations like the one above could be useful for getting new programmers familiar with the typical ‘life cycle of an application’.
For those who have not been involved in longer software projects, it might be educational to see what actually happens in projects in the long run. Eyeballing the commit history or browsing the source code tree might be less eye-opening and more boring.
Sometimes the source is just a big lump, sometimes the project seems to spread around the world, sometimes there is just one person doing stuff alone, sometimes swarming occurs (multiple contributors at the same time), and so on. Lumps, bursts and swarming are pseudo terms that just popped into my mind, so don’t take them seriously.
And the IRC part? While you were looking at the first clip, did you see around 490-540 IRC handles there? No, you saw exactly 34 unique handles. Yet community statistics (Feb 2011) say that 490-540 users log in to the #meego channel every day. That is not a lie: they are there, but they say nothing. They do not contribute to the discussion, but may contribute otherwise. The point is that looking at pure statistics may give you a false impression of the activity or the overall situation. The same applies to email archives as well.
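Counting the handles that actually said something is easy once the custom log exists. A small Python sketch, assuming the “timestamp|username|type|file” lines shown earlier (the function name `unique_handles` is made up):

```python
def unique_handles(custom_log_lines):
    """Return the set of user names that appear in a gource custom log."""
    handles = set()
    for line in custom_log_lines:
        fields = line.strip().split("|")
        if len(fields) >= 4:  # timestamp|username|type|file[|colour]
            handles.add(fields[1])
    return handles
```

Comparing the size of that set with the number of users logged in to the channel gives the gap between being present and actually talking.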
This thing with gource started with the idea: “I have to try that on some git tree.” A little more was tested than just that. Nevertheless, useful or not, part of my curiosity was satisfied…until I find another thing…oh look! new gadget!….