Martin Michlmayr

I'm a member of Debian, and I work for HP as an Open Source Community Expert. The opinions expressed here are mine.

SourceForge and data collection

I've started writing some scripts to extract various data from SourceForge, which I'm going to use for some academic research. Unfortunately, I discovered that they don't provide mbox files of their mailing list archives, so I wrote a script to snarf the postings from the web and create an mbox (some headers are missing, of course, but fortunately none of them are crucial). The FAQ says they don't offer mbox files on the web because of e-mail address harvesters, but they could at least provide them to registered users. I'm also a registered user now. I'm not totally comfortable with this given SourceForge's increasingly proprietary nature, but SourceForge currently has more data to offer a researcher than services like Savannah. I expect this is going to change eventually, though.
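
For the curious, here is a minimal sketch of what the mbox-writing half of such a script might look like in Python. It is not the actual script: the function names and the fields of the `posting` dict (`sender`, `subject`, `date`, `body`) are made up for illustration, and fetching and parsing the archive pages is left out entirely. It only relies on the standard `mailbox` and `email` modules.

```python
import mailbox
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def posting_to_message(posting):
    """Turn one scraped posting (a plain dict, hypothetical layout) into a message."""
    msg = EmailMessage()
    msg["From"] = posting["sender"]
    msg["Subject"] = posting["subject"]
    msg["Date"] = format_datetime(posting["date"])
    # Headers that the web archive does not show (Message-ID, In-Reply-To, ...)
    # are simply left out here.
    msg.set_content(posting["body"])
    return msg


def write_mbox(postings, path):
    """Append all postings to the mbox file at `path`, creating it if needed."""
    box = mailbox.mbox(path)
    try:
        for posting in postings:
            box.add(posting_to_message(posting))
    finally:
        box.close()  # close() flushes pending messages to disk


if __name__ == "__main__":
    # Example data standing in for whatever the scraper extracted.
    example = [
        {
            "sender": "Jane Doe <jane@example.org>",
            "subject": "First post",
            "date": datetime(2003, 9, 18, 12, 0, tzinfo=timezone.utc),
            "body": "Hello, list.\n",
        },
    ]
    write_mbox(example, "list.mbox")
```

The resulting file can then be opened with any mail reader that understands mbox, or read back with `mailbox.mbox("list.mbox")`. Without the original threading headers, threads would have to be reconstructed from subjects and dates.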

Fri, 19 Sep 2003; 01:12