As promised, the DMOZ100k06 research data corpus is now available for download. I built the corpus for my paper "Authors vs. Readers: A Comparative Study of Document Metadata and Content in the WWW", which also describes it. Enjoy!
