Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

DMOZ100k06: Data Corpus for Research in the Web 2.0

As promised, the DMOZ100k06 research data corpus is now available for download. I built the corpus for my paper "Authors vs. Readers: A Comparative Study of Document Metadata and Content in the WWW", which also describes it. Enjoy!