Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

The Metadata Triumvirate: Social Annotations, Anchor Texts and Search Queries

My paper “The Metadata Triumvirate: Social Annotations, Anchor Texts and Search Queries” has been accepted for publication and presentation at this year’s IEEE/WIC/ACM International Conference on Web Intelligence (WI), which will be held in Sydney, Australia, from December 9 to 12, 2008.

Abstract

In this paper, we study and compare three different but related types of “metadata” about web documents: social annotations provided by readers of web documents, hyperlink anchor text provided by authors of web documents, and search queries of users trying to find web documents. We introduce a large research data set called CABS120k08, which we have created for this study from a variety of information sources such as AOL500k, the Open Directory Project, del.icio.us/Yahoo!, Google and the WWW in general. We use this data set to investigate several characteristics of said metadata, including length, novelty, diversity, and similarity, and discuss theoretical and practical implications.

Full Paper & Presentation

Related Links
