Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

Exploring Social Annotations for Web Document Classification

My paper “Exploring Social Annotations for Web Document Classification” has been accepted for publication and presentation in the Semantic Web track of this year’s ACM Symposium on Applied Computing (SAC), which will be held in Fortaleza, Ceará, Brazil, from March 16 to 20, 2008.

Extended Abstract

Social annotation via collaborative tagging describes the process by which many users add metadata, in the form of unstructured keywords, to shared content. The recent success of web services with a tagging component, such as del.icio.us and Flickr, has produced a plethora of user-supplied metadata about web content for everyone to leverage. In this paper, we explore and analyze social annotations and tagging with regard to their usefulness for web document classification. We are interested in finding out which kinds of documents end users annotate more than others, how users tend to annotate these documents, and in particular how the resulting user-generated folksonomy compares with a top-down taxonomy maintained by classification experts for the same set of documents. We describe what can be deduced from the results for further research and development in the areas of document classification and information retrieval. Our work is based on large sets of real-world data, comprising a random sample of 100,000 web documents combined with data retrieved from the social bookmarking service del.icio.us, the Open Directory catalogue, and the search engine Google. The data set used in our experiments is freely available for research.


You can download the paper as a PDF document:

Related Links