SPEAR Algorithm

From Michael G. Noll

Revision as of 17:02, 10 July 2010 by Michael (Talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Background

My co-worker Ching-man Au Yeung from University of Southampton and I presented the SPEAR algorithm in our joint paper "Telling Experts from Spammers: Expertise Ranking in Folksonomies" at the ACM SIGIR 2009 Conference in Boston, USA. We especially like to thank our co-authors and PhD supervisors Prof. Christoph Meinel (Hasso Plattner Institute, Germany), Dr. Nicholas Gibbins and Prof. Nigel Shadbolt (both University of Southampton) for their contributions and great feedback.

In a Nutshell

The graph-based SPEAR algorithm (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their activities. The focus is on the ability of users to find new, high quality information in the Internet. At the same time, the algorithm has been shown to be very resistant to spamming attacks. We based SPEAR on the well-known information retrieval algorithm HITS by Prof. Jon Kleinberg (1999) which is used by search engines like Yahoo! or Google to rank Web pages. We modified SPEAR so that it fits to the characteristics of the Social Web / Web 2.0, and extended it with a second component that integrates the timeline of user activities into the analysis in order to further improve the performance of the algorithm (see figure 1).

The original use case – and the one described in our SIGIR paper – has been to find expert users and high quality websites for a given topic on the social bookmarking service Delicious.com, a Yahoo! company. A great benefit of SPEAR is that it actually returns two very interesting results at the same time: first, a ranked list of expert users; and second, a ranked list of high quality Web documents. So whether you want to find experts for the programming language JavaScript or get the best websites for Photography, SPEAR can help.

The two main elements of the SPEAR algorithm are:

  1. Mutual reinforcement of user expertise and document quality: A user's expertise in a particular topic depends on the quality of the documents he/she has found, and the quality of documents in turn depends on the expertise of user who have found them.
  2. Discoverers vs. followers: Expert users should be discoverers – they tend to be faster than others to identify new and high quality documents. In other words, "the early bird catches the worm" (see also figure 1).

The combination of both elements has the effect that SPEAR favors quality over quantity and that it is quite resistant to today's spamming attacks.


SPEAR-discoverers-followers.png

Figure 1: The SPEAR algorithm gives more credit to early discoverers of new information. How much credit each user receives depends on a so-called credit score function that is supplied as a parameter to the algorithm.

SPEAR-algorithm.png

Figure 2: The main technical components of the actual SPEAR algorithm are a weighted adjacency matrix and two score vectors. The vectors keeps track of the expertise score of users and quality scores for documents, respectively.

Reference implementation

We have released a reference implementation of the SPEAR algorithm, written in the Python programming language. Feel free to play around with it and send us feedback!

http://github.com/quuxlabs/Spear

Here are some quick impressions of its usage:

 >>> import spear
 >>> activities = [
 ... (datetime.datetime(2010,7,1,9,0,0), "alice", "http://www.quuxlabs.com/"),
 ... (datetime.datetime(2010,8,1,12,45,0), "bob", "http://www.quuxlabs.com/"),
 ... ]
 >>> spear_algorithm = spear.Spear(activities)
 >>> expertise_results, quality_results = spear_algorithm.run()

You can also use this library to simulate the HITS algorithm of Jon Kleinberg. Simply supply a credit score function C(x) = 1 to the SPEAR algorithm (see the documentation of the Spear.run() method).

How does everyman benefit?

Since the interaction of users in the Internet is steadily increasing, techniques like SPEAR are very helpful to improve the social networks of users; for example, by recommending interesting and trustful users or by mitigating information overload. The faster and more efficient important information can be shared, the better. On top of that, SPEAR is resistant to spamming attacks and helps to get rid of maliciously published junk content.

Where can SPEAR be used?

We and others think that the SPEAR algorithm can be applied to a rather wide area of domains. On the one hand, SPEAR can help to improve applications that directly focus on users – for example identifying trends or trendsetters on sites like Twitter – or on the information shared between such users. The latter includes buying behavior and customer reviews on online shops like Amazon or music listening preferences on services like Last.fm. In both cases SPEAR can provide additional information that could help to improve the quality of these applications, e.g. to provide more accurate product recommendations on Amazon.

On the other hand, the SPEAR algorithm itself is not restricted to the online world. We imagine, for example, to use SPEAR for estimating the expertise of researchers by analyzing scientific publications. Such publications – whether available as online versions or printed out on paper – provide all the information we need to run SPEAR.

Publications

  • C.-M. Au Yeung, M. G. Noll, N. Gibbins, C. Meinel, N. Shadbolt
    SPEAR: Spamming-resistant Expertise Analysis and Ranking in Collaborative Tagging Systems
    International Journal of Computational Intelligence, Wiley-Blackwell, 2010 (to appear) (Impact Factor: 3.31)
  • M. G. Noll, C.-M. Au Yeung, N. Gibbins, C. Meinel, N. Shadbolt
    Telling Experts from Spammers: Expertise Ranking in Folksonomies
    Proceedings of 32nd ACM SIGIR Conference, Boston, USA, July 2009, pp. 612-619, ISBN 978-1-60558-483-6 (ACM Link, BibTeX)

Selected Press Coverage


Tags: acm, collaborative tagging, delicious, evaluation, expertise, experts, folksonomy, hits, publications, reputation, research, sigir, social annotations, social bookmarking, social web, spam, spammers, spear, web, web2.0