Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

SPEAR Algorithm

The SPEAR algorithm is a tool for ranking users in social networks by their expertise and influence within the community.

Background

In 2009, my co-worker Ching-man Au Yeung from University of Southampton and I presented the SPEAR ranking algorithm in our joint paper Telling Experts from Spammers: Expertise Ranking in Folksonomies at the ACM SIGIR 2009 Conference in Boston, USA.

In a Nutshell

The graph-based SPEAR ranking algorithm (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their activities. The focus is on the ability of users to find new, high quality information in the Internet. At the same time, the algorithm has been shown to be very resistant to spamming attacks. We based on the well-known information retrieval algorithm HITS by Prof. Jon Kleinberg (1999) which is used by search engines to rank Web pages. We modified SPEAR so that it fits to the characteristics of the Social Web / Web 2.0, and extended it with a second component that integrates the timeline of user activities into the analysis in order to further improve the performance of the algorithm (see figure 1).

The original use case – and the one described in our SIGIR paper – has been to find expert users and high quality websites for a given topic on the social bookmarking service Delicious.com, back in 2009 still a Yahoo! company. A great benefit of SPEAR is that it actually returns two very interesting results at the same time: first, a ranked list of expert users; and second, a ranked list of high quality Web documents. So whether you want to find experts for the programming language JavaScript or get the best websites for photography, SPEAR can help.

The two main elements of the SPEAR algorithm are:

  1. Mutual reinforcement of user expertise and document quality: A user’s expertise in a particular topic depends on the quality of the documents he/she has found, and the quality of documents in turn depends on the expertise of user who have found them.

  2. Discoverers vs. followers: Expert users should be discoverers – they tend to be faster than others to identify new and high quality documents. In other words, “the early bird catches the worm” (see also figure 1)

The combination of both elements has the effect that SPEAR favors quality over quantity of user activities and that it is quite resistant to today’s spamming attacks.

Figure 1: The SPEAR algorithm gives more credit to early discoverers of new information. How much credit each user receives depends on a so-called credit score function that is supplied as a parameter to the algorithm.

Figure 2: The main technical components of the actual SPEAR algorithm are a weighted adjacency matrix and two score vectors. The vectors keeps track of the expertise score of users and quality scores for documents, respectively.

How does everyman benefit?

Since the interaction of users in the Internet is steadily increasing, techniques like SPEAR are very helpful to improve the social networks of users; for example, by recommending interesting and trustful users or by mitigating information overload. The faster and more efficient important information can be shared, the better. On top of that, SPEAR is resistant to spamming attacks and helps to get rid of maliciously published junk content.

Where can SPEAR be used?

We and others think that the SPEAR algorithm can be applied to a rather wide area of domains. On the one hand, SPEAR can help to improve applications that directly focus on users – for example identifying trends or trendsetters on sites like Twitter – or on the information shared between such users. The latter includes buying behavior and customer reviews on online shops like Amazon or music listening preferences on services like Spotify. In both cases SPEAR can provide additional information that could help to improve the quality of these applications, e.g. to provide more accurate product recommendations on Amazon.

On the other hand, the SPEAR algorithm itself is not restricted to the online world. We imagine, for example, to use SPEAR for estimating the expertise of researchers by analyzing scientific publications. Such publications – whether available as online versions or printed out on paper – provide all the information we need to run SPEAR.

Download the Reference Implementation

For developers and researchers alike, we provide a reference implementation of SPEAR written in the Python programming language as free and open source software.

For starters, you want to read my quick introduction on how to use the SPEAR Python library.

Scientific Publications

  • C.-M. Au Yeung, M. G. Noll, N. Gibbins, C. Meinel, N. Shadbolt
    SPEAR: Spamming-resistant Expertise Analysis and Ranking in Collaborative Tagging Systems (ask me for a free copy)
    International Journal of Computational Intelligence, Wiley-Blackwell, Volume 27, Issue No. 3, 2011 (Impact Factor: 3.31)
  • M. G. Noll, C.-M. Au Yeung, N. Gibbins, C. Meinel, N. Shadbolt
    Telling Experts from Spammers: Expertise Ranking in Folksonomies
    Proceedings of 32nd ACM SIGIR Conference, Boston, USA, July 2009, pp. 612-619, ISBN 978-1-60558-483-6 (ACM Link, BibTeX)

Selected Press Coverage

Comments