Del.icio.us Python API

From Michael G. Noll

One of my recent research tasks required me to retrieve various information from del.icio.us, a well-known social bookmarking service. My programming language of choice is Python, and so I wrote a basic Python module for getting the data I needed.

News: As of August 1, 2008, del.icio.us has relaunched its web service. Because of many changes behind the scenes, all users of my Python API have to update the module (via easy_install -U deliciousapi). I also added some new features, so feel free to reread the documentation below. Thanks to Dave from Yahoo/delicious for sending me a note about these changes on delicious' side!

Figure 1: A tag cloud as seen on del.icio.us.

deliciousapi.py

IMPORTANT NOTE: It is strongly advised that you read the del.icio.us Terms of Use document before using this Python module. In particular, read section 5 "Intellectual Property".

Part of the functionality in DeliciousAPI is implemented by calling the official del.icio.us API or parsing its JSON feeds; other parts are provided by mining and scraping data directly from the del.icio.us website. The module detects IP throttling, which del.icio.us employs to temporarily block abusive HTTP request behavior, and raises a custom Python error when that happens. Please be a nice netizen and do not stress the del.icio.us service more than necessary. I don’t, and you shouldn’t either.
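
One polite way to react to throttling is to wait and retry with growing pauses. The following is only a minimal sketch of that idea, not code from deliciousapi.py: ThrottleError is a hypothetical stand-in for whatever error class the module actually raises (check its docstrings), and call_with_backoff() and its parameters are names made up for this illustration. It is written for current Python versions.

```python
import time


class ThrottleError(Exception):
    """Hypothetical stand-in for the module's throttling error."""


def call_with_backoff(func, retries=3, first_wait=1.0, sleep=time.sleep):
    """Call func() and return its result.

    If func() raises ThrottleError, wait and try again, doubling the
    wait after every failed attempt. After `retries` failed retries
    the error is re-raised to the caller.
    """
    wait = first_wait
    for attempt in range(retries + 1):
        try:
            return func()
        except ThrottleError:
            if attempt == retries:
                raise  # give up: out of retries
            sleep(wait)
            wait *= 2  # back off: 1s, 2s, 4s, ...
```

To use it with the real module, you would replace ThrottleError with the module's own error class and wrap a request, for example call_with_backoff(lambda: dapi.get_url(url)). Doubling the wait keeps the number of requests low while the block is still in effect.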

DeliciousAPI provides the following features plus some more:

  • get_url(): returns all public bookmarks of a URL, i.e. its "history"
  • get_user():
    • returns a user's full bookmark collection, including private bookmarks, if you know the username AND password; in this case, all communication with del.icio.us is encrypted via SSL
    • returns a user's most recent public bookmarks (up to 100) if you don't know the password
  • get_tags_of_user(): returns a user's full tagging vocabulary, i.e. tags and tag counts, aggregated over all public bookmarks
  • HTTP proxy support

Please note that DeliciousAPI currently cannot scrape a user's full public bookmark collection if you don't know the user's password. This is for technical reasons on del.icio.us' side.

Here is a code snippet to demonstrate basic usage of deliciousapi.py:

import deliciousapi
dapi = deliciousapi.DeliciousAPI()
url = "http://www.michael-noll.com/wiki/Del.icio.us_Python_API"
username = "jsmith"
 
# DeliciousURL object, providing
#     .title : title of the web document as stored on delicious.com
#     .url   : URL of the corresponding web document
#     .total_bookmarks: total number of bookmarks/users for this url
#     .bookmarks  : list of (user, tags, comment, timestamp) tuples
#     .top_tags: list of (tag, tag_count) tuples, representing the
#                most popular tags of this url (up to 10)
#     .tags       : dict mapping tags to total tag count
#
#
# Note that by default, get_url() retrieves only the
# 50 most recent bookmarks of a given url. You can control
# this behavior with the max_bookmarks parameter (see
# docstrings).
url_metadata = dapi.get_url(url)
 
print url_metadata
# output: [http://www.michael-noll.com/wiki/Del.icio.us_Python_API] 103 total bookmarks (= users), 187 tags (37 unique), 10 out of 10 max 'top' tags
 
print url_metadata.title
# output: Del.icio.us Python API - Michael G. Noll
 
print url_metadata.bookmarks
# output: [
#  (u'neetij', [u'python', u'api', u'del.icio.us', u'programming'], None, datetime.datetime(2008, 8, 4, 0, 0)),
#  (u'jsf.online', [u'software', u'programming', u'free', u'development', u'del.icio.us', u'python', u'2008'], u'Python API - wraps the del.icio.us api for python', datetime.datetime(2008, 8, 4, 0, 0)),
#  (u'as11018', [u'python', u'api', u'programming'], None, datetime.datetime(2008, 7, 30, 0, 0)), 
#  ...]
print url_metadata.top_tags
# output: [ (u'python', 91), (u'api', 73), (u'del.icio.us', 71), ... ]
 
print url_metadata.tags
# output : { u'is:api': 1, u'code': 6, u'toread': 1, ... }
 
 
# If get_user() is called with both username and password, the full
# bookmark collection of the user is returned, including any private
# bookmarks. Communication is encrypted via SSL. You can use get_user()
# for creating a backup of your del.icio.us bookmarks.
#
# If get_user() is called without password, only the most recent
# public bookmarks of the given user are returned (up to 100).
#
# DeliciousUser object, providing
#     .bookmarks  : list of (url, tags, title, notes, timestamp) tuples
#     .tags       : dict mapping tags to total tag count
#     .username   : name of the corresponding del.icio.us user
user_metadata = dapi.get_user(username)
 
print user_metadata
# output: [jsmith] 31 bookmarks, 78 tags (45 unique)
 
print user_metadata.bookmarks
# output: [ (u'http://www.twellow.com/', [u'mashup', u'tools', u'twitter'], u'Twellow.com :: Twitter users organized into business categories', u'Kind of yellow pages for Twitter, interesting.', datetime.datetime(2008, 6, 25, 0, 0, 0)), ... ]
 
# dict mapping tags to tag counts
user_tags = dapi.get_tags_of_user(username)
print user_tags
# output: { 'golf': 1, 'toread': 11, 'recipe': 1, 'rest': 4, ... }
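
Once you have the bookmark tuples, post-processing them needs nothing beyond the standard library. The sketch below assumes the (url, tags, title, notes, timestamp) layout described above for DeliciousUser.bookmarks and shows two things: recomputing tag counts from the tuples, and dumping the tuples to JSON as a simple backup. The helper names count_tags() and bookmarks_to_json() are made up for this example, the second sample bookmark is invented, and the code targets current Python versions rather than the Python 2 used in the snippet above.

```python
import json
from collections import Counter
from datetime import datetime

# Sample data in the (url, tags, title, notes, timestamp) layout.
# The first entry is taken from the example output above, the
# second one is invented for illustration.
bookmarks = [
    (u'http://www.twellow.com/', [u'mashup', u'tools', u'twitter'],
     u'Twellow.com :: Twitter users organized into business categories',
     u'Kind of yellow pages for Twitter, interesting.',
     datetime(2008, 6, 25, 0, 0, 0)),
    (u'http://www.example.com/', [u'tools', u'python'],
     u'Example', None, datetime(2008, 6, 26, 0, 0, 0)),
]


def count_tags(bookmarks):
    """Return a dict mapping each tag to the number of bookmarks using it."""
    counts = Counter()
    for _url, tags, _title, _notes, _timestamp in bookmarks:
        counts.update(tags)
    return dict(counts)


def bookmarks_to_json(bookmarks):
    """Serialize bookmark tuples to a JSON string (timestamps as ISO 8601)."""
    records = [
        {"url": url, "tags": list(tags), "title": title,
         "notes": notes, "timestamp": timestamp.isoformat()}
        for url, tags, title, notes, timestamp in bookmarks
    ]
    return json.dumps(records, indent=2, sort_keys=True)
```

With real data you would pass user_metadata.bookmarks instead of the sample list and write the returned string to a file of your choice.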

deliciousmonitor.py

I have also written a Python script for monitoring del.icio.us bookmark RSS feeds. The default RSS feed is the "hotlist" of URLs you see on the del.icio.us front page.

This script uses my delicious Python API and demonstrates how it can be used. Basically, it mirrors the RSS feed and retrieves additional metadata such as an entry’s most popular tags from the del.icio.us service itself.

Here is an example output:

<document url="http://www.michael-noll.com/wiki/Del.icio.us_Python_API" users="103" top_tags="10">
    <top_tag name="python" count="91" />
    <top_tag name="api" count="73" />
    <top_tag name="del.icio.us" count="71" />
    <top_tag name="delicious" count="32" />
    <top_tag name="programming" count="29" />
    ...
</document>
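
Output in this shape can be produced from a URL's top tags with the standard library alone. The following is a rough sketch and not the actual code of deliciousmonitor.py: document_element() is a made-up helper, and it assumes that the top_tags attribute holds the number of top tags passed in. It is written for current Python versions.

```python
import xml.etree.ElementTree as ET


def document_element(url, users, top_tags):
    """Build a <document> element from a list of (tag, tag_count) tuples."""
    doc = ET.Element("document", {
        "url": url,
        "users": str(users),
        "top_tags": str(len(top_tags)),  # assumption: number of top tags
    })
    for name, count in top_tags:
        ET.SubElement(doc, "top_tag", {"name": name, "count": str(count)})
    return doc


# Values taken from the example output above (first three top tags only).
doc = document_element(
    "http://www.michael-noll.com/wiki/Del.icio.us_Python_API",
    103,
    [(u'python', 91), (u'api', 73), (u'del.icio.us', 71)],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

ElementTree takes care of escaping special characters in attribute values, which matters because tags and URLs can contain characters such as & or quotes.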

Download

You can now download and install DeliciousAPI from the Python Cheese Shop (the package includes only deliciousapi.py) via setuptools/easy_install. Just run

  • easy_install DeliciousAPI, or
  • easy_install -U DeliciousAPI for updates

and after installation, a simple import deliciousapi in your Python scripts will do the trick.

An alternative is to download the code straight from my Subversion repository.

The code has been tested with Python 2.4.3 and 2.5.

License

The code is licensed to you under the GNU General Public License, version 2.

Feedback

Comments, questions and constructive feedback are always welcome. Just drop me a note.
