Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

Delicious Python API

One of my research tasks required me to retrieve various information from Delicious.com, a well-known social bookmarking service. My programming language of choice is Python, and so I wrote a basic Python module for getting the data I needed.

Figure 1: A tag cloud as seen on Delicious.com

deliciousapi.py

Important Note: It is strongly advised that you read the Delicious.com Terms of Use document before using this Python module. In particular, read section 5 “Intellectual Property”.

Part of the functionality in DeliciousAPI is implemented by calling the official Delicious.com API or parsing its JSON feeds, other parts are provided by mining and scraping data directly from the Delicious.com website. The module is able to detect IP throttling, which is employed by Delicious.com to temporarily block abusive HTTP request behavior, and will raise a custom Python error to indicate that. Please be a nice netizen and do not stress the Delicious.com service more than necessary. I don’t, and you shouldn’t, too.

DeliciousAPI provides the following features plus some more:

  • get_urls(): retrieves the most recent URLs which have been bookmarked and annotated with a given tag; supports the retrieving links from the delicious hotlist (front page) plus /popular/<name_of_tag> and /tag/<name_of_tag>
  • get_url(): returns all public bookmarks of a URL, i.e. its “history”
  • get_user():
    • returns a user’s full bookmark collection including private bookmarks if you know username AND password; in this case, all communication with Delicious.com is encrypted via SSL
    • returns a user’s full public bookmark collection if you don’t know the user password (additional parameter: max_bookmarks; by default, only the 50 most recent bookmarks are retreived)
  • get_tags_of_user(): returns a user’s full tagging vocabulary, i.e. tags and tag counts, aggregated over all public bookmarks
  • HTTP proxy support

Here is a code snippet to demonstrate basic usage of deliciousapi.py:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
import deliciousapi
dapi = deliciousapi.DeliciousAPI()
url = "http://www.michael-noll.com/wiki/Del.icio.us_Python_API"
username = "jsmith"

# web pages shown on the front page of Delicious.com aka the 'hotlist'
featured_links = dapi.get_urls()

# popular web pages tagged with "photography"
popular_photography_links = dapi.get_urls(tag="photography")

# web pages recently tagged with "web2.0", up to a maximum of
# 300 URLs if possible; note that get_urls() cannot guarantee
# that the list of URLs is free of duplicate items - this is
# due to the way Delicious.com generates the regular feeds for
# a given tag (i.e. /tag/&lt;tag&gt; as opposed to /popular/&lt;tag&gt;)
recent_web20_links = dapi.get_urls(tag="web2.0", popular=False, max_urls=300)

# DeliciousURL object, providing
#     .title : title of the web document as stored on delicious.com
#     .url   : URL of the corresponding web document
#     .total_bookmarks: total number of bookmarks/users for this url
#     .bookmarks  : list of (user, tags, comment, timestamp) tuples
#     .top_tags: list of (tag, tag_count) tuples, representing the
#                most popular tags of this url (up to 10)
#     .tags       : dict mapping tags to total tag count
#
#
# Note that by default, get_url() does only retrieve the
# 50 most recent bookmarks of a given url. You can control
# this behavior with the max_bookmarks parameter (see
# docstrings).
url_metadata = dapi.get_url(url)

print url_metadata
# output: [http://www.michael-noll.com/wiki/Del.icio.us_Python_API] 103 total bookmarks (= users), 187 tags (37 unique), 10 out of 10 max 'top' tags

# print url_metadata.title
# output: Del.icio.us Python API - Michael G. Noll

print url_metadata.bookmarks
# output: [
#  (u'neetij', [u'python', u'api', u'del.icio.us', u'programming'], None, datetime.datetime(2008, 8, 4, 0, 0)),
#  (u'jsf.online', [u'software', u'programming', u'free', u'development', u'del.icio.us', u'python', u'2008'], u'Python API - wraps the del.icio.us api for python', datetime.datetime(2008, 8, 4, 0, 0)),
#  (u'as11018', [u'python', u'api', u'programming'], None, datetime.datetime(2008, 7, 30, 0, 0)),
#  ...]
print url_metadata.top_tags
# output: [ (u'python', 91), (u'api', 73), (u'del.icio.us', 71), ... ]

print url_metadata.tags
# output : { u'is:api': 1, u'code': 6, u'toread': 1, ... }

# If get_user() is called with both username and password, the full
# bookmark collection of the user is returned, including any private
# bookmarks. Communication is encrypted via SSL. You can use get_user()
# for creating a backup of your Delicious.com bookmarks.
#
# If get_user() is called without password, only the most recent
# public bookmarks of the given user are returned (up to 100).
#
# DeliciousUser object, providing
#     .bookmarks  : list of (url, tags, title, notes, timestamp) tuples
#     .tags       : dict mapping tags to total tag count
#     .username   : name of the corresponding del.icio.us user
user_metadata = dapi.get_user(username)

print user_metadata
# output: [jsmith] 31 bookmarks, 78 tags (45 unique)

print user_metadata.bookmarks
# output: [ (u'http://www.twellow.com/', [u'mashup', u'tools', u'twitter'], u'Twellow.com :: Twitter users organized into business categories', u'Kind of yellow pages for Twitter, interesting.', datetime.datetime(2008, 6, 25, 0, 0, 0)), ... ]

# list of (tag, tag_count) tuples
user_tags = dapi.get_tags_of_user(username)
print user_tags
# output: { 'golf': 1, 'toread': 11, 'recipe': 1, 'rest': 4, ... }

Download

You can download and install DeliciousAPI from the Python Package Index via setuptools/easy_install. Just run

1
2
3
4
5
# initial installation
easy_install DeliciousAPI

# for updates
easy_install -U DeliciousAPI

and after installation, a simple import deliciousapi in your Python scripts will do the trick.

An alternative is to download the code straight from my code repository on GitHub:

License

The code is licensed to you under the GNU General Public License, version 2.

Feedback

Comments, questions and constructive feedback are always welcome. Just drop me a note.

Comments