Blog
You can subscribe to my blog via RSS.
2021
2020
2018
2014
- Oct 01 - Integrating Kafka and Spark Streaming: Code Examples and State of the Game
- Sep 15 - Apache Storm 0.9 training deck and tutorial
- Aug 18 - Apache Kafka 0.8 training deck and tutorial
- May 27 - Integrating Kafka and Storm: Code Examples and State of the Game
- Mar 17 - Wirbelsturm: 1-Click Deployments of Storm and Kafka clusters with Vagrant and Puppet
2013
- Dec 02 - Of Algebirds, Monoids, Monads, and other Bestiary for Large-Scale Data Analytics
- Nov 06 - Sending Metrics from Storm to Graphite
- Sep 17 - Replephant: Analyzing Hadoop Cluster Usage with Clojure
- Jul 04 - Using Avro in MapReduce jobs with Hadoop, Pig, Hive
- Jun 21 - Understanding the Internal Message Buffers of Storm
- Jun 06 - Installing and Running Graphite via RPM and Supervisord
- May 28 - Multi-Node Storm Cluster Tutorial Published
- Mar 17 - Reading and Writing Avro Files from the Command Line
- Mar 13 - Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
- Jan 25 - Bootstrapping a Java project with Gradle, TestNG, Mockito and Cobertura for Eclipse and Jenkins
- Jan 18 - Implementing Real-Time Trending Topics with a Distributed Rolling Count Algorithm in Storm
2012
2011
- Oct 20 - Understanding HDFS quotas and Hadoop fs and fsck tools
- Aug 23 - Performing an HDFS Upgrade of an Hadoop Cluster
- Apr 14 - Building an Hadoop 0.20.x version for HBase 0.90.2
- Apr 09 - Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.
- Mar 28 - Hadoop space quotas, HDFS block size, replication and small files
2010
- Nov 29 - Virtualenv Cheat Sheet
- Jul 10 - Reference implementation of SPEAR algorithm released
- Jan 20 - How To Extract Audio From FLV Files Using VLC
2009
- Sep 03 - Invited article for Yahoo! on SPEAR algorithm
- Aug 25 - German characters in VMware Fusion on Mac OS X
- Jul 31 - Technology Review article on our expertise ranking approach from SIGIR '09
- Jun 05 - Telling Experts from Spammers: Expertise Ranking in Folksonomies
- Mar 13 - Article published in Python Magazine
2008
- Dec 02 - CABS120k08: Data Corpus for Research in the Web 2.0, November 2008
- Sep 17 - Building a Scalable Collaborative Web Filter with Free and Open Source Software
- Sep 05 - The Metadata Triumvirate: Social Annotations, Anchor Texts and Search Queries
2007
- Oct 25 - Exploring Social Annotations for Web Document Classification
- Jul 18 - Personalization 2.0: Web Search Personalization via Social Bookmarking and Tagging
- Jun 12 - DMOZ100k06: Data Corpus for Research in the Web 2.0
- May 07 - Authors vs. Readers: A Comparative Study of Document Metadata and Content in the WWW
- Jan 04 - Word movement shortcuts for iTerm on Mac OS X