Tutorials
Kafka
Apache Kafka is a distributed streaming platform.
I have written the following tutorials related to Kafka:
- Of Streams and Tables in Kafka and Stream Processing, Part 1
- Apache Kafka 0.8 Training Deck and Tutorial – 120 slides that cover Kafka’s core concepts, operating Kafka in production, and developing Kafka applications
- Integrating Kafka and Storm: Code Examples and State of the Game
- Integrating Kafka and Spark Streaming: Code Examples and State of the Game
- Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
Algebird
Twitter Algebird is a Scala library to work with algebraic data structures such as monoids and monads. It allows you to plug your own data structures directly into large-scale data processing tools such as Kafka and Hadoop, and it ships with a collection of common data structures such as Count-Min Sketches, Bloom filters, and HyperLogLog.
I have written the following tutorials related to Algebird:
Storm
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
I have written the following tutorials related to Storm:
- Apache Storm 0.9 Training Deck and Tutorial – 130 slides that cover Storm’s core concepts, operating Storm in production, and developing Storm applications
- Integrating Kafka and Storm: Code Examples and State of the Game
- Running a Multi-Node Storm Cluster
- Implementing Real-Time Trending Topics with a Distributed Rolling Count Algorithm in Storm
- Understanding the Parallelism of a Storm Topology
Spark
Apache Spark is a fast and general engine for large-scale data processing, similar to Apache Hadoop. It includes the Spark Streaming sub-project for real-time data processing, similar to Apache Storm.
I have written the following tutorials related to Spark:
- Integrating Kafka and Spark Streaming: Code Examples and State of the Game
Hadoop
Apache Hadoop is a free and open source implementation of frameworks for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it well suited to large-scale research and business workloads. Hadoop was inspired by Google’s MapReduce and Google File System papers.
I have written the following tutorials related to the Hadoop technology stack:
- Running Hadoop On Ubuntu Linux (Single-Node Cluster)
- Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
- Writing An Hadoop MapReduce Program In Python
- Using Avro in MapReduce Jobs With Hadoop, Pig, Hive
Mozilla Firefox
If you are a Firefox add-on developer, the tutorials below might come in handy.
- Cookie Monster for XMLHttpRequest – how to strip cookies from XMLHttpRequests in Mozilla Firefox, using JavaScript and XUL