Tutorials
Kafka
Apache Kafka is a distributed streaming platform.
I have written the following tutorials related to Kafka:
- What Every Software Engineer Should Know about Apache Kafka: Events, Streams, Tables, Storage, Processing, And More
- Of Streams and Tables in Kafka and Stream Processing, Part 1
- Integrating Kafka and Storm: Code Examples and State of the Game
- Integrating Kafka and Spark Streaming: Code Examples and State of the Game
- Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
Algebird
Twitter Algebird is a Scala library for working with algebraic data structures such as monoids and monads. It lets you plug your own data structures directly into large-scale data processing tools such as Kafka and Hadoop, and it ships with a collection of common data structures such as Count-Min Sketches, Bloom filters, and HyperLogLog.
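Monoids matter for large-scale processing because partial aggregates computed on different machines can be merged in any grouping and still give the same result. The following is a minimal, hypothetical sketch in Python, not Algebird's Scala API. It models a Count-Min Sketch as a monoid:
- The empty sketch is the identity element.
- `plus` is element-wise addition of counters.

The class and function names (`CountMinSketch`, `plus`, `sketch_of`) and the parameters `depth=4, width=128` are illustrative assumptions.

```python
import hashlib
from functools import reduce


class CountMinSketch:
    """Minimal Count-Min Sketch whose merge operation makes it a monoid.

    Hypothetical illustration only; this is not Algebird's Scala API.
    """

    def __init__(self, depth=4, width=128):
        self.depth = depth
        self.width = width
        # depth rows of width counters, all starting at zero (the identity element)
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Deterministic per-row hash; Python's built-in hash() of str is salted per process.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count
        return self

    def estimate(self, item):
        # Never underestimates; may overestimate because of hash collisions.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))

    def plus(self, other):
        # Monoid "plus": element-wise sum of two sketches with the same shape.
        assert (self.depth, self.width) == (other.depth, other.width)
        merged = CountMinSketch(self.depth, self.width)
        merged.rows = [
            [a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.rows, other.rows)
        ]
        return merged


def sketch_of(items, depth=4, width=128):
    cms = CountMinSketch(depth, width)
    for item in items:
        cms.add(item)
    return cms


# Two partitions of a stream are sketched independently (e.g., on different workers)...
part_a = ["kafka", "storm", "kafka", "spark"]
part_b = ["kafka", "hadoop", "storm"]

# ...and then merged, starting from the empty sketch as the identity element.
merged = reduce(CountMinSketch.plus, [sketch_of(part_a), sketch_of(part_b)], CountMinSketch())
whole = sketch_of(part_a + part_b)

print(merged.rows == whole.rows)      # True: merging partial sketches == sketching everything
print(merged.estimate("kafka") >= 3)  # True: estimates are upper bounds on the true count
```

Because the merge is associative and has an identity, a framework can combine partial sketches in whatever order its workers finish.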
I have written the following tutorials related to Algebird:
Storm
Apache Storm is a free and open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.
I have written the following tutorials related to Storm:
- Integrating Kafka and Storm: Code Examples and State of the Game
- Running a Multi-Node Storm Cluster
- Implementing Real-Time Trending Topics with a Distributed Rolling Count Algorithm in Storm
- Understanding the Parallelism of a Storm Topology
Spark
Apache Spark is a fast and general engine for large-scale data processing, similar to Apache Hadoop. It includes the Spark Streaming sub-project, which processes live data streams as a series of small batches and serves a similar purpose to Apache Storm.
I have written the following tutorials related to Spark:
Hadoop
Apache Hadoop is a free and open-source implementation of frameworks for reliable, scalable, distributed computing and data storage. It enables applications to scale to thousands of nodes and petabytes of data, which makes it well suited to both research and business workloads. Hadoop was inspired by Google’s MapReduce and Google File System papers.
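The MapReduce model that Hadoop adopted splits a job into three steps:
- A map step that emits key/value pairs.
- A shuffle that groups those pairs by key.
- A reduce step that combines the values for each key.

Below is a single-process toy illustration of that model in Python, using the classic word-count example. It is not Hadoop's Java API. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


docs = ["Kafka Storm Spark", "Hadoop MapReduce", "Storm Kafka"]
print(word_count(docs))
# {'hadoop': 1, 'kafka': 2, 'mapreduce': 1, 'spark': 1, 'storm': 2}
```

In a real cluster, mappers and reducers run in parallel on many nodes, and the framework handles the shuffle over the network.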
I have written the following tutorials related to the Hadoop technology stack: