Kafka

Apache Kafka is a distributed streaming platform.

I have written the following tutorials related to Kafka:

Algebird

Twitter Algebird is a Scala library for working with algebraic structures such as monoids and monads. It allows you to plug your own data structures directly into large-scale data processing tools such as Hadoop and Storm, and it ships with a collection of common data structures such as Count-Min sketches, Bloom filters, and HyperLogLog.
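To make the monoid idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not Algebird's API: the names `Monoid`, `sum_all`, `int_add`, and `set_union` are made up for illustration. A monoid is just a value type with an associative `plus` operation and an identity element `zero`, which is what lets partial results from separate partitions be merged in any grouping.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an identity element plus an associative combine."""
    zero: T
    plus: Callable[[T, T], T]


def sum_all(monoid: Monoid[T], values: Iterable[T]) -> T:
    """Fold values with the monoid, starting from its identity element."""
    return reduce(monoid.plus, values, monoid.zero)


# Two example monoids: integer addition and set union.
int_add = Monoid(0, lambda a, b: a + b)
set_union = Monoid(frozenset(), lambda a, b: a | b)

# Simulate a distributed job: aggregate each partition on its own,
# then merge the partial results. Associativity makes this safe.
partitions = [[1, 2, 3], [4, 5], [6]]
partials = [sum_all(int_add, p) for p in partitions]
total = sum_all(int_add, partials)

# The same fold works unchanged for a different data structure.
users = sum_all(set_union, [frozenset({"ann"}), frozenset({"bob", "ann"})])
```

Because the aggregation logic only depends on `zero` and `plus`, swapping in a richer structure (for example a sketch that can be merged) would not change the surrounding job, which is the property the paragraph above describes.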

I have written the following tutorials related to Algebird:

Of Algebirds, Monoids, Monads, and Other Bestiary for Large-Scale Data Analytics

Storm

Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.

I have written the following tutorials related to Storm:

Spark

Apache Spark is a fast and general engine for large-scale data processing, similar to Apache Hadoop. It includes the Spark Streaming sub-project, a platform for processing data streams in near real time, similar to Apache Storm.

I have written the following tutorials related to Spark:

Integrating Kafka and Spark Streaming: Code Examples and State of the Game

Hadoop

Apache Hadoop is a free and open source framework for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it a great tool for research and business operations. Hadoop was inspired by Google’s MapReduce and Google File System papers.

I have written the following tutorials related to the Hadoop technology stack: