Michael G. Noll

Applied Research. Big Data. Distributed Systems. Open Source.

Tutorials

Hadoop

Apache Hadoop is a free and open source framework for reliable, scalable, distributed computing and data storage. It lets applications run across thousands of nodes and work with petabytes of data, which makes it well suited to both research and business operations. Hadoop was inspired by Google’s MapReduce and Google File System papers.

I have written the following tutorials related to the Hadoop technology stack:

Storm

Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.

I have written the following tutorials related to Storm:

Kafka

Apache Kafka is a high-throughput distributed messaging system.

I have written the following tutorials related to Kafka:

Spark

Apache Spark is a fast and general engine for large-scale data processing, similar to Apache Hadoop. It includes Spark Streaming, a sub-project for real-time data processing that is similar to Apache Storm.

I have written the following tutorials related to Spark:

Algebird

Twitter Algebird is a JVM library written in Scala for working with algebraic structures such as monoids and monads. It allows you to plug your own data structures directly into large-scale data processing tools such as Hadoop and Storm, and ships with a collection of common data structures such as Bloom filters and HyperLogLog.

I have written the following tutorials related to Algebird:

Mozilla Firefox

If you are a Firefox add-on developer, the tutorials below might come in handy.