Apache Hadoop is a free, open source framework for reliable, scalable, distributed computing and data storage. It enables applications to run across thousands of nodes and process petabytes of data, which makes it a great tool for both research and business operations. Hadoop was inspired by Google’s MapReduce and Google File System papers.
I have written the following tutorials related to the Hadoop technology stack:
- Running Hadoop On Ubuntu Linux (Single-Node Cluster)
- Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
- Writing A Hadoop MapReduce Program In Python
- Using Avro in MapReduce Jobs With Hadoop, Pig, Hive
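The classic introductory MapReduce job is word count. As a taste of what the Python tutorial above covers, here is a minimal, self-contained sketch of the mapper/reducer pair you would run via Hadoop Streaming (in a real job, mapper and reducer are separate scripts that read from stdin; the demo at the bottom simulates Hadoop's shuffle/sort step with a plain `sorted()`):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Hadoop delivers the mapper output to the reducer grouped (sorted)
    by key, which is what groupby relies on here.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog"]
    shuffled = sorted(mapper(data))  # stand-in for Hadoop's shuffle/sort
    print(dict(reducer(shuffled)))   # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

The key property is that mapper and reducer only see local slices of the data, so Hadoop can run many copies of each in parallel across the cluster.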
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
I have written the following tutorials related to Storm:
- Running a Multi-Node Storm Cluster
- Implementing Real-Time Trending Topics with a Distributed Rolling Count Algorithm in Storm
- Understanding the Parallelism of a Storm Topology
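The rolling-count tutorial above is built around a sliding-window counter. As an illustrative (non-Storm) Python sketch of that data structure, each tracked object gets a ring of per-slot counts; advancing the window returns the totals over the last N slots and recycles the oldest slot:

```python
from collections import defaultdict

class SlidingWindowCounter:
    """Hypothetical Python sketch of a sliding-window counter: each
    object maps to a ring of num_slots counters, and advancing the
    window reuses (zeroes) the oldest slot."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.head = 0  # index of the slot currently being written
        self.slots = defaultdict(lambda: [0] * num_slots)

    def increment(self, obj):
        self.slots[obj][self.head] += 1

    def get_counts_then_advance(self):
        # totals over the whole window, i.e. the last num_slots slots
        counts = {obj: sum(ring) for obj, ring in self.slots.items()}
        # advance: the next slot becomes the head, dropping its old data
        self.head = (self.head + 1) % self.num_slots
        for ring in self.slots.values():
            ring[self.head] = 0
        return counts

if __name__ == "__main__":
    w = SlidingWindowCounter(num_slots=3)
    w.increment("storm")
    w.increment("storm")
    w.increment("kafka")
    print(w.get_counts_then_advance())  # {'storm': 2, 'kafka': 1}
```

Counts age out automatically: once the window has advanced past every slot an object was counted in, its total drops back to zero. In Storm, a bolt would call `get_counts_then_advance()` on a tick and emit the per-object totals downstream for ranking.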
Apache Kafka is a high-throughput, distributed publish-subscribe messaging system built around a partitioned, replicated commit log.
I have written the following tutorials related to Kafka:
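Kafka's core abstraction is the partitioned, append-only log: producers append messages to a partition, every message gets a sequential offset, and consumers read forward from an offset they track themselves. A toy Python model of that idea (the names here are illustrative, not Kafka's actual API):

```python
class PartitionedLog:
    """Toy model of Kafka's partitioned commit log (illustrative only):
    messages are appended to numbered partitions, addressed by offset,
    and never modified in place."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Kafka routes by key hash, so one key always lands in one
        # partition and stays ordered within it
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        """Read all messages from `offset` onward; re-reading from an
        older offset simply replays the log."""
        return self.partitions[partition][offset:]
```

Because consumers own their offsets, many independent consumers can read the same log at their own pace, which is a big part of why Kafka scales to high throughput.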
Twitter Algebird is a JVM library written in Scala to work with algebraic data structures such as monoids and monads. It allows you to plug your own data structures directly into large-scale data processing tools such as Hadoop and Storm, and ships with a collection of common data structures such as Bloom filters and HyperLogLog.
I have written the following tutorials related to Algebird:
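Algebird itself is a Scala library, but the idea it is built on is easy to sketch in any language: a monoid is an identity element plus an associative combine operation, and associativity is exactly what lets Hadoop and Storm merge partial aggregates computed on different machines in any order or grouping. A small illustrative Python sketch (not Algebird's API):

```python
from functools import reduce

class Monoid:
    """A monoid: an identity element (zero) plus an associative
    combine (plus). Associativity means partial results can be merged
    in any grouping, which is what makes aggregation parallelizable."""

    def __init__(self, zero, plus):
        self.zero = zero
        self.plus = plus

    def sum(self, items):
        return reduce(self.plus, items, self.zero)

# integers form a monoid under addition
int_add = Monoid(0, lambda a, b: a + b)

# per-key count maps form a monoid too: merge values pointwise
def merge_counts(a, b):
    out = dict(a)
    for k, v in b.items():
        out[k] = int_add.plus(out.get(k, int_add.zero), v)
    return out

map_monoid = Monoid({}, merge_counts)

# two "partial results" from two workers combine into the global result
part1 = {"storm": 2, "kafka": 1}
part2 = {"storm": 1, "hadoop": 4}
total = map_monoid.sum([part1, part2])  # {'storm': 3, 'kafka': 1, 'hadoop': 4}
```

Algebird ships ready-made monoids for structures like Bloom filters and HyperLogLog, so the same merge-anywhere property applies to approximate set membership and cardinality counting, not just plain sums.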
If you are a Firefox add-on developer, the tutorial below might come in handy.
- Cookie Monster for XMLHttpRequest