Have you ever asked yourself what monoids and monads are, and particularly why they seem to be so attractive in the field of largescale data processing? Twitter recently opensourced Algebird, which provides you with a JVM library to work with such algebraic data structures. Algebird is already being used in Big Data tools such as Scalding and SummingBird, which means you can use Algebird as a mechanism to plug your own data structures – e.g. Bloom filters, HyperLogLog – directly into largescale data processing platforms such as Hadoop and Storm. In this post I will show you how to get started with Algebird, introduce you to monoids and monads, and address the question why you should get interested in those in the first place.
Goal of this article
The main goal of this is article is to spark your curiosity and motivation for Algebird and the concepts of monoid, monads, and category theory in general. In other words, I want to address the questions “What’s the big deal? Why should I care? And how can these theoretical concepts help me in my daily work?”
While I will explain a little bit what the various concepts such as monoids are, this is not the focus of this post. If in doubt I will rather err on the side of grossly oversimplifying a topic to get the point across even at the expense of correctness. There are much better resources available online and offline that can teach you the full details of the various items I will discuss here. That being said, I compiled a list of references at the end of this article so that you have a starting point to understand the following concepts in full detail, and with more accurate and thorough explanations than I could come up with.
Motivating example
A first look at Algebird
Here is a simple example what you can do with monoids and monads, based on the starter example in Algebird.
1 2 3 4 5 6 

What is happening here? Basically, we are boxing two numbers, the Int
values 4
and 5
, into Max
, and then
we are “adding” them. The behavior of Max[T]
turns the +
operator into a function that returns the largest boxed
T
.
Conceptually this is similar to the following native Scala code:
1 2 3 4 5 6 7 

At this point you may ask, “Alright, what is the big deal? The native Scala example looks actually better!”
At least, that is what I thought myself at first. But the simplicity of this example is deceptive. There is a lot more to it than meets the eye at first sight.
Beyond trivial examples
Admittedly, the first example used a very dull data structure, Int
. Any programming language comes with builtin
functionality to add two integers, right? So you would be hardly convinced of the value of a tool like Algebird if all
it allowed you to do was 4 + 3 = 7
, particularly when doing those simple things would require you to understand
sophisticated concepts such as monoids and monads. Too much effort for too little value I would say!
So let me use a different example because adding Int
values is indeed trivial. Imagine that you are working on
largescale data analytics that make heavy use of Bloom filters. Your
applications are based on highlyparallel tools such as Hadoop or Storm, and they create and work with many such Bloom
filters in parallel. Now the money question is: How do you combine or add two Bloom filters in an easy way?
(This is where monoids come into play.)
1 2 3 

And what about performing other operations on those Bloom filter instances, notably data processing pipelines based on
common functions such as map
, flatMap
, foldLeft
, reduceLeft
? (And this is where monads come to into play.)
1 2 

And what about combining two HyperLogLog instances?
Intuitively we could say that the general idea of “adding” two Bloom filters is quite similar to how we would add two sets A and B, where adding would mean creating the union set of A and B.
Now Algebird addresses this problem of abstraction. In a nutshell, if you can turn a data structure into a monoid
(or semigroup, or …), then Algebird allows you to put it to good use. You can then work with your data structure
just as nicely as you are so used to when dealing with Int
, Double
or List
. And you can use it with largescale
data processing tools such as Hadoop and Storm, too.
Wait a minute!
In case you are asking yourself the following question (which I did): Is the magic of Algebird simply something like a
custom Max[Int]
class that defines a +()
method, similar to the following snippet but actually with a bounded
type parameter T : Ordering[T]
? (If you do not understand the latter, take a look at
this StackOverflow thread.)
1 2 3 4 5 6 

The answer is yes and no. “Yes” because it is similar. And “no” because the implementation is quite different from the above analogy, and provides you with significantly more algebrafu (but again, it has the same spirit).
What we want to do
Our goal in this post is to build a data structure TwitterUser
accompanied by a Max[TwitterUser]
monoid view of it.
We want to use the two for implementing the analytics of a fictional popularity contest on Twitter, like so:
1 2 3 4 5 6 7 8 9 

Figuring out how to do this with monoids, monads, and Algebird is the objective of this article.
Of course, instead of using Algebird and monoids we could also project the numberoffollowers field from each user
and perform any such analytics directly on the Int
values. That’s not the point however. I intentionally wanted a
very simple example use case because, as you will see, there is so much to understand about what’s going on behind the
scenes that any further distraction should be avoided. At least, that was my personal experience. :)
My journey down the rabbit hole
This section is more for entertainment. Feel free to skip it.
How this post started
I am following a few Twitter folks on, well, Twitter such as Dmitriy Ryaboy (@squarecog) and Oscar Boykin (@posco). And lately they talked a lot about how data analytics at Twitter is powered by “monoids” and “monads”, and how tools such as Algebird and Scalding form the code foundation of their analytics infrastructure.
Here is an example of such a conversation:
(Link to full image “Monads, Monads Everywhere!”)
A mowhat? And how comes those things are apparently spreading like a contagious disease throughout their data analytics code?
Another trigger was a discussion involing Ted Dunning of MapR (@ted_dunning) and his work on a new data structure called tdigest:
Why was Ted being asked whether tdigest
is associative? And how does all this relate to semigroups and monoids? And
finally, what the heck are semigroups in the first place?
Now a dangerous series of events began to take place on my side.
First I thought, “Hey, coincidentally I have started to pick up Scala around a month ago. Given that Algebird is written in Scala this might turn into an interesting finger exercise.” (Note my focus on “finger exercise”.) On top of that I knew that the use of Algebird extends to other interesting big data tools such as Storm and Scalding, so it could turn out that I would not only learn something for learning’s sake but that I could put it to practical use in my daily work, too. The combination of these two factors – general interest and practical applicability – eventually caused me to give in to my curiosity and decided to put “an hour or two aside” to read up on those monoid thingies and figure out whether and how I could leverage Algebird for my own purposes.
You might notice at this point that it all started quite innocently. But what I did not realize at that moment was that I was opening Pandora’s box on an otherwise quiet and peaceful Swiss weekend…
Scala, functors, monoids, monads, category theory, implicits, type classes, aaargh!
What started as a seemingly innocent journey down a calm park lane quickly turned into the opening of the gates of functional programming and category theory hell. Not only did I struggle to understand what things like functors, semigroups, monoids, and other algebraic structures that only a mother could love are. No, on top of that I quickly realized that how these things can be implemented in Scala in general and in Algebird in particular meant I had to take my beginner Scalafu to a whole new level. In the end it took me the full weekend to grasp all those concepts to the point where I’d say right now that I know enough to be dangerous.
The learning curve reminded me a lot of the following famous picture:
And it did feel like the vi
curve – the brick wall experience. What else could it be, right? That being said I
still fear that, after having hit and finally made it over that initial brick wall, it may still spiral out of control
again like the Emacs curve. :)
Picture myself sitting in front of my keyboard, frantically interacting with your favorite search engine, StackOverflow, Wikipedia, your usual suspects of Scala books, and what not:
Me: “What is a monad?”
Internet: “A monad is just a monoid in the category of endofunctors.”
Me: “Hmm, ok. So what is a monoid?”
Internet: “A monoid is a semigroup with identity.”
Me: “Then what is a semigroup??” (number of question marks increases with anxiety level)
Internet: “An algebraic structure consisting of a set together with an associative binary operation.”
Me: “Alright, I see the mathematical definition and I do see a soup of greek letters. Still, what is it? Where can I get one from, and what can I use it for?”
Internet: “Here is an example in the Haskell programming language.”
Me: <censored>
On a more serious note, the past few days have really been a tour de force where I felt I would recursively dive from one new term or concept into yet more new terms and concepts, to the point where my brain would run into a stack overflow. “Why am I actually reading about magmas, or co and contravariance in Scala, or bounded type parameters? What was the original question I tried to find an answer for?”
To make a long story short I was really deep down the rabbit hole, with no Alice in sight but fully surrounded by semigroups of monoidal and diabolical jabberwockies on a big night out. Given the questions, comments and blog posts of other folks at least I found consolation in the fact that I was apparently not alone.
And, finally, at the end of the hole there was a bit of light. In the next sections I want to share what I have learned so far in the hope that it will prove helpful for you, too. We start with a brief introduction to monoids and monads, followed by how to apply what we have learned in Algebird handson.
The TL;DR version of monoids and monads
A monad is a monoid where you blend the “oi” into an “a”. Depending on your typesettings (pun intended) this blend will be easier or harder for you to see. If in doubt, squint more.
As a grossly simplified rule of thumb:
 Monoid: If you want to “attach” operations such as
+
,
,*
,/
or<=
to data objects – say, adding two Bloom filters – then you want to provide monoid forms for those data objects (e.g. a monoid for your Bloom filter data structure). This way you can combine and juggle your custom data structures just like you would do with plain integer numbers.  Monad: If you want to create data processing pipelines that turn data objects stepbystep into the desired, final output (e.g. aggregating raw records into summary statistics), then you want to build one or more monads to model these data pipelines. Particularly if you want to run those pipelines in largescala data processing platforms such as Hadoop or Storm.
The intent of this section is to give you a highlevel idea what those concepts are, and what you can use them for. That is, this section should help you determine whether you want to venture down the rabbit hole, too.
I did not want to add yet another variant to the pool of “what is a monoid/monad” articles, but at the same time I felt I need to explain at least very briefly what the various concepts are (as good as I can) so that you can better understand how to use a tool such as Algebird.
Of course, if you ran across a blatant mistake on my side please do let me know!
Monoids
What is a monoid?
A monoid is a structure that consists of:
 a set of objects (such as numbers)
 a binary operation as a method of combining them (such as adding those numbers)
The small catch is that the way you can combine the objects in your set must adhere to a few rules, which are described in the next section.
One way to explain a monoid in the context of programming is as a kind of adapter or bounded view of a type T
.
Imagine a data structure of type T
– say, a List
. If you can find a way to use T
in a way that conforms to the
monoid laws (see next section), then you can say “type T forms a monoid” Monoid[T]
; for instance, if the binary
operation you picked behaves like the concept of addition, you have an additive monoid view of T
.
So you can read Monoid[T]
as “T looks like a monoid and quacks like a monoid, so it must be a monoid”. This notion
is related to the concept of duck typing in languages such as Python.
Scala, in which Algebird is implemented, has a static type system though, and to achieve such adhoc polymorphism we
typically use
type classes
to achieve a similar effect. A nifty feature of type classes is that they allow you to retroactively add polymorphism
even to existing types that are not under your own control: examples are Seq
or List
, which are provided by the
Scala standard library and thus not under your control.
Monoids in more detail
A monoid is a set of objects, T
, together with a binary operation ⋅ that satisfies the three axioms
listed below.
One way to express a monoid in Scala would be the following trait, used as a type class:
1 2 3 4 5 6 

 Closure
 For all a, b in T, the result of the operation a ⋅ b is also in T: $$ \forall a,b \in T: a \bullet b \in T $$ In Scala, we could express this axiom with the following function signature for ⋅: def op(a: T, b: T): T
 Associativity
 For all a, b, and c in T, the equation (a ⋅ b) ⋅ c = a ⋅ (b ⋅ c) holds: $$ \forall a,b,c \in T: (a \bullet b) \bullet c = a \bullet (b \bullet c) $$ In Scala, we could express this axiom with: (a op b) op c == a op (b op c)
 Identity element
 There exists an element e (we could also call it zero to draw a link to addition) in T, such that for all elements a in T, the equation e ⋅ a = a ⋅ e = a holds: $$ \exists e \in T: \forall a \in T: e \bullet a = a \bullet e = a $$ In Scala, we could express this axiom with the following, which as you might note captures the idea of a noop: e op a == a op e == a
Before we move on and look at examples of monads, I want to mention one more thing about the binary function of a monoid. We have learned that it must be associative. Wouldn’t it be helpful if the binary function were commutative, too, even though this optional feature would not be required to make a monoid?
Here is a transcripted reply during Sam Ritchie’s SummingBird talk at CUFP:
Question: Associativity is one nice thing about monoids, but what about commutativity [which] is also important. Are there examples of noncommutative datastructures
Answer: It should be baked into the algebra (noncommutativity). This helps with data skew in particular. An important noncommutative application is Twitter itself! When you want to build the List monoid, the key is userid,time and the value is the list of tweets over that timeline (so ordering matters here). It’s not good to get a nondeterministic order when building up these lists in parallel, so that’s a good example of when associativity and commutativity are both important.
What are example monoids?
 Numbers (= the set of objects) you can add (= the method of combining them).
 For integer addition,
e == 0
andop == +
.  For integer multiplication,
e == 1
andop == *
.
 For integer addition,
 Lists you can concatenate.
 With
e == Nil
andop == concat
.
 With
 Sets you can union.
 With
e == Set()
andop == union
.
 With
There are more and also more sophisticated examples, of course. Max[Int]
at the beginning of this article is a
monoid, too.
Here is how Algebird defines an additive monoid for the standard type Seq
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 

Algebird actually includes a few more methods for the Monoid[T]
type class – which SeqMonoid[T]
extends – but the
key functionality is shown above.
What can I use a monoid for? Why should I look for one?
Whenever you have a data structure (which backs your “set of objects”, e.g. the Int
data structure or the List[T]
data structure) you can begin checking whether you can define one or more monoids for that data structure. Here you
will start looking for operations you can perform on any two instances of your data structure that satisfy the three
monoid axioms: closure, associativity, and identity element (the latter
gives your monoid a noop function, and is the one thing that turns a semigroup into a monoid).
If you do find any such monoids for your data structure, hooray! On the practical side this means means that you can now use your data structure in any code that expects a monoid. As I said above you can think of a monoid as an adapter, or shape, for (some monoidcompatible aspects of) your data structure that allows you to fit your data structure peg into a monoid hole. Some such holes are Algebird, Scalding and Summingbird of Twitter. Being supported by those tools also means that you can now plug your data structure into big data analytics tools such as Hadoop and Storm, which might be a huge seller and productivity gain for your new data structure.
Secondly, and in more general terms, it signifies because of the associativity of monoid operations that those
operations on your data structure can be parallelized
in order to utilize multiple CPU cores efficiently. Speaking in code, that means you can run operations such as
foldLeft()
and reduceLeft()
on them. And parallelization support is yet another reason why monoids (and monads) are
so attractive for big data tools such as Hadoop and Storm, where your code not only runs on many cores per machine but
on many such machines in a cluster. In other words:
If your data structure has a monoid form, this means you can plug the data structure directly into largescale data
processing platforms such as Hadoop and Storm.
Hence monoids enable you to MapReduce and to
divide and conquer.
Let me quote Sam Ritchie (@sritchie), former Twitter engineer and now founder of PaddleGuru (cool idea, by the way – go sports!) for a very concrete practical application of monoids at Twitter. Well, actually I am quoting a transcript of his talk.
One cool feature: When you visit a tweet, you want the reverse feed of things that have embedded the tweet. The MapReduce graph for this comes from: When you see an impression, find the key of the tweet and emit a tuple of the tweetId and Map[URL, Long]. Since Maps have a monoid, this can be run in parallel, and it will contain a list of who has viewed it and from where. The Map has a Long since popular tweets can be embedded in millions of websites and so they use a “CountMinSketch” [Note: Reader Sam Bessalah points out that the transcript is wrong when it said “accountment sketch”.] which is an approximate data structure to deal with scale there. The Summingbird layer which the speaker [Sam Ritchie] shows on stage filters events, and generates keyvalue pairs and emits events.
Twitter advertising is also built on Summingbird. Various campaigns can be built by building a backend using a monoid that expresses the needs, and then the majority of the work is on the UI work in the frontend (where it should be — remember, solve systems problems once is part of the vision).
See his CUFP slides on Summingbird for further detail.
Thirdly, you can compose monoids. For instance, you can form the product of two monoids M1
and M2
, which is the
tuple type (M1, M2)
. This product is also a monoid.
Lastly, you can now combine your monoidal data structure with monads (see below) and benefit from all the features that those monads provide.
Monads
What is a monad?
A monad is a structure that defines a way to combine functions. It represents computations defined as a sequence
of transformations that turn an original input into a final output, one step at a time. Think of them like function
chaining similar to y = h(g(f(x)))
.
An interesting aspect is that in the case of a monad the type of the value being piped through the function chain may
change along the way. For instance, you may start with an Int
but end up with a Double
or BloomFilter
. This is
different from a monoid, which will always retain the original type because of the closure requirement (see monoid
laws above).
One of the best analogies for monads I found is the following, adapted from Wikipedia: You can compare monads to physical assembly lines, where a conveyor belt (the monad) transports a piece of input material (the data) between functional units (functions on the data) that transform the piece one step at a time. Think of the skeleton of a car that is turned into the final car in a sequence of steps. Or of web server log files with raw data that is turned into business information such as the increase of ad impressions in the EMEA market for this month.
Sticking with this analogy, a monad enables you to decorate each processing step in the assembly pipeline with additional context (or an “environment”). For instance, your monad could carry state information that is used by the functions in the pipeline – this would be the example of a state monad. Alternatively, your monad could log what is going on before, within, or after a function to a file or database – this would be the example of an I/O monad. If you are a game developer, you could use a monad to carry the representation and state of the game environment (such as the current level), and the functions in the pipeline would model how players can interact with the environment.
Before we look at monads in more detail, let us take a brief detour to Storm.
When you are implementing bolts in Storm – i.e. Storm’s version of the “functional units” in a data processing
pipeline – you will come across the prepare()
and execute()
methods
(see the Storm tutorial):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 

Note how Storm provides environmental information and context to the bolt. This is one example where you could point
your finger at the code and say, “This would be a good place to use a monad.” In this specific case I would say it
would be primarily a kind of I/Omonad because the collector
instance allows the bolt write its output to
downstream bolts via network communication.
Monads in more detail
Here is one way to capture the concept of a monad in Scala. It is basically the same as the definition of a monad in Algebird.
1 2 3 4 5 6 7 8 9 

Alright, what is going on here?
apply()
boxes a T
value into the monad M[T]
. For example, T
is an Int
, the monad M[T]
is a List[T]
.
In other words, it is a goodol’ constructor for the monad.
flatMap()
turns a T
into a potentially different type parameter U
(but it can also be a T
again) that is boxed
into the same type of monad M
, i.e. M[U]
. In plain English, this means that if you have List monad all it will
ever produce for you is another List monad, but the type of elements in the List monad may change. The way this
happens is controlled by the second parameter of flatMap()
, which is a function from T
to M[U]
.
For example, T
is an Int
, U
is a Double
, and M
is a List
monad;
fn
is (i: Int) => List(i.toDouble / 4, i.toDouble / 2)
, i.e. T > M[U]
.
If you ran this combination over the input List[Int](1, 2)
, you would get the output:
List[Double](0.25, 0.5, 0.5, 1.0)
.
Similar to the monoid laws we discussed above, monads have their own laws – and these rules are actually very similar to its monoid brethren! I decided not to discuss monad laws in this post because I feel it is already very long. I may update the post at a later point though. In the meantime take a look at the following references:
 Monad laws (in Haskell). Remember
return
in Haskell means our constructorapply()
in Scala, and>>=
in Haskell is ourflatMap()
.  Monads are elephants, a series of blog posts by James Iry. In Scala.
I hope you will notice their similarities:
 The identity rules of monads are similar to the identity element
e
of monoids.  Both monoids and monads have functions that must be associative.
What are example monads?
At the beginning of the section on monads I already mentioned the statemonad and the I/Omonad.
Well, this may still be a bit vague. Let us look at a more concrete (and maybe simpler) example. Any collection type
is typically a monad. For example, take List[T]
:
 The constructor of
List[T]
acts asunit
as it gives you aList[T]
box forT
instances. List
has an appropriateflatMap()
function – andmap()
, which can be built fromflatMap()
and the constructor.
Here is the implementation of Monad[List]
in Algebird:
1 2 3 4 

Here you can see that Monad[List]
is simply a 1:1 adapter for the existing apply()
and flatMap()
functions of
List
. And that’s because List
in Scala already ships with monad “look and feel”.
Before we move on to the next section there is one more interesting facet: A monad can have monoid forms, too. Algebird, for instance, provides a default monoid view for its semigroups and monads:
1 2 3 4 5 6 7 8 9 10 11 12 

Groups, rings, and fields do not have such a default, “automatic” monoid view however. For those algebraic structures you must check yourself that the group/ring/field laws hold for your monad.
What can I use a monad for? Why should I look for one?
As we have already seen monads can be thought of as composable computation descriptions. This means you can use them to build powerful data processing pipelines. And these pipelines are not only powerful in terms of features and functionality, they can also be parallelized, which is one of the reasons why monads are so attractive in the field of largescale data processing where your code is run on many cores and on many machines at the same time.
Algebird
Finally we are getting close to being productive with Algebird. I figure the previous TL;DR section on monoids and monads was still maybe a bit too long. :)
If you recall, our original goal at the beginning of this post was to build a data structure TwitterUser
accompanied
with a Max[TwitterUser]
monoid view of it, using Algebird. We wanted to use the two for implementing the analytics of
a simple popularity contest on Twitter:
1 2 3 4 5 6 7 8 9 

Let’s start!
Creating a monoid
The TwitterUser type
Our first step is to create the data structure TwitterUser
for which we will then create a monoid view.
Because we want to build a Max
monoid for TwitterUser
eventually, we must come up with a way to order
TwitterUser
values. For this we can either use the
Ordering or the
Ordered trait in Scala, either way will work.
Let’s say we go down the Ordered
route. Now we must answer a design question: Do we consider the “ordering”
behavior to be a defining feature of TwitterUser
in general, or do we need this behavior only for its
Max[TwitterUser]
monoid view?
If it’s a general feature we would add it to TwitterUser
directly. If it’s only needed for the monoid we can also
decide to add it only there. In our case, we will add the ordering behavior to TwitterUser
directly. I will show
further down below how to implement the other option.
1 2 3 4 5 6 7 8 9 

The code above means that TwitterUser
supports comparison operations like >=
as defined by the compare
method of
the Ordered
trait.
1 2 

In our case this compare()
method is also used as the monoidal binary function of the Max[TwitterUser]
monoid we
will build in the next section. This works because compare()
satisfies all the three axioms described in our section
on monoids above.
The Max[TwitterUser] monoid
Creating the Max
monoid for TwitterUser
is now very simple because we can leverage a factory method provided by
Algebird’s called Max.monoid()
.
1 2 3 4 5 6 7 8 9 10 

That’s it!
Ok, maybe it feels a bit like cheating because the monoid is created behind the scenes by Max.monoid()
.
So what does Max.monoid()
do?
1 2 3 4 5 

Still, it’s pretty straightforward I would say. Not a lot of magic as long as you know how implicits and type classes in Scala work.
What would we do if we only wanted to add compare()
to the monoid, but not to the original type?
The Algebird code has examples for this use case. Here is the definition of the Max[List]
monoid, which as you may
notice uses Ordering
and not Ordered
as in our example above. You can ignore that small difference. The key point
is that the compare()
method is defined as part of the Max[List]
monoid instead of being “ducttaped” to List
directly.
1 2 3 4 5 6 7 8 9 10 11 12 13 

Where to go from here?
Now that we have one monoid view for TwitterUser
, what else can we do? Can we find another monoid form for it?
That’s one of the questions you should ask yourself when working with your own data structures. If you take a look at
the Algebird code, you will notice that many types such as List
will have quite a few algebraic forms.
There is one more thing I want to mention here: You may consider creating an additive monoid for TwitterUser
, i.e.
a monoid that supports a +
like operation. I couldn’t come up with any good example how the result of adding two such
values would make sense (e.g., how could you “add” their usernames in meaningful way?). That being said there is one
case where adding two TwitterUser
values would make sense: to capture the idea that one follows the other, i.e.
to create a relationship (a link) between the two. Keep in mind though that monoids and friends must adhere to the
closure principle – if you start out with a TwitterUser
value and perform monoid operations on it, the end result
must always be another TwitterUser
value. Of course such a relationship can be modeled in code, but you cannot do
this with a TwitterUser
monoid as defined above.
Creating a monad?
By now you should have sufficient understanding of monads and Algebird to implement your own monad. So I leave this as an exercise for the reader.
A starting point for you is Monad.scala in Algebird.
However if you do have a good idea what kind of monad I could showcase here – perhaps something related to Twitter to
match the TwitterUser
monoid example above? – please let me know in the comments.
Key algebraic structures in Algebird
The following table is a juxtaposition of a few key algebraic structures, notably those that are implemented in
Algebird. It should help you to navigate the Algebird code base, and also to figure out which algebraic structure
your own data types might support – i.e., “Can I turn my T
into a semigroup, or even a monoid?”.
Algebraic structure  Binary op is associative  Identity (has a zero element)  + op   op  * op  / op  References 

Semigroup  YES    YES        Wikipedia, Algebird 
Monoid  YES  YES  YES        Wikipedia, Algebird 
Group  YES  YES  YES  YES      Wikipedia, Algebird 
Ring  YES  YES  YES  YES  YES    Wikipedia, Algebird 
Field  YES  YES  YES  YES  YES  YES  Wikipedia, Algebird 
Think of +
as the general notion of “adding one thing to another”, same for the other operations. For two
List[Int]
, for instance, +
could be concatenation of the two (instead of, say, trying to add the individual Int
elements of the lists together). The operators +
, 
, *
and /
are
as defined in Algebird.
A small Algebird FAQ
Error “Cannot find Group/Monoid/… type class for a type T”?
If you run into this error it means you are trying to use an operation that is not supported by the algebraic structure
you are working with. In this specific example, a Set()
in Algebird has a monoid form and thus supports an
additionlike operation +
but not a multiplicationlike operation *
.
1 2 3 

Combine different monoids?
In theory you can combine different monoids such as Max[Int]
and Min[Int]
and form their product, but there must
exist an appropriate algebraic structure for that product. Right now, for instance, the following code will not work
in Algebird because it does not ship with a algebraic structure for (Max[Int], Min[Int])
:
1 2 3 

Are monads really everywhere?
One thing that I have not yet investigated in further detail is how using monads compares to other patterns of abstraction. For instance, you can use monads in Clojure (the author Jim Duey actually wrote a whole series of blog posts covering monads), too, but in a quick initial search I observed that Clojure developers apparently use different constructs to achieve similar effects.
If you have some insights to share here, please feel free to reply to this post!
Summary
I hope this post contributes a little bit to the understanding of the rather abstract concepts of monoids and monads, and how you can put them to good practical use via tools such as Algebird, Scalding and SummingBird.
One of my lessons learned was that working with monoids and monads is a nice opportunity to read up on more formal concepts (category theory), and at the same time realize how they can be put to practical use in engineering, notably when doing largescala data analytics.
On my side I want to thank the Twitter engineering team (@TwitterEng) not only for making those tools available to the open source community, but also for sparking my interest in the practical application of algebraic structures and category theory in general. Same shoutout for all the various people who wrote blog posts on the topic, or who shared their insights on places such as StackOverflow (see the reference section at the end of this article for a few of them). As I said there was a lot of new information to swallow – and in a short period of time – but the quest was worth it.
Many thanks! –Michael
References
Monads and monoids
I tried to categorize the references below into “easy” and “advanced” reads. Of course this is highly subjective, and your mileage may vary.
Easy reads:
 Functional Programming in Scala by P. Chiusano and R. Bjarnason, published by Manning. Includes chapters on monoids and monads, and how to implement them in Scala.
 Monads are not metaphors, by Daniel Spiewak.
 A monad is just a monoid in the category of endofunctors, what’s the problem?, question on StackOverflow. If you are just starting out with monads etc. I’d recommend to read the second answer first.
 Monads are elephants, a series of blog posts by James Iry. In Scala.
 Functors, Applicatives, And Monads In Pictures, by Aditya Bhargava.
 Monad laws (in Haskell). Remember
return
in Haskell means our constructorapply()
in Scala, and>>=
in Haskell is ourflatMap()
.  Wikipedia articles on algebraic structures: I found that selective reading of those did help my understanding (I did not try to understand all the sections in those articles). Notably, I liked the juxtaposition of semigroups, monoid, groups, rings, etc. which highlighted their similarities and differences. Later on I discovered that the Algebird code is structured similarly, so if you can tell a semigroup from a monoid you will have an easier time navigating the code.
Advanced reads:
 Monads Made Difficult, by Stephen Diel. In Haskell.
 Monads in Clojure, by Jim Duey. In Clojure. Jim actually wrote a whole series of blog posts covering monads.
 Functors, Applicative Functors and Monoids, a chapter in Learn You a Haskell.
SummingBird
 SummingBird at CUFP 2013, slides by Sam Ritchie (former Twitter engineer)
Category theory
Speaking from my own experience I would say you do not need to understand the full details of category theory. The links above should contain all the information you need to gain enough understanding of monoids, monads and such that you can be productive in a short period of time. However I have used the references below to fill gaps that remained after reading through the other sources above, and I remember I did jump back and forth between the academic references below and the more handson resources above.
 An Introduction to Category Theory, by Harold Simmons. As a novice to category theory I preferred this text over Category Theory for Computing Science (see below). Unlike the latter though the book of Simmons is not available for free.
 Category Theory for Computing Science, by Michael Barr and Charles Wells, available as a free PDF. This seems to be a seminal work on category theory and worth the read if you are interested in the mathematical foundation of the theory in about 400 pages.