Confused by the meaning of the ‘Alternative’ type class and its relationship to other type classes

To begin with, let me offer short answers to each of these questions. I will then expand each into a longer detailed answer, but these short ones will hopefully help in navigating those.

  1. No, Alternative and Monoid don’t mean different things; Alternative is for types which have the structure both of Applicative and of Monoid. “Picking” and “combining” are two different intuitions for the same broader concept.

  2. Alternative contains empty as well as <|> because the designers thought this would be useful, and because this gives rise to a monoid. In terms of picking, empty corresponds to making an impossible choice.

  3. We need both Alternative and Monoid because the former obeys (or should) more laws than the latter; these laws relate the monoidal and applicative structure of the type constructor. Additionally, Alternative can’t depend on the inner type, while Monoid can.

  4. MonadPlus is slightly stronger than Alternative, as it must obey more laws; these laws relate the monoidal structure to the monadic structure in addition to the applicative structure. If you have instances of both, they should coincide.


Doesn’t Alternative mean something totally different from Monoid?

Not really! Part of the reason for your confusion is that the Haskell Monoid class uses some pretty bad (well, insufficiently general) names. This is how a mathematician would define a monoid (being very explicit about it):

Definition. A monoid is a set M equipped with a distinguished element εM and a binary operator · : M × MM, denoted by juxtaposition, such that the following two conditions hold:

  1. ε is the identity: for all mM, mε = εm = m.
  2. · is associative: for all m₁,m₂,m₃ ∈ M, (mm₂)m₃ = m₁(mm₃).

That’s it. In Haskell, ε is spelled mempty and · is spelled mappend (or, these days, <>), and the set M is the type M in instance Monoid M where ....

Looking at this definition, we see that it says nothing about “combining” (or about “picking,” for that matter). It says things about · and about ε, but that’s it. Now, it’s certainly true that combining things works well with this structure: ε corresponds to having no things, and mm₂ says that if I glom m₁ and m₂’s stuff together, I can get a new thing containing all their stuff. But here’s an alternative intuition: ε corresponds to no choices at all, and mm₂ corresponds to a choice between m₁ and m₂. This is the “picking” intuition. Note that both obey the monoid laws:

  1. Having nothing at all and having no choice are both the identity.
    • If I have no stuff and glom it together with some stuff, I end up with that same stuff again.
    • If I have a choice between no choice at all (something impossible) and some other choice, I have to pick the other (possible) choice.
  2. Glomming collections together and making a choice are both associative.
    • If I have three collections of things, it doesn’t matter if I glom the first two together and then the third, or the last two together and then the first; either way, I end up with the same total glommed collection.
    • If I have a choice between three things, it doesn’t matter if I (a) first choose between first-or-second and third and then, if I need to, between first and second, or (b) first choose between first and second-or-third and then, if I need to, between second and third. Either way, I can pick what I want.

(Note: I’m playing fast and loose here; that’s why it’s intuition. For instance, it’s important to remember that · need not be commutative, which the above glosses over: it’s perfectly possible that mm₂ ≠ mm₁.)

Behold: both these sorts of things (and many others—is multiplying numbers really either “combining” or “picking”?) obey the same rules. Having an intuition is important to develop understanding, but it’s the rules and definitions that determine what’s actually going on.

And the best part is that these both of these intuitions can be interpreted by the same carrier! Let M be some set of sets (not a set of all sets!) containing the empty set, let ε be the empty set ∅, and let · be set union ∪. It is easy to see that ∅ is an identity for ∪, and that ∪ is associative, so we can conclude that (M,∅,∪) is a monoid. Now:

  1. If we think about sets as being collections of things, then ∪ corresponds to glomming them together to get more things—the “combining” intuition.
  2. If we think about sets as representing possible actions, then ∪ corresponds to increasing your pool of possible actions to pick from—the “picking” intuition.

And this is exactly what’s going on with [] in Haskell: [a] is a Monoid for all a, and [] as an applicative functor (and monad) is used to represent nondeterminism. Both the combining and the picking intuitions coincide at the same type: mempty = empty = [] and mappend = (<|>) = (++).

So the Alternative class is just there to represent objects which (a) are applicative functors, and (b) when instantiated at a type, have a value and a binary function on them which follow some rules. Which rules? The monoid rules. Why? Because it turns out to be useful 🙂


Why does Alternative need an empty method/member?

Well, the snarky answer is “because Alternative represents a monoid structure.” But the real question is: why a monoid structure? Why not just a semigroup, a monoid without ε? One answer is to claim that monoids are just more useful. I think many people (but perhaps not Edward Kmett) would agree with this; almost all of the time, if you have a sensible (<|>)/mappend/·, you’ll be able to define a sensible empty/mempty/ε. On the other hand, having the extra generality is nice, since it lets you place more things under the umbrella.

You also want to know how this meshes with the “picking” intuition. Keeping in mind that, in some sense, the right answer is “know when to abandon the ‘picking’ intuition,” I think you can unify the two. Consider [], the applicative functor for nondeterminism. If I combine two values of type [a] with (<|>), that corresponds to nondeterministically picking either an action from the left or an action from the right. But sometimes, you’re going to have no possible actions on one side—and that’s fine. Similarly, if we consider parsers, (<|>) represents a parser which parses either what’s on the left or what’s on the right (it “picks”). And if you have a parser which always fails, that ends up being an identity: if you pick it, you immediately reject that pick and try the other one.

All this said, remember that it would be entirely possible to have a class almost like Alternative, but lacking empty. That would be perfectly valid—it could even be a superclass of Alternative—but happens not to be what Haskell did. Presumably this is out of a guess as to what’s useful.


Why does the Alternative type class need an Applicative constraint, and why does it need a kind of * -> *? … Why not just [use] liftA2 mappend?

Well, let’s consider each of these three proposed changes: getting rid of the Applicative constraint for Alternative; changing the kind of Alternative’s argument; and using liftA2 mappend instead of <|> and pure mempty instead of empty. We’ll look at this third change first, since it’s the most different. Suppose we got rid of Alternative entirely, and replaced the class with two plain functions:

fempty :: (Applicative f, Monoid a) => f a
fempty = pure mempty

(>|<) :: (Applicative f, Monoid a) => f a -> f a -> f a
(>|<) = liftA2 mappend

We could even keep the definitions of some and many. And this does give us a monoid structure, it’s true. But it seems like it gives us the wrong one . Should Just fst >|< Just snd fail, since (a,a) -> a isn’t an instance of Monoid? No, but that’s what the above code would result in. The monoid instance we want is one that’s inner-type agnostic (to borrow terminology from Matthew Farkas-Dyck in a very related haskell-cafe discussion which asks some very similar questions); the Alternative structure is about a monoid determined by f’s structure, not the structure of f’s argument.

Now that we think we want to leave Alternative as some sort of type class, let’s look at the two proposed ways to change it. If we change the kind, we have to get rid of the Applicative constraint; Applicative only talks about things of kind * -> *, and so there’s no way to refer to it. That leaves two possible changes; the first, more minor, change is to get rid of the Applicative constraint but leave the kind alone:

class Alternative' f where
  empty' :: f a
  (<||>) :: f a -> f a -> f a

The other, larger, change is to get rid of the Applicative constraint and change the kind:

class Alternative'' a where
  empty'' :: a
  (<|||>) :: a -> a -> a

In both cases, we have to get rid of some/many, but that’s OK; we can define them as standalone functions with the type (Applicative f, Alternative' f) => f a -> f [a] or (Applicative f, Alternative'' (f [a])) => f a -> f [a].

Now, in the second case, where we change the kind of the type variable, we see that our class is exactly the same as Monoid (or, if you still want to remove empty'', Semigroup), so there’s no advantage to having a separate class. And in fact, even if we leave the kind variable alone but remove the Applicative constraint, Alternative just becomes forall a. Monoid (f a), although we can’t write these quantified constraints in Haskell, not even with all the fancy GHC extensions. (Note that this expresses the inner-type–agnosticism mentioned above.) Thus, if we can make either of these changes, then we have no reason to keep Alternative (except for being able to express that quantified constraint, but that hardly seems compelling).

So the question boils down to “is there a relationship between the Alternative parts and the Applicative parts of an f which is an instance of both?” And while there’s nothing in the documentation, I’m going to take a stand and say yes—or at the very least, there ought to be. I think that Alternative is supposed to obey some laws relating to Applicative (in addition to the monoid laws); in particular, I think those laws are something like

  1. Right distributivity (of <*>):  (f <|> g) <*> a = (f <*> a) <|> (g <*> a)
  2. Right absorption (for <*>):  empty <*> a = empty
  3. Left distributivity (of fmap):  f <$> (a <|> b) = (f <$> a) <|> (f <$> b)
  4. Left absorption (for fmap):  f <$> empty = empty

These laws appear to be true for [] and Maybe, and (pretending its MonadPlus instance is an Alternative instance) IO, but I haven’t done any proofs or exhaustive testing. (For instance, I originally thought that left distributivity held for <*>, but this “performs the effects” in the wrong order for [].) By way of analogy, though, it is true that MonadPlus is expected to obey similar laws (although there is apparently some ambiguity about which). I had originally wanted to claim a third law, which seems natural:

  • Left absorption (for <*>):  a <*> empty = empty

However, although I believe [] and Maybe obey this law, IO doesn’t, and I think (for reasons that will become apparent in the next couple of paragraphs) it’s best not to require it.

And indeed, it appears that Edward Kmett has some slides where he espouses a similar view; to get into that, we’ll need to take brief digression involving some more mathematical jargon. The final slide, “I Want More Structure,” says that “A Monoid is to an Applicative as a Right Seminearring is to an Alternative,” and “If you throw away the argument of an Applicative, you get a Monoid, if you throw away the argument of an Alternative you get a RightSemiNearRing.”

Right seminearrings? “How did right seminearrings get into it?” I hear you cry. Well,

Definition. A right near-semiring (also right seminearring, but the former seems to be used more on Google) is a quadruple (R,+,·,0) where (R,+,0) is a monoid, (R,·) is a semigroup, and the following two conditions hold:

  1. · is right-distributive over +: for all r,s,tR, (s + t)r = sr + tr.
  2. 0 is right-absorbing for ·: for all rR, 0r = 0.

A left near-semiring is defined analogously.

Now, this doesn’t quite work, because <*> is not truly associative or a binary operator—the types don’t match. I think this is what Edward Kmett is getting at when he talks about “throw[ing] away the argument.” Another option might be to say (I’m unsure if this is right) that we actually want (f a, <|>, <*>, empty) to form a right near-semiringoid, where the “-oid” suffix indicates that the binary operators can only be applied to specific pairs of elements (à la groupoids). And we’d also want to say that (f a, <|>, <$>, empty) was a left near-semiringoid, although this could conceivably follow from the combination of the Applicative laws and the right near-semiringoid structure. But now I’m getting in over my head, and this isn’t deeply relevant anyway.

At any rate, these laws, being stronger than the monoid laws, mean that perfectly valid Monoid instances would become invalid Alternative instances. There are (at least) two examples of this in the standard library: Monoid a => (a,) and Maybe. Let’s look at each of them quickly.

Given any two monoids, their product is a monoid; consequently, tuples can be made an instance of Monoid in the obvious way (reformatting the base package’s source):

instance (Monoid a, Monoid b) => Monoid (a,b) where
  mempty = (mempty, mempty)
  (a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)

Similarly, we can make tuples whose first component is an element of a monoid into an instance of Applicative by accumulating the monoid elements (reformatting the base package’s source):

instance Monoid a => Applicative ((,) a) where
  pure x = (mempty, x)
  (u, f) <*> (v, x) = (u `mappend` v, f x)

However, tuples aren’t an instance of Alternative, because they can’t be—the monoidal structure over Monoid a => (a,b) isn’t present for all types b, and Alternative’s monoidal structure must be inner-type agnostic. Not only must b be a monad, to be able to express (f <> g) <*> a, we need to use the Monoid instance for functions, which is for functions of the form Monoid b => a -> b. And even in the case where we have all the necessary monoidal structure, it violates all four of the Alternative laws. To see this, let ssf n = (Sum n, (<> Sum n)) and let ssn = (Sum n, Sum n). Then, writing (<>) for mappend, we get the following results (which can be checked in GHCi, with the occasional type annotation):

  1. Right distributivity:
    • (ssf 1 <> ssf 1) <*> ssn 1 = (Sum 3, Sum 4)
    • (ssf 1 <*> ssn 1) <> (ssf 1 <*> ssn 1) = (Sum 4, Sum 4)
  2. Right absorption:
    • mempty <*> ssn 1 = (Sum 1, Sum 0)
    • mempty = (Sum 0, Sum 0)
  3. Left distributivity:
    • (<> Sum 1) <$> (ssn 1 <> ssn 1) = (Sum 2, Sum 3)
    • ((<> Sum 1) <$> ssn 1) <> ((<> Sum 1) <$> ssn 1) = (Sum 2, Sum 4)
  4. Left absorption:
    • (<> Sum 1) <$> mempty = (Sum 0, Sum 1)
    • mempty = (Sum 1, Sum 1)

Next, consider Maybe. As it stands, Maybe’s Monoid and Alternative instances disagree. (Although the haskell-cafe discussion I mention at the beginning of this section proposes changing this, there’s an Option newtype from the semigroups package which would produce the same effect.) As a Monoid, Maybe lifts semigroups into monoids by using Nothing as the identity; since the base package doesn’t have a semigroup class, it just lifts monoids, and so we get (reformatting the base package’s source):

instance Monoid a => Monoid (Maybe a) where
  mempty = Nothing
  Nothing `mappend` m       = m
  m       `mappend` Nothing = m
  Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)

On the other hand, as an Alternative, Maybe represents prioritized choice with failure, and so we get (again reformatting the base package’s source):

instance Alternative Maybe where
  empty = Nothing
  Nothing <|> r = r
  l       <|> _ = l

And it turns out that only the latter satisfies the Alternative laws. The Monoid instance fails less badly than (,)’s; it does obey the laws with respect to <*>, although almost by accident—it comes form the behavior of the only instance of Monoid for functions, which (as mentioned above), lifts functions that return monoids into the reader applicative functor. If you work it out (it’s all very mechanical), you’ll find that right distributivity and right absorption for <*> all hold for both the Alternative and Monoid instances, as does left absorption for fmap. And left distributivity for fmap does hold for the Alternative instance, as follows:

f <$> (Nothing <|> b)
  = f <$> b                          by the definition of (<|>)
  = Nothing <|> (f <$> b)            by the definition of (<|>)
  = (f <$> Nothing) <|> (f <$> b)    by the definition of (<$>)

f <$> (Just a <|> b)
  = f <$> Just a                     by the definition of (<|>)
  = Just (f a)                       by the definition of (<$>)
  = Just (f a) <|> (f <$> b)         by the definition of (<|>)
  = (f <$> Just a) <|> (f <$> b)     by the definition of (<$>)

However, it fails for the Monoid instance; writing (<>) for mappend, we have:

  • (<> Sum 1) <$> (Just (Sum 0) <> Just (Sum 0)) = Just (Sum 1)
  • ((<> Sum 1) <$> Just (Sum 0)) <> ((<> Sum 1) <$> Just (Sum 0)) = Just (Sum 2)

Now, there is one caveat to this example. If you only require that Alternatives be compatibility with <*>, and not with <$>, then Maybe is fine. Edward Kmett’s slides, mentioned above, don’t make reference to <$>, but I think it seems reasonable to require laws with respect to it as well; nevertheless, I can’t find anything to back me up on this.

Thus, we can conclude that being an Alternative is a stronger requirement than being a Monoid, and so it requires a different class. The purest example of this would be a type with an inner-type agnostic Monoid instance and an Applicative instance which were incompatible with each other; however, there aren’t any such types in the base package, and I can’t think of any. (It’s possible none exist, although I’d be surprised.) Nevertheless, these inner-type gnostic examples demonstrate why the two type classes must be different.


What’s the point of the MonadPlus type class?

MonadPlus, like Alternative, is a strengthening of Monoid, but with respect to Monad instead of Applicative. According to Edward Kmett in his answer to the question “Distinction between typeclasses MonadPlus, Alternative, and Monoid?”, MonadPlus is also stronger than Alternative: the law empty <*> a, for instance, doesn’t imply that empty >>= f. AndrewC provides two examples of this: Maybe and its dual. The issue is complicated by the fact that there are two potential sets of laws for MonadPlus. It is universally agreed that MonadPlus is supposed to form a monoid with mplus and mempty, and it’s supposed to satisfy the left zero law, mempty >>= f = mempty. Hhowever, some MonadPlusses satisfy left distribution, mplus a b >>= f = mplus (a >>= f) (b >>= f); and others satisfy left catch, mplus (return a) b = return a. (Note that left zero/distribution for MonadPlus are analogous to right distributivity/absorption for Alternative; (<*>) is more analogous to (=<<) than (>>=).) Left distribution is probably “better,” so any MonadPlus instance which satisfies left catch, such as Maybe, is an Alternative but not the first kind of MonadPlus. And since left catch relies on ordering, you can imagine a newtype wrapper for Maybe whose Alternative instance is right-biased instead of left-biased: a <|> Just b = Just b. This will satisfy neither left distribution nor left catch, but will be a perfectly valid Alternative.

However, since any type which is a MonadPlus ought to have its instance coincide with its Alternative instance (I believe this is required in the same way that it is required that ap and (<*>) are equal for Monads that are Applicatives), you could imagine defining the MonadPlus class instead as

class (Monad m, Alternative m) => MonadPlus' m

The class doesn’t need to declare new functions; it’s just a promise about the laws obeyed by empty and (<|>) for the given type. This design technique isn’t used in the Haskell standard libraries, but is used in some more mathematically-minded packages for similar purposes; for instance, the lattices package uses it to express the idea that a lattice is just a join semilattice and a meet semilattice over the same type which are linked by absorption laws.

The reason you can’t do the same for Alternative, even if you wanted to guarantee that Alternative and Monoid always coincided, is because of the kind mismatch. The desired class declaration would have the form

class (Applicative f, forall a. Monoid (f a)) => Alternative''' f

but (as mentioned far above) not even GHC Haskell supports quantified constraints.

Also, note that having Alternative as be a superclass of MonadPlus would require Applicative being a superclass of Monad, so good luck getting that to happen. If you run into that problem, there’s always the WrappedMonad newtype, which turns any Monad into an Applicative in the obvious way; there’s an instance MonadPlus m => Alternative (WrappedMonad m) where ... which does exactly what you’d expect.

Leave a Comment