Zipper Comonads, Generically

Question

Like the childcatcher in Chitty-Chitty-Bang-Bang luring kids into captivity with sweets and toys, recruiters to undergraduate Physics like to fool about with soap bubbles and boomerangs, but when the door clangs shut, it’s “Right, children, time to learn about partial differentiation!”. Me too. Don’t say I didn’t warn you.

Here’s another warning: the following code needs {-# LANGUAGE KitchenSink #-}, or rather

{-# LANGUAGE TypeFamilies, FlexibleContexts, TupleSections, GADTs, DataKinds,
    TypeOperators, FlexibleInstances, RankNTypes, ScopedTypeVariables,
    StandaloneDeriving, UndecidableInstances #-}

in no particular order.

Differentiable functors give comonadic zippers

What is a differentiable functor, anyway?

class (Functor f, Functor (DF f)) => Diff1 f where
  type DF f :: * -> *
  upF      ::  ZF f x  ->  f x
  downF    ::  f x     ->  f (ZF f x)
  aroundF  ::  ZF f x  ->  ZF f (ZF f x)

data ZF f x = (:<-:) {cxF :: DF f x, elF :: x}

It’s a functor which has a derivative, which is also a functor. The derivative represents a one-hole context for an element. The zipper type ZF f x represents the pair of a one-hole context and the element in the hole.

The operations for Diff1 describe the kinds of navigation we can do on zippers (without any notion of “leftward” and “rightward”, for which see my Clowns and Jokers paper). We can go “upward”, reassembling the structure by plugging the element in its hole. We can go “downward”, finding every way to visit an element in a give structure: we decorate every element with its context. We can go “around”,
taking an existing zipper and decorating each element with its context, so we find all the ways to refocus (and how to keep our current focus).

Now, the type of aroundF might remind some of you of

class Functor c => Comonad c where
  extract    :: c x -> x
  duplicate  :: c x -> c (c x)

and you’re right to be reminded! We have, with a hop and a skip,

instance Diff1 f => Functor (ZF f) where
  fmap f (df :<-: x) = fmap f df :<-: f x

instance Diff1 f => Comonad (ZF f) where
  extract    = elF
  duplicate  = aroundF

and we insist that

extract . duplicate == id
fmap extract . duplicate == id
duplicate . duplicate == fmap duplicate . duplicate

We also need that

fmap extract (downF xs) == xs              -- downF decorates the element in position
fmap upF (downF xs) = fmap (const xs) xs   -- downF gives the correct context

Polynomial functors are differentiable

Constant functors are differentiable.

data KF a x = KF a
instance Functor (KF a) where
  fmap f (KF a) = KF a

instance Diff1 (KF a) where
  type DF (KF a) = KF Void
  upF (KF w :<-: _) = absurd w
  downF (KF a) = KF a
  aroundF (KF w :<-: _) = absurd w

There’s nowhere to put an element, so it’s impossible to form a context. There’s nowhere to go upF or downF from, and we easily find all none of the ways to go downF.

The identity functor is differentiable.

data IF x = IF x
instance Functor IF where
  fmap f (IF x) = IF (f x)

instance Diff1 IF where
  type DF IF = KF ()
  upF (KF () :<-: x) = IF x
  downF (IF x) = IF (KF () :<-: x)
  aroundF z@(KF () :<-: x) = KF () :<-: z

There’s one element in a trivial context, downF finds it, upF repacks it, and aroundF can only stay put.

Sum preserves differentiability.

data (f :+: g) x = LF (f x) | RF (g x)
instance (Functor f, Functor g) => Functor (f :+: g) where
  fmap h (LF f) = LF (fmap h f)
  fmap h (RF g) = RF (fmap h g)

instance (Diff1 f, Diff1 g) => Diff1 (f :+: g) where
  type DF (f :+: g) = DF f :+: DF g
  upF (LF f' :<-: x) = LF (upF (f' :<-: x))
  upF (RF g' :<-: x) = RF (upF (g' :<-: x))

The other bits and pieces are a bit more of a handful. To go downF, we must go downF inside the tagged component, then fix up the resulting zippers to show the tag in the context.

  downF (LF f) = LF (fmap (\ (f' :<-: x) -> LF f' :<-: x) (downF f))
  downF (RF g) = RF (fmap (\ (g' :<-: x) -> RF g' :<-: x) (downF g))

To go aroundF, we strip the tag, figure out how to go around the untagged thing, then restore the tag in all the resulting zippers. The element in focus, x, is replaced by its entire zipper, z.

  aroundF z@(LF f' :<-: (x :: x)) =
    LF (fmap (\ (f' :<-: x) -> LF f' :<-: x) . cxF $ aroundF (f' :<-: x :: ZF f x))
    :<-: z
  aroundF z@(RF g' :<-: (x :: x)) =
    RF (fmap (\ (g' :<-: x) -> RF g' :<-: x) . cxF $ aroundF (g' :<-: x :: ZF g x))
    :<-: z

Note that I had to use ScopedTypeVariables to disambiguate the recursive calls to aroundF. As a type function, DF is not injective, so the fact that f' :: D f x is not enough to force f' :<-: x :: Z f x.

Product preserves differentiability.

data (f :*: g) x = f x :*: g x
instance (Functor f, Functor g) => Functor (f :*: g) where
  fmap h (f :*: g) = fmap h f :*: fmap h g

To focus on an element in a pair, you either focus on the left and leave the right alone, or vice versa. Leibniz’s famous product rule corresponds to a simple spatial intuition!

instance (Diff1 f, Diff1 g) => Diff1 (f :*: g) where
  type DF (f :*: g) = (DF f :*: g) :+: (f :*: DF g)
  upF (LF (f' :*: g) :<-: x) = upF (f' :<-: x) :*: g
  upF (RF (f :*: g') :<-: x) = f :*: upF (g' :<-: x)

Now, downF works similarly to the way it did for sums, except that we have to fix up the zipper context not only with a tag (to show which way we went) but also with the untouched other component.

  downF (f :*: g)
    =    fmap (\ (f' :<-: x) -> LF (f' :*: g) :<-: x) (downF f)
    :*:  fmap (\ (g' :<-: x) -> RF (f :*: g') :<-: x) (downF g)

But aroundF is a massive bag of laughs. Whichever side we are currently visiting, we have two choices:

Move aroundF on that side.
Move upF out of that side and downF into the other side.

Each case requires us to make use of the operations for the substructure, then fix up contexts.

  aroundF z@(LF (f' :*: g) :<-: (x :: x)) =
    LF (fmap (\ (f' :<-: x) -> LF (f' :*: g) :<-: x)
          (cxF $ aroundF (f' :<-: x :: ZF f x))
        :*: fmap (\ (g' :<-: x) -> RF (f :*: g') :<-: x) (downF g))
    :<-: z
    where f = upF (f' :<-: x)
  aroundF z@(RF (f :*: g') :<-: (x :: x)) =
    RF (fmap (\ (f' :<-: x) -> LF (f' :*: g) :<-: x) (downF f) :*:
        fmap (\ (g' :<-: x) -> RF (f :*: g') :<-: x)
          (cxF $ aroundF (g' :<-: x :: ZF g x)))
    :<-: z
    where g = upF (g' :<-: x)

Phew! The polynomials are all differentiable, and thus give us comonads.

Hmm. It’s all a bit abstract. So I added deriving Show everywhere I could, and threw in

deriving instance (Show (DF f x), Show x) => Show (ZF f x)

which allowed the following interaction (tidied up by hand)

> downF (IF 1 :*: IF 2)
IF (LF (KF () :*: IF 2) :<-: 1) :*: IF (RF (IF 1 :*: KF ()) :<-: 2)

> fmap aroundF it
IF  (LF (KF () :*: IF (RF (IF 1 :*: KF ()) :<-: 2)) :<-: (LF (KF () :*: IF 2) :<-: 1))
:*:
IF  (RF (IF (LF (KF () :*: IF 2) :<-: 1) :*: KF ()) :<-: (RF (IF 1 :*: KF ()) :<-: 2))

Exercise Show that the composition of differentiable functors is differentiable, using the chain rule.

Sweet! Can we go home now? Of course not. We haven’t differentiated any recursive structures yet.

Making recursive functors from bifunctors

A Bifunctor, as the existing literature on datatype generic programming (see work by Patrik Jansson and Johan Jeuring, or excellent lecture notes by Jeremy Gibbons) explains at length is a type constructor with two parameters, corresponding to two sorts of substructure. We should be able to “map” both.

class Bifunctor b where
  bimap :: (x -> x') -> (y -> y') -> b x y -> b x' y'

We can use Bifunctors to give the node structure of recursive containers. Each node has subnodes and elements. These can just be the two sorts of substructure.

data Mu b y = In (b (Mu b y) y)

See? We “tie the recursive knot” in b‘s first argument, and keep the parameter y in its second. Accordingly, we obtain once for all

instance Bifunctor b => Functor (Mu b) where
  fmap f (In b) = In (bimap (fmap f) f b)

To use this, we’ll need a kit of Bifunctor instances.

The Bifunctor Kit

Constants are bifunctorial.

newtype K a x y = K a

instance Bifunctor (K a) where
  bimap f g (K a) = K a

You can tell I wrote this bit first, because the identifiers are shorter, but that’s good because the code is longer.

Variables are bifunctorial.

We need the bifunctors corresponding to one parameter or the other, so I made a datatype to distinguish them, then defined a suitable GADT.

data Var = X | Y

data V :: Var -> * -> * -> * where
  XX :: x -> V X x y
  YY :: y -> V Y x y

That makes V X x y a copy of x and V Y x y a copy of y. Accordingly

instance Bifunctor (V v) where
  bimap f g (XX x) = XX (f x)
  bimap f g (YY y) = YY (g y)

Sums and Products of bifunctors are bifunctors

data (:++:) f g x y = L (f x y) | R (g x y) deriving Show

instance (Bifunctor b, Bifunctor c) => Bifunctor (b :++: c) where
  bimap f g (L b) = L (bimap f g b)
  bimap f g (R b) = R (bimap f g b)

data (:**:) f g x y = f x y :**: g x y deriving Show

instance (Bifunctor b, Bifunctor c) => Bifunctor (b :**: c) where
  bimap f g (b :**: c) = bimap f g b :**: bimap f g c

So far, so boilerplate, but now we can define things like

List = Mu (K () :++: (V Y :**: V X))

Bin = Mu (V Y :**: (K () :++: (V X :**: V X)))

If you want to use these types for actual data and not go blind in the pointilliste tradition of Georges Seurat, use pattern synonyms.

But what of zippers? How shall we show that Mu b is differentiable? We shall need to show that b is differentiable in both variables. Clang! It’s time to learn about partial differentiation.

Partial derivatives of bifunctors

Because we have two variables, we shall need to be able to talk about them collectively sometimes and individually at other times. We shall need the singleton family:

data Vary :: Var -> * where
  VX :: Vary X
  VY :: Vary Y

Now we can say what it means for a Bifunctor to have partial derivatives at each variable, and give the corresponding notion of zipper.

class (Bifunctor b, Bifunctor (D b X), Bifunctor (D b Y)) => Diff2 b where
  type D b (v :: Var) :: * -> * -> *
  up      :: Vary v -> Z b v x y -> b x y
  down    :: b x y -> b (Z b X x y) (Z b Y x y)
  around  :: Vary v -> Z b v x y -> Z b v (Z b X x y) (Z b Y x y)

data Z b v x y = (:<-) {cxZ :: D b v x y, elZ :: V v x y}

This D operation needs to know which variable to target. The corresponding zipper Z b v tells us which variable v must be in focus. When we “decorate with context”, we have to decorate x-elements with X-contexts and y-elements with Y-contexts. But otherwise, it’s the same story.

We have two remaining tasks: firstly, to show that our bifunctor kit is differentiable; secondly, to show that Diff2 b allows us to establish Diff1 (Mu b).

Differentiating the Bifunctor kit

I’m afraid this bit is fiddly rather than edifying. Feel free to skip along.

The constants are as before.

instance Diff2 (K a) where
  type D (K a) v = K Void
  up _ (K q :<- _) = absurd q
  down (K a) = K a
  around _ (K q :<- _) = absurd q

On this occasion, life is too short to develop the theory of the type level Kronecker-delta, so I just treated the variables separately.

instance Diff2 (V X) where
  type D (V X) X = K ()
  type D (V X) Y = K Void
  up VX (K () :<- XX x)  = XX x
  up VY (K q :<- _)      = absurd q
  down (XX x) = XX (K () :<- XX x)
  around VX z@(K () :<- XX x)  = K () :<- XX z
  around VY (K q :<- _)        = absurd q

instance Diff2 (V Y) where
  type D (V Y) X = K Void
  type D (V Y) Y = K ()
  up VX (K q :<- _)      = absurd q
  up VY (K () :<- YY y)  = YY y
  down (YY y) = YY (K () :<- YY y)
  around VX (K q :<- _)        = absurd q
  around VY z@(K () :<- YY y)  = K () :<- YY z

For the structural cases, I found it useful to introduce a helper allowing me to treat variables uniformly.

vV :: Vary v -> Z b v x y -> V v (Z b X x y) (Z b Y x y)
vV VX z = XX z
vV VY z = YY z

I then built gadgets to facilitate the kind of “retagging” we need for down and around. (Of course, I saw which gadgets I needed as I was working.)

zimap :: (Bifunctor c) => (forall v. Vary v -> D b v x y -> D b' v x y) ->
         c (Z b X x y) (Z b Y x y) -> c (Z b' X x y) (Z b' Y x y)
zimap f = bimap
  (\ (d :<- XX x) -> f VX d :<- XX x)
  (\ (d :<- YY y) -> f VY d :<- YY y)

dzimap :: (Bifunctor (D c X), Bifunctor (D c Y)) =>
         (forall v. Vary v -> D b v x y -> D b' v x y) ->
         Vary v -> Z c v (Z b X x y) (Z b Y x y) -> D c v (Z b' X x y) (Z b' Y x y)
dzimap f VX (d :<- _) = bimap
  (\ (d :<- XX x) -> f VX d :<- XX x)
  (\ (d :<- YY y) -> f VY d :<- YY y)
  d
dzimap f VY (d :<- _) = bimap
  (\ (d :<- XX x) -> f VX d :<- XX x)
  (\ (d :<- YY y) -> f VY d :<- YY y)
  d

And with that lot ready to go, we can grind out the details. Sums are easy.

instance (Diff2 b, Diff2 c) => Diff2 (b :++: c) where
  type D (b :++: c) v = D b v :++: D c v
  up v (L b' :<- vv) = L (up v (b' :<- vv))
  down (L b) = L (zimap (const L) (down b))
  down (R c) = R (zimap (const R) (down c))
  around v z@(L b' :<- vv :: Z (b :++: c) v x y)
    = L (dzimap (const L) v ba) :<- vV v z
    where ba = around v (b' :<- vv :: Z b v x y)
  around v z@(R c' :<- vv :: Z (b :++: c) v x y)
    = R (dzimap (const R) v ca) :<- vV v z
    where ca = around v (c' :<- vv :: Z c v x y)

Products are hard work, which is why I’m a mathematician rather than an engineer.

instance (Diff2 b, Diff2 c) => Diff2 (b :**: c) where
  type D (b :**: c) v = (D b v :**: c) :++: (b :**: D c v)
  up v (L (b' :**: c) :<- vv) = up v (b' :<- vv) :**: c
  up v (R (b :**: c') :<- vv) = b :**: up v (c' :<- vv)
  down (b :**: c) =
    zimap (const (L . (:**: c))) (down b) :**: zimap (const (R . (b :**:))) (down c)
  around v z@(L (b' :**: c) :<- vv :: Z (b :**: c) v x y)
    = L (dzimap (const (L . (:**: c))) v ba :**:
        zimap (const (R . (b :**:))) (down c))
      :<- vV v z where
      b = up v (b' :<- vv :: Z b v x y)
      ba = around v (b' :<- vv :: Z b v x y)
  around v z@(R (b :**: c') :<- vv :: Z (b :**: c) v x y)
    = R (zimap (const (L . (:**: c))) (down b):**:
        dzimap (const (R . (b :**:))) v ca)
      :<- vV v z where
      c = up v (c' :<- vv :: Z c v x y)
      ca = around v (c' :<- vv :: Z c v x y)

Conceptually, it’s just as before, but with more bureaucracy. I built these using pre-type-hole technology, using undefined as a stub in places I wasn’t ready to work, and introducing a deliberate type error in the one place (at any given time) where I wanted a useful hint from the typechecker. You too can have the typechecking as videogame experience, even in Haskell.

Subnode zippers for recursive containers

The partial derivative of b with respect to X tells us how to find a subnode one step inside a node, so we get the conventional notion of zipper.

data MuZpr b y = MuZpr
  {  aboveMu  :: [D b X (Mu b y) y]
  ,  hereMu   :: Mu b y
  }

We can zoom all the way up to the root by repeated plugging in X positions.

muUp :: Diff2 b => MuZpr b y -> Mu b y
muUp (MuZpr {aboveMu = [], hereMu = t}) = t
muUp (MuZpr {aboveMu = (dX : dXs), hereMu = t}) =
  muUp (MuZpr {aboveMu = dXs, hereMu = In (up VX (dX :<- XX t))})

But we need element-zippers.

Element-zippers for fixpoints of bifunctors

Each element is somewhere inside a node. That node is sitting under a stack of X-derivatives. But the position of the element in that node is given by a Y-derivative. We get

data MuCx b y = MuCx
  {  aboveY  :: [D b X (Mu b y) y]
  ,  belowY  :: D b Y (Mu b y) y
  }

instance Diff2 b => Functor (MuCx b) where
  fmap f (MuCx { aboveY = dXs, belowY = dY }) = MuCx
    {  aboveY  = map (bimap (fmap f) f) dXs
    ,  belowY  = bimap (fmap f) f dY
    }

Boldly, I claim

instance Diff2 b => Diff1 (Mu b) where
  type DF (Mu b) = MuCx b

but before I develop the operations, I’ll need some bits and pieces.

I can trade data between functor-zippers and bifunctor-zippers as follows:

zAboveY :: ZF (Mu b) y -> [D b X (Mu b y) y]  -- the stack of `X`-derivatives above me
zAboveY (d :<-: y) = aboveY d

zZipY :: ZF (Mu b) y -> Z b Y (Mu b y) y      -- the `Y`-zipper where I am
zZipY (d :<-: y) = belowY d :<- YY y

That’s enough to let me define:

  upF z  = muUp (MuZpr {aboveMu = zAboveY z, hereMu = In (up VY (zZipY z))})

That is, we go up by first reassembling the node where the element is, turning an element-zipper into a subnode-zipper, then zooming all the way out, as above.

Next, I say

  downF  = yOnDown []

to go down starting with the empty stack, and define the helper function which goes down repeatedly from below any stack:

yOnDown :: Diff2 b => [D b X (Mu b y) y] -> Mu b y -> Mu b (ZF (Mu b) y)
yOnDown dXs (In b) = In (contextualize dXs (down b))

Now, down b only takes us inside the node. The zippers we need must also carry the node’s context. That’s what contextualise does:

contextualize :: (Bifunctor c, Diff2 b) =>
  [D b X (Mu b y) y] ->
  c (Z b X (Mu b y) y) (Z b Y (Mu b y) y) ->
  c (Mu b (ZF (Mu b) y)) (ZF (Mu b) y)
contextualize dXs = bimap
  (\ (dX :<- XX t) -> yOnDown (dX : dXs) t)
  (\ (dY :<- YY y) -> MuCx {aboveY = dXs, belowY = dY} :<-: y)

For every Y-position, we must give an element-zipper, so it is good we know the whole context dXs back to the root, as well as the dY which describes how the element sits in its node. For every X-position, there is a further subtree to explore, so we grow the stack and keep going!

That leaves only the business of shifting focus. We might stay put, or go down from where we are, or go up, or go up and then down some other path. Here goes.

  aroundF z@(MuCx {aboveY = dXs, belowY = dY} :<-: _) = MuCx
    {  aboveY = yOnUp dXs (In (up VY (zZipY z)))
    ,  belowY = contextualize dXs (cxZ $ around VY (zZipY z))
    }  :<-: z

As ever, the existing element is replaced by its entire zipper. For the belowY part, we look where else we can go in the existing node: we will find either alternative element Y-positions or further X-subnodes to explore, so we contextualise them. For the aboveY part, we must work our way back up the stack of X-derivatives after reassembling the node we were visiting.

yOnUp :: Diff2 b => [D b X (Mu b y) y] -> Mu b y ->
         [D b X (Mu b (ZF (Mu b) y)) (ZF (Mu b) y)]
yOnUp [] t = []
yOnUp (dX : dXs) (t :: Mu b y)
  =  contextualize dXs (cxZ $ around VX (dX :<- XX t))
  :  yOnUp dXs (In (up VX (dX :<- XX t)))

At each step of the way, we can either turn somewhere else that’s around, or keep going up.

And that’s it! I haven’t given a formal proof of the laws, but it looks to me as if the operations carefully maintain the context correctly as they crawl the structure.

What have we learned?

Differentiability induces notions of thing-in-its-context, inducing a comonadic structure where extract gives you the thing and duplicate explores the context looking for other things to contextualise. If we have the appropriate differential structure for nodes, we can develop differential structure for whole trees.

Oh, and treating each individual arity of type constructor separately is blatantly horrendous. The better way is to work with functors between indexed sets

f :: (i -> *) -> (o -> *)

where we make o different sorts of structure storing i different sorts of element. These are closed under the Jacobian construction

J f :: (i -> *) -> ((o, i) -> *)

where each of the resulting (o, i)-structures is a partial derivative, telling you how to make an i-element-hole in an o-structure. But that’s dependently typed fun, for another time.