A basic notion of data flow pipeline is described in a previous post, Data Flow Pipeline in Clojure. Here we will add some formalism to it using Category Theory applied in the context of Clojure. Monads require types and is best suited with a strongly typed language.
1. First we need to define a category, let's say turtle
.
2. A category consists of a collection of entities called objects
and a collection of entities called arrows
and few other properties (assignments and composition).
3. There are two assignments source
(or domain
), and target
(or codomain
) such that each of which attaches an object
to an arrow
. That is an object in source consumes an arrow and returns an object in the target. This can be represented as
4. So to model this, let's use clojure.spec
.
Here we defined the object to be a ::turtle-variant
, the structure of which is [keyword? string? map?]
. We need to check that the arrows takes the objects that belongs to the turtle
category. Else the morphism cannot belong to the category.
5. Some helper functions.
6. Now let us create an arrow that the objects in the category can consume and which can return an object. The return object must be in the category. Since the arrow can produce and return object, we can intuitively refer it to as functions or morphism. Here the object in the category is ::turtle-variant
and since the morphism produce the same object, we call it endomorphism. It is a mapping of object in a category to itself. We define a functor such that it takes some transformation, but preserves the structure. Below we can see that the fetch
takes in a ::turtle-variant
and returns a ::turtle-variant
.
7. The turtle
category is not fully defined yet, because we need to define partial composition of arrows. Identity can be defined trivially.
Now, here is where we differ from the most monad libraries out there that tries to implement what Haskell does. Here partial composition does not mean, a function should return a partial. Certain pair of arrows are compatible for composition to form another arrow. Two arrows
are composable in that order precisely, when B1
and B2
are the same object, and then an arrow
is formed.
8. If we look at the way we defined the functions and the type constraint on the function, this law holds. Because here all functions takes the same structure and return the same structure, the ::turtle-variant
. At this point, we have defined our category turtle
.
9. But from a programmer's perspective, simple having standalone functions are not useful. We need to compose them. Meaning, call a function passing arg if necessary, take the return, call another function with arg of the return of the previous function and such.
The endomorph
is a monoid. A monoid structure is defined as
where R
is a set, *
is a binary operation on R
and 1
is a nominated element of R
, where
In our case, the set R
is the set of functions that operates on functors. And we have defined only one, which is the endomorph
function. This function operates on functors (which are in itself endofunctors here). The binary operation is the reduce
function and identity is trivial. We can use constantly
and identity
functions defined in Clojure to do that. The associative law hold because, here the order of composition does not matter because, in whichever way we order the composition, the result is the same object ::turtle-variant
. That is from a theory perspective. But from a programmer's perspective, we cannot do arbitrary ordering as we need values within the structure. So conceptually it is associative, but the order of computation matters.
10. Now this is also a monad. The classic definition of a monad follows.
All told a monad in X is just a monoid in the category of endofunctors of X, with product * replaced by composition of endofunctors and unit set by the identity endofunctor
- Saunders Mac Lane in Categories for the Working Mathematician
From the above definition, it is clear that the endomorph
is a monad as it operates on endofunctors and itself is a monoid (from above). Now it is also a forgetful functor because some property is discarded once we apply the transformation.
Example usage:
This proof is based on my reading of An Introduction to Category Theory by Harold Simmons. After all this proof, I came to the realisation about Leibniz's reference about everything is a monad.