A basic notion of data flow pipeline is described in a previous post, Data Flow Pipeline in Clojure. Here we will add some formalism to it using Category Theory applied in the context of Clojure. Monads require types and is best suited with a strongly typed language.

1. First we need to define a category, let's say turtle.

2. A category consists of a collection of entities called objects and a collection of entities called arrows and few other properties (assignments and composition).

3. There are two assignments source (or domain), and target (or codomain) such that each of which attaches an object to an arrow. That is an object in source consumes an arrow and returns an object in the target. This can be represented as

                    f
  A -----------------------------------> B
source            arrow               target
domain           morphism            codomain
                   map

4. So to model this, let's use clojure.spec.

(spec/def ::variant vector?)
(spec/def ::tag keyword?)
(spec/def ::msg string?)
(spec/def ::state (spec/map-of keyword? any?))
(spec/def ::turtle-variant (spec/tuple ::tag ::msg ::state))

Here we defined the object to be a ::turtle-variant, the structure of which is [keyword? string? map?]. We need to check that the arrows takes the objects that belongs to the turtle category. Else the morphism cannot belong to the category.

(defn validate-variant
  "Check if the given element belongs to the category turtle.
   Given a data structure, check if it conforms to a variant. If it does not
   explain the reason for non-conformance."
  [predicate]
  (let [rule (fn [f] (f ::turtle-variant predicate))]
    (if-not (rule spec/valid?)
      (rule spec/explain)
      true)))

5. Some helper functions.

(defn ok
  "Construct an object of type success in turtle which can be passed to a 
   functor. Returns a ::turtle-variant."
  [msg result]
  [:ok msg result])

(defn err
  "Construct an object of type error in turtle which can be passed to a 
   functor. Returns a ::turtle-variant."
  [msg ex]
  [:err msg ex])

6. Now let us create an arrow that the objects in the category can consume and which can return an object. The return object must be in the category. Since the arrow can produce and return object, we can intuitively refer it to as functions or morphism. Here the object in the category is ::turtle-variant and since the morphism produce the same object, we call it endomorphism. It is a mapping of object in a category to itself. We define a functor such that it takes some transformation, but preserves the structure. Below we can see that the fetch takes in a ::turtle-variant and returns a ::turtle-variant.

(defn fetch
  "A functor which performs some operation on the object."
  [varg]
  (let [[tag msg {:keys [url opts] :as state}] varg
        [status body status-code] (http-get url opts)]
    (condp = status
      :ok (ok msg (merge {:res body} state))
      :err (err "Error" {:msg msg :excep body}))))

(fetch (ok "html" {:url "http://example.com" :opts nil})

7. The turtle category is not fully defined yet, because we need to define partial composition of arrows. Identity can be defined trivially.

Arw x Arw -> Arw

Now, here is where we differ from the most monad libraries out there that tries to implement what Haskell does. Here partial composition does not mean, a function should return a partial. Certain pair of arrows are compatible for composition to form another arrow. Two arrows

      f                  g
A ---------> B1   B2 ---------> C

are composable in that order precisely, when B1 and B2 are the same object, and then an arrow

A ---------> C

is formed.

8. If we look at the way we defined the functions and the type constraint on the function, this law holds. Because here all functions takes the same structure and return the same structure, the ::turtle-variant. At this point, we have defined our category turtle.

9. But from a programmer's perspective, simple having standalone functions are not useful. We need to compose them. Meaning, call a function passing arg if necessary, take the return, call another function with arg of the return of the previous function and such.

(defn endomorph
  "Apply a morphism (transformation) for the given input (domain) returning an
   output (co-domain) where both domain and co-domain belongs to the category
   turtle. The functors are arrows in the same category. This qualifies as a
   monad because it operates on functors and returns an object. And associative
   law holds because in whichever way we compose the functors, the result object
   is ::turtle-variant. This is also a forgetful functor.
   
   Decide whether to continue with the computation or return. All function in 
   the chain should accept and return a variant, which is a 3-tuple."
  ([fns]
   (endomorph fns (ok "" {})))
  ([fns init-tup]
   (reduce (fn [variant fun]
             {:pre [(validate-variant variant)]
              :post [(if (reduced? %)
                       (validate-variant (deref %))
                       (validate-variant %))]}
             (let [[tag _ _] variant]
               (condp = tag
                 ;; if tag is :err short-circuit and return the variant as the
                 ;; result of the computation
                 :err (reduced variant)
                 :ok (fun variant))))
           init-tup
           fns)))

The endomorph is a monoid. A monoid structure is defined as

(R, *, 1)

where R is a set, * is a binary operation on R and 1 is a nominated element of R, where

(r * s) * t = r * (s * t)    1 * r = r = r * 1

In our case, the set R is the set of functions that operates on functors. And we have defined only one, which is the endomorph function. This function operates on functors (which are in itself endofunctors here). The binary operation is the reduce function and identity is trivial. We can use constantly and identity functions defined in Clojure to do that. The associative law hold because, here the order of composition does not matter because, in whichever way we order the composition, the result is the same object ::turtle-variant. That is from a theory perspective. But from a programmer's perspective, we cannot do arbitrary ordering as we need values within the structure. So conceptually it is associative, but the order of computation matters.

10. Now this is also a monad. The classic definition of a monad follows.

All told a monad in X is just a monoid in the category of endofunctors of X, with product * replaced by composition of endofunctors and unit set by the identity endofunctor
- Saunders Mac Lane in Categories for the Working Mathematician

From the above definition, it is clear that the endomorph is a monad as it operates on endofunctors and itself is a monoid (from above). Now it is also a forgetful functor because some property is discarded once we apply the transformation.

Example usage:

;; Let's say we have many functions similar to fetch, which conforms to the
;; contract defined in the :pre and :post of reduce function of endomorph, we
;; can chain them as below.
(endomorph 
 [fetch parse extract transform save]
 (ok "Init Args" {:url "http://example.com" :xs ".."}))

This proof is based on my reading of An Introduction to Category Theory by Harold Simmons. After all this proof, I came to the realisation about Leibniz's reference about everything is a monad.