Hello again! Welcome to the second dev log for the generative art environment project I’m working on (I still haven’t thought of a good name). As “promised”, this post will be more technical, but after hearing from a non-developer friend that they enjoyed reading my previous post, I’ll try to keep things somewhat approachable (forgive me, though: at least this and the next two posts will be more technical and less feature focused). If you somehow stumbled upon this post, have some feedback, and don’t know me in person, feel free to send me an email (it’s on my about page).
In the past few weeks I’ve made a lot of progress, but less than I had anticipated. The main feature I’m focusing on right now is letting people add multiple sketches and see them in the sidebar. As I’ve worked on it, I’ve discovered a lot of things that I either didn’t realize I would want or need, like using SQLite in WASM with OPFS (I had originally planned to use IndexedDB), or whose difficulty I didn’t anticipate (like adding the rough version of an event sourcing system that I call Operations). So while I’ve made very little progress on the feature itself, I’m still making good progress on the foundations that it and other features will be built on, and I still hope to have it working on the client side by the end of the month (hopefully with more details in the next post!).
In any case, progress has mainly been in three areas:

- schemas with malli, and the schema driven approach in general
- running SQLite in WASM with OPFS on the client
- the rough version of the Operations event sourcing system

Each of these topics has a lot of detail, so I’ll use this post to talk about the first: using malli and the schema driven approach in general.
I’m a huge proponent of schema driven architectures. At Pitch, I got a little familiar with malli (we started using it towards the end of my tenure there), but I really fell in love with the schema driven approach at my last job, where I used Zod heavily on both the client and the backend. It really simplified dealing with data validation and transformation (with the added benefit of type inference, which is extremely powerful). It was used for all of the API contracts, and since the schemas for the requests and responses are defined statically, you get a Swagger file basically for free, which was useful during development. It was used to define the data shapes and transformations for incoming Pusher notifications, and for form data validation. It was basically at the core of the Operations system I made (which serves as the basis of the system I built for this project). Having dealt with various data related problems from the perspective of schemas, I came to realize that explicitly defining and using schemas as the way you interact with data is such a boon that I can’t recommend it enough; I truly believe that putting data schemas at the core of your architecture, essentially treating them as one of your application’s “first principles”, is one of the best decisions you can make for how data flows through your app. There are few technical things I believe in this strongly.
Even if you don’t hold this conviction, I think it’s uncontroversial to say that this problem is important and should be handled with care. Take HTTP APIs. The data is inherently untrusted (whether the API is public, private, or internal, someone could send you a jumbled mess of characters rather than JSON), the endpoints may have configurable request and response shapes via the `Content-Type` and `Accept` headers, and they may even be documented and under SLA, so the requests and responses have to follow certain guidelines. These things require you to reason about the API as an interface for data that needs to be transformed and then validated. The same can be said for SQL queries. Executing a query often involves parameters, which need to be transformed into a certain shape (for example, in SQLite a `timestamp` column is really just a `text` or `real` column, formatted a certain way), and the same goes for the result, which returns values that may or may not be in a form your application can easily work with (sticking with timestamps, you’ll need to transform the incoming value into a time container). You can use an ORM to handle some of this, but even the best ORM isn’t going to handle everything (it can’t tell you that your query missed selecting a column), and you’ll eventually need to rely on some form of a schema.
I could go on, but you get the idea: many problems we encounter deal with parsing and transforming data, and I posit that it’s such an important consideration that how you deal with it should be a core decision you’ve made in your architecture. If you don’t, you’re left with needing to encode the rules of transformation in other, more bespoke ways, which different folks will treat differently, and which will eventually cause bugs and frustration. Akin to choosing your rendering library, choosing a data transformation library is essential.
So, for my app schemas, I’m using the same library I used at Pitch, malli. A quick intro for non-Clojure folks: malli is a library that lets you define schemas for your data in an easy to understand way. If you’re coming from TypeScript, it’s similar to Zod in a lot of ways. The syntax is straightforward to work with and understand:
```clojure
(def Address
  [:map
   [:id :string]
   [:tags [:set :keyword]]
   [:address
    [:map
     [:street :string]
     [:city :string]
     [:zip :int]
     [:lonlat [:tuple :double :double]]]]])
```
The keywords like `:map` and `:string` are what malli calls types, and in general they correspond to the underlying data structure. In most cases you’ll want something more complex than a single type, and for that all you need to do is define it and then pass it around the same way you would a type:
```clojure
(def User
  [:map
   [:id :string]
   [:name :string]
   [:email :string]
   ;; ꜜ Note the use of Address here
   [:address Address]])
```
In other words, you can embed types and schemas together and it “just works”. Suffice it to say, it’s really powerful (dare I say magical?!) and a very well designed library.
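To show what you get from this, here’s a quick validation example (the `Address` schema is repeated from above so the snippet runs on its own, and I’m aliasing `malli.core` as `malli`, matching the snippets later in this post):

```clojure
(require '[malli.core :as malli])

;; Address repeated from above so this snippet is self-contained
(def Address
  [:map
   [:id :string]
   [:tags [:set :keyword]]
   [:address
    [:map
     [:street :string]
     [:city :string]
     [:zip :int]
     [:lonlat [:tuple :double :double]]]]])

(def home
  {:id "a1"
   :tags #{:home}
   :address {:street "Main St"
             :city "Springfield"
             :zip 12345
             :lonlat [52.1 4.3]}})

(malli/validate Address home)                     ;; => true
(malli/validate Address (assoc home :zip "nope")) ;; => false
```

When a value fails, `malli/explain` returns data describing exactly which key was wrong and why, which is great for error reporting.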
Okay, maybe it’s not magical; as good as it is, malli is unfortunately not a perfect library and is not without its faults. When I was adding the API schemas, one thing I wanted was a `:base64` type, which would also automatically transform base64 encoded strings to bytes and vice versa. Doing this was much trickier than I had expected, partly because there is essentially no documentation on creating a custom transformer, but also partly because even finding out that there was no documentation was not straightforward, since almost all of the docs live in the readme. Don’t get me wrong, the documentation is well done, and I appreciate that it’s littered with examples rather than just API surface docs (here’s looking at you, Clojure), but it goes to show that simply making documentation available isn’t sufficient, even with good examples.
Anyway, after a lot of diving into code, research, and head scratching, I eventually found one Slack thread and one forum post that got me far enough to piece something together. In case you have the same need I did, you basically need to implement a few things.
First, you need to create the schema you want. There are a few ways to do it, but in a lot of cases you can use `-simple-schema`, a function that takes a map with keys such as `:type` and `:pred`:

```clojure
(def my-schema
  (malli/-simple-schema {:type :my-schema
                         :pred valid-value?}))
```
There are other arguments you can provide depending on your use case (you might find `:type-properties` interesting), and if you need a type with more complex logic, you can `reify` `IntoSchema` and implement bespoke parsers, validators, and so on. In my application I’ve only needed to use `-simple-schema` so far.
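For the `:base64` type specifically, the definition ends up looking something like the following; the predicate here is my own rough sketch, not the real one from my project (a production version would check lengths and padding more carefully):

```clojure
(require '[malli.core :as malli])

(defn base64-string?
  "Rough check that a value looks like a base64-encoded string."
  [s]
  (and (string? s)
       (some? (re-matches #"[A-Za-z0-9+/]*={0,2}" s))))

(def base64-schema
  (malli/-simple-schema {:type :base64
                         :pred base64-string?}))
```

This `base64-schema` is the value that later gets put into the registry.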
If you only need a type that can be transformed automatically to and from strings, that’s all you need to do, but in a lot of cases, like `:base64`, you’ll need to implement some transformation logic too. In that case, you’ll need to create a separate transformer:
```clojure
(def my-transformer
  (malli.transform/transformer
   {:name :my-schema
    :default-decoder {:compile my-decoder}
    :default-encoder {:compile my-encoder}}))
```
The functions are “compiled”, as malli calls it: they are higher order functions that take `schema` and `options` as arguments and return a function of the `value`. That inner function returns the transformed value, falling back to the original value if it doesn’t handle the transformation or an exception is thrown:
```clojure
(defn my-decoder
  [schema options]
  (fn [value]
    (if-not (= :my-schema (malli/type schema))
      value
      (try
        (do-something-with value)
        (catch Exception _
          value)))))
```
Also keep in mind that the functions are called for every value, so you need to guard against things you don’t want to transform in the inner function. This is why there’s the `if-not` condition above.
As a concrete example, here’s roughly how I do the `:base64` decoder:

```clojure
(defn base64-decoder
  [schema options]
  (fn [value]
    (if-not (= :base64 (malli/type schema))
      value
      (try
        (base64/decode value)
        (catch Exception e
          (log/error! e)
          value)))))
```
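As an aside, `base64/decode` here isn’t part of malli; it’s a project-local helper. A stand-in could be a thin wrapper over `java.util.Base64` (these names are mine, for illustration):

```clojure
(import '[java.util Base64])

(defn decode-str
  "Decode a base64 string into a byte array."
  ^bytes [^String s]
  (.decode (Base64/getDecoder) s))

(defn encode-bytes
  "Encode a byte array into a base64 string (the encoder side)."
  ^String [^bytes bs]
  (.encodeToString (Base64/getEncoder) bs))
```

The encoder half plugs into `:default-encoder` the same way the decoder does.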
A couple of things to point out here:

- The decoder swallows failures: if the value can’t be decoded, the error is logged and the original value is returned, which will then fail schema validation downstream.
- If the decoded value can carry Clojure metadata, you can apply `vary-meta` to the result. In the case that you can’t, like with a string, you’d need to find another way to do it (i.e. make a wrapper for the String class with some additional metadata).

Okay, transformers, check.
Lastly, you need to put the schema into a registry and then hook up the transformers, in that order. I highly recommend using a lazy registry:
```clojure
(malli.registry/lazy-registry
 (malli/default-schemas)
 (fn [type registry]
   (cond
     (= :base64 type)
     base64-schema)))
```
You can set this as the default registry using `malli.registry/set-default-registry!`. You should make sure you set your schema registry as early in your application lifecycle as possible, so that the schemas are available before something like your API router is initialized.
With the transformers, you can just `conj` them together with `string-transformer` into a var:
```clojure
(def transformers
  (let [transformers-with-defaults (conj [base64-transformer]
                                         malli.transform/string-transformer)]
    (apply malli.transform/transformer transformers-with-defaults)))
```
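With the transformer assembled, decoding by hand looks like this. It’s a sketch inside a rich comment, since it assumes the `:base64` schema is registered and `transformers` is the var defined above:

```clojure
(require '[malli.core :as malli])

(comment
  ;; The string-transformer half parses :count from a string, while
  ;; the custom half decodes :payload from base64.
  (malli/decode
   [:map
    [:count :int]
    [:payload :base64]]
   {:count "3" :payload "aGVsbG8="}
   transformers))
```

`malli/decode` takes the schema, the value, and the transformer; anything the transformer doesn’t know how to handle passes through untouched.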
You can then pass `transformers` instead of the default. Here’s how I use them for the `:coercion` options in my reitit router:
```clojure
(ring/router
 routes
 {:data {:coercion (coercion.malli/create
                    (-> coercion.malli/default-options
                        (assoc-in [:transformers :body :default] transformers)
                        (assoc-in [:transformers :response :default] transformers)))}})
```
An important note: I mentioned earlier that you should define the schemas as early as possible in your app lifecycle. That’s not only to make sure the schemas are available in general; the transformers also depend on their schemas being available, and trying to set a transformer before its associated schema is defined will result in an error. So if you run into weird errors when instantiating your transformers, double check that the schemas are properly defined.
That’s it for this devlog! Thanks for reading. In the next one I plan to write about using SQLite in the frontend and how I deal with querying and reactivity when updating data, so stay tuned. Hopefully it won’t be a month until the next one like it was for this one.◾️