In the past two years I’ve become reasonably comfortable with both PureScript and Haskell. I’ve learned so many new things while diving into the pure functional programming ecosystem and many of these techniques can be applied to other paradigms. Unfortunately, the pure FP world can feel a bit like another dimension – where many programming problems have elegant solutions but the world of “regular” programming isn’t aware of these patterns.
One such pattern is called “applicative-style validation”, but I’ll simply call it “declarative validation”. In this post I’ll provide some motivation for using this technique and then build a small library in Python implementing these ideas.
Motivation
Many of our programs accept input from the user. Often we need to validate this input before continuing processing and, in the case of errors, inform the user of any problems. There are several techniques for performing this kind of validation, but the most common is to write some imperative code that walks over the input and builds up a list of errors. If there are no errors, the provided input is valid, otherwise it isn’t. We can wrap the result our validation returns in an object that indicates if the validation was successful.
from dataclasses import dataclass
from typing import Any


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False


def validate_name(name, errors):
    if not isinstance(name, str) or name == "":
        errors.append("name must be a non-empty string")


def validate_age(age, errors):
    if not isinstance(age, int):
        errors.append("age must be an int")
    elif age < 10:
        errors.append("age must be at least 10")


def validate(data):
    # Each helper appends to the shared errors list.
    errors = []
    validate_name(data.get("name"), errors)
    validate_age(data.get("age"), errors)
    if not errors:
        return Valid(data)
    else:
        return Invalid(errors)
While this approach works, things get complicated when we have validations that are dependent on previous results. Say, for example, that we want to add a new, more complicated rule stating that if the name is Drew, the age must be at least 40. In order to do this, both name and age need to be present and have the appropriate type. But we don’t have a convenient way to “reuse” this logic from the existing validate_name and validate_age functions. One approach is to simply re-check locally and assume errors have already been added if the types are incorrect.
def validate_drew(data, errors):
    if (not isinstance(data.get("name"), str) or
            not isinstance(data.get("age"), int)):
        return
    elif data.get("name") == "Drew" and data.get("age") < 40:
        errors.append("Drew must be old")
This isn’t great, because now we’ve duplicated the instance checks in two places. We could instead check that specific errors are not already present in the errors list, but that would couple this validation to the exact errors produced by previous validations.
The downsides of the stateful validation approach can be overcome by using a “parsing” approach. That is, we declaratively describe the shape and type of the input that we expect and return an error if our data does not meet those expectations. This approach is extremely well documented in the post Parse, don’t validate. Parsing is a fantastic alternative to stateful validation, but this style of parsing (often called monadic parsing) does have one disadvantage – it halts processing as soon as the first error is reached. We’d like to collect as much information as possible on the invalid input for our user.
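To make the fail-fast behaviour concrete, here is a minimal sketch of that monadic style. The names Ok, Err, parse_name, parse_age and parse_person are hypothetical and exist only for this illustration; they are not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Ok:
    value: Any

    def and_then(self, f):
        # Run the next parsing step on the parsed value.
        return f(self.value)


@dataclass
class Err:
    error: str

    def and_then(self, f):
        # Short-circuit: later steps never run.
        return self


def parse_name(data):
    name = data.get("name")
    if not isinstance(name, str) or name == "":
        return Err("name must be a non-empty string")
    return Ok(name)


def parse_age(data):
    age = data.get("age")
    if not isinstance(age, int):
        return Err("age must be an int")
    return Ok(age)


def parse_person(data):
    # Each step only runs if the previous one succeeded.
    return parse_name(data).and_then(
        lambda name: parse_age(data).and_then(
            lambda age: Ok({"name": name, "age": age})
        )
    )


# Both fields are wrong, but only the first problem is reported.
result = parse_person({"name": None, "age": "hello"})
```

Here result holds only the name error. The age is never examined, so the user would have to fix one problem and resubmit before learning about the next.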
We can take another approach that gives us the composability of the parsing approach as well as the error accumulation of the stateful approach. This approach is traditionally called “applicative-style validation”.
Building Blocks
We’ll be providing two primary functions along with our existing Valid and Invalid types.
validate_into allows us to call a provided function with a list of arguments, assuming all the arguments are Valid. Otherwise, it accumulates the errors in any Invalid arguments.
and_then allows us to perform another “stage” of validations assuming the subject of the function is Valid. If the subject of the function is Invalid, we do nothing.
You can think of validate_into as building one “stage” of our validation pipeline and and_then as linking two stages together. Any validations within a stage will have their errors accumulated, but if a stage fails, we won’t run validations for any later stages. This means we should only break our validations into stages when a given stage depends on valid values from a previous stage.
Let’s use these two functions to reimplement our validations from above. First, we’ll define a Person class into which we’ll be placing the valid data.
@dataclass
class Person:
    name: str
    age: int
Now, we’ll re-define our validate function and its helpers.
def validate_name(name):
    if not isinstance(name, str) or name == "":
        return Invalid(["name must be a non-empty string"])
    else:
        return Valid(name)


def validate_age(age):
    if not isinstance(age, int):
        return Invalid(["age must be an integer"])
    elif age < 10:
        return Invalid(["age must be at least 10"])
    else:
        return Valid(age)


def validate_drew(person):
    if person.name == "Drew" and person.age < 40:
        return Invalid(["Drew is old"])
    else:
        return Valid(person)


def validate(data):
    return validate_into(
        Person,
        validate_name(data.get("name")),
        validate_age(data.get("age")),
    ).and_then(validate_drew)
There are a few things to notice here. First, each validation function stands alone. Second, there is no mutation of the input data happening. Each function performs its validations and then returns a Valid or Invalid value. Last, note that each Invalid wraps a list of errors, even when there is only one. This is what allows our accumulation to happen: the lists from several Invalid values can be concatenated.
Let’s call our validate function a few times and see what happens:
validate({
    "name": None,
    "age": "hello",
})
# => Invalid(value=[
#      'name must be a non-empty string',
#      'age must be an integer'])

validate({
    "name": "Drew",
    "age": 38,
})
# => Invalid(value=['Drew is old'])

validate({
    "name": "Jane",
    "age": 38,
})
# => Valid(value=Person(name='Jane', age=38))
Notice that the second stage of our validations, namely validate_drew, can assume all of its input is Valid after the first stage. Therefore, we don’t need to re-check anything regarding the types of name or age before performing our specific validation (Drew needs to be old). Also notice how easy it would be to add new validations if we added a new argument to the Person constructor.
Implementing the Library
We might imagine that the library to support this code would be quite complicated. In practice, it is very simple. The only function outside of the standard library we use is curry from the toolz library, but if we wanted to drop this dependency we could re-implement curry ourselves.
from dataclasses import dataclass
from functools import reduce
from toolz import curry
from typing import Any


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True

    def apply(self, other):
        if other.is_valid():
            return Valid(self.value(other.value))
        else:
            return other

    def and_then(self, f):
        return f(self.value)


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False

    def apply(self, other):
        if other.is_valid():
            return self
        else:
            return Invalid(self.value + other.value)

    def and_then(self, f):
        return self


def validate_into(f, *args):
    return reduce(lambda a, b: a.apply(b), args, Valid(curry(f)))
The above code is the entirety of our library. The and_then function is relatively straightforward. If we attempt to chain a new stage of validations on a Valid value, we simply invoke the provided function with the value inside of our Valid. If we attempt to chain a new stage of validations on an Invalid value, we just ignore the provided function and return self.
The validate_into function feels more complicated, so let’s describe what it is doing step by step. First, we curry the provided function. This is important because we’re going to be applying the function one argument at a time as we determine if each argument is Valid. We also place this curried function into a Valid wrapper as it starts in a Valid state before seeing any arguments. Then, one by one, we apply the next argument to our “function so far”. In the case that the argument is Valid and the “function so far” is Valid, we just invoke the function with the argument and re-wrap it in Valid. If the “function so far” is Valid but the new argument is Invalid, we make the new “function so far” the Invalid result. If the “function so far” is already Invalid and the new argument is Valid, we keep the existing Invalid unchanged. Finally, and importantly, if the “function so far” is already Invalid and we’re provided a new Invalid argument, we concatenate the errors and re-wrap the result in Invalid.
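The steps above can be traced by hand. The sketch below is only an illustration: its curry is a hypothetical stand-in for toolz.curry that handles plain positional arguments and nothing else, and Valid and Invalid are trimmed down to the apply method.

```python
from dataclasses import dataclass
from functools import reduce
from inspect import signature
from typing import Any


def curry(f):
    # Hypothetical stand-in for toolz.curry: positional arguments only.
    arity = len(signature(f).parameters)

    def collect(*args):
        if len(args) >= arity:
            return f(*args)
        # Not enough arguments yet, so wait for more.
        return lambda *more: collect(*args, *more)

    return collect


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True

    def apply(self, other):
        if other.is_valid():
            return Valid(self.value(other.value))
        return other


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False

    def apply(self, other):
        if other.is_valid():
            return self
        return Invalid(self.value + other.value)


def validate_into(f, *args):
    return reduce(lambda a, b: a.apply(b), args, Valid(curry(f)))


@dataclass
class Person:
    name: str
    age: int


# Happy path: the "function so far" stays Valid at every step.
step0 = Valid(curry(Person))
step1 = step0.apply(Valid("Jane"))  # Valid wrapping a function awaiting age
step2 = step1.apply(Valid(38))      # Valid wrapping the finished Person

# Failure path: once Invalid, only errors are carried forward.
bad1 = step0.apply(Invalid(["name must be a non-empty string"]))
bad2 = bad1.apply(Valid(38))
bad3 = bad1.apply(Invalid(["age must be an integer"]))
```

After the first failure the curried function is discarded, so nothing is ever called with a bad argument. bad2 carries the same single error as bad1, and bad3 carries both.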
Using these simple tools, we can write complicated, deeply-nested validators. Reuse of our validators is simple as they are nothing more than functions. We can place them in a package and share commonly used validators (think validate_presence) easily across our codebase.
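As a rough sketch of what that reuse might look like, here is one possible validate_presence together with a nested validator. Everything beyond Valid, Invalid, validate_into and and_then is hypothetical: validate_non_empty_str, Address, Customer and the error messages are made up for this example. To keep it dependency-free, validate_into is written here as a shortcut that gathers errors directly instead of currying; it is assumed to give the same results as the version above for these cases.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True

    def and_then(self, f):
        return f(self.value)


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False

    def and_then(self, f):
        return self


def validate_into(f, *args):
    # Shortcut for this sketch: collect every error, otherwise call f.
    errors = [e for arg in args if not arg.is_valid() for e in arg.value]
    if errors:
        return Invalid(errors)
    return Valid(f(*(arg.value for arg in args)))


def validate_presence(data, key):
    # Hypothetical shared validator: the key must exist and not be None.
    if data.get(key) is None:
        return Invalid([f"{key} is required"])
    return Valid(data[key])


def validate_non_empty_str(label):
    # Returns a validator function so it can be chained with and_then.
    def check(value):
        if not isinstance(value, str) or value == "":
            return Invalid([f"{label} must be a non-empty string"])
        return Valid(value)
    return check


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Customer:
    name: str
    address: Address


def validate_address(address):
    if not isinstance(address, dict):
        return Invalid(["address must be an object"])
    return validate_into(
        Address,
        validate_presence(address, "city").and_then(validate_non_empty_str("city")),
        validate_presence(address, "country").and_then(validate_non_empty_str("country")),
    )


def validate_customer(data):
    # The nested Address stage reports its errors alongside the name errors.
    return validate_into(
        Customer,
        validate_presence(data, "name").and_then(validate_non_empty_str("name")),
        validate_presence(data, "address").and_then(validate_address),
    )
```

Calling validate_customer with a missing name and an empty city reports both problems at once, even though they live at different levels of the input.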
Prior Art
Nothing in this post is new. I am reimplementing ideas from many other ecosystems in Python to make them more approachable. Applicative-style validation is just one of a huge number of ideas from the pure functional programming world that deserves wider recognition and adoption in more mainstream languages.