I’ve spent a lot of time in the past year learning PureScript and it has drastically changed the way I think about programming in general. The biggest change in my thinking is described by the excellent blog post Parse, don’t validate.
The most important passage in the post, I think, is this:
Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.
I think a lot about trying to produce more-structured data at the edges of the systems I’m building. In a strongly-typed language like PureScript, this often means building a parser to take some JSON input and turn it into a custom data type via a parser combinator library. I’m now building systems with Elixir at work and here the concept of “parsing” is fuzzier. Still, we end up with some untrusted input from the outside world that we need to process. In this post we’ll assume this untrusted input is JSON that has already been decoded into an Elixir map by Plug.Parsers.
Even though we say that JSON has been “parsed” into a map, the structure of this map is still completely untrusted. We have no idea if the keys we require in the map’s structure are present nor if the values are semantically correct. We’ll cover a simple technique to parse this untrusted map into a trusted struct.
Tools of the Trade
We’ll use two libraries to help us build our parser: Ecto and TypedEctoSchema. The TypedEctoSchema library isn’t strictly necessary, but I’ve found the benefits of generating a typespec for my structs very compelling.
I’m assuming the reader is familiar with Ecto, embedded_schema and Changesets. This post does not aim to teach these building blocks but rather presents a simple technique for composing them to create parsers.
Let’s build a simple parser that expects a map with a single key called name whose value is a string.
defmodule Parsers.SimplePerson do
  use TypedEctoSchema
  import Ecto.Changeset
  alias Parsers.SimplePerson

  @primary_key false
  typed_embedded_schema do
    field :name, :string, enforce: true
  end

  def changeset(person, attrs \\ %{}) do
    person
    |> cast(attrs, [:name])
    |> validate_required([:name])
  end

  def build(attrs) do
    struct(SimplePerson)
    |> changeset(attrs)
    |> apply_action(:build)
  end
end
Let’s look at a few examples of using this parser:
iex(1)> Parsers.SimplePerson.build(%{"name" => "Drew"})
{:ok, %Parsers.SimplePerson{name: "Drew"}}
iex(2)> Parsers.SimplePerson.build(%{"name" => "Drew", "unknown" => 1})
{:ok, %Parsers.SimplePerson{name: "Drew"}}
iex(3)> Parsers.SimplePerson.build(%{"foo" => "bar"})
{:error,
 #Ecto.Changeset<
   action: :build,
   changes: %{},
   errors: [name: {"can't be blank", [validation: :required]}],
   data: #Parsers.SimplePerson<>,
   valid?: false
 >}
iex(4)> Parsers.SimplePerson.build(%{"name" => 1})
{:error,
 #Ecto.Changeset<
   action: :build,
   changes: %{},
   errors: [name: {"is invalid", [type: :string, validation: :cast]}],
   data: #Parsers.SimplePerson<>,
   valid?: false
 >}
Even this very simple example demonstrates the power of this technique. We are able to ensure our key is present and that it is of the appropriate type. We also discard any keys that are not specified and ultimately end up with a struct or an error. We could imagine using this parser in a with statement where our system processes external requests.
with {:ok, person} <- Parsers.SimplePerson.build(input),
     {:ok, result} <- process_person(person) do
  {:ok, build_response(result)}
else
  {:error, error} -> handle_error(error)
end
Note that we build an empty SimplePerson struct using struct(SimplePerson) rather than %SimplePerson{}. This allows us to bypass the protection we added with the enforce: true option on our schema. If a user in our system attempts to create a SimplePerson directly without providing the name attribute, they will be greeted with a compiler failure.
iex(1)> %Parsers.SimplePerson{}
** (ArgumentError) the following keys must also be given when building
struct Parsers.SimplePerson: [:name]
    (parsers 0.1.0) expanding struct: Parsers.SimplePerson.__struct__/1
    iex:1: (file)
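The difference between the two construction styles can be reproduced without Ecto. Here is a minimal sketch, assuming that enforce: true ends up as an @enforce_keys entry on the generated struct (which the error message above suggests). Example.Person is a hypothetical stand-in module, not part of the parser code in this post.

```elixir
# Hypothetical stand-in for a schema with an enforced :name field.
defmodule Example.Person do
  @enforce_keys [:name]
  defstruct [:name]
end

# struct/2 does not check @enforce_keys, so it hands back an "empty" struct.
empty = struct(Example.Person)
IO.inspect(empty.name)

# struct!/2 applies the same key checks as the %Example.Person{} literal,
# but at runtime, so the failure can be observed with try/rescue.
result =
  try do
    struct!(Example.Person, %{})
  rescue
    error in ArgumentError -> {:raised, error.message}
  end

IO.inspect(result)
```

Only the parser itself needs the unchecked struct/2 call; everywhere else the enforced key keeps half-built structs from being created by accident.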
Composing Parsers
We can easily compose parsers using embeds_one and embeds_many.
defmodule Parsers.Address do
  use TypedEctoSchema
  import Ecto.Changeset
  alias Parsers.Address

  @primary_key false
  typed_embedded_schema do
    field :city, :string, enforce: true
    field :zip, :string, enforce: true
  end

  def changeset(address, attrs \\ %{}) do
    address
    |> cast(attrs, [:city, :zip])
    |> validate_required([:city, :zip])
  end

  def build(attrs) do
    struct(Address)
    |> changeset(attrs)
    |> apply_action(:build)
  end
end
defmodule Parsers.Person do
  use TypedEctoSchema
  import Ecto.Changeset
  alias Parsers.Address
  alias Parsers.Person

  @primary_key false
  typed_embedded_schema do
    field :name, :string, enforce: true
    embeds_one :address, Address, enforce: true
  end

  def changeset(person, attrs \\ %{}) do
    person
    |> cast(attrs, [:name])
    |> cast_embed(:address)
    |> validate_required([:name, :address])
  end

  def build(attrs) do
    struct(Person)
    |> changeset(attrs)
    |> apply_action(:build)
  end
end
iex(1)> Parsers.Person.build(%{
  "name" => "Drew",
  "address" => %{
    "city" => "Chicago",
    "zip" => "60606"
  }
})
{:ok,
 %Parsers.Person{
   address: %Parsers.Address{city: "Chicago", zip: "60606"},
   name: "Drew"
 }}
iex(2)> Parsers.Person.build(%{
  "name" => "Drew",
  "address" => %{
    "city" => "Chicago",
    "zip" => 60606
  }
})
{:error,
 #Ecto.Changeset<
   action: :build,
   changes: %{
     address: #Ecto.Changeset<
       action: :insert,
       changes: %{city: "Chicago"},
       errors: [zip: {"is invalid", [type: :string, validation: :cast]}],
       data: #Parsers.Address<>,
       valid?: false
     >,
     name: "Drew"
   },
   errors: [],
   data: #Parsers.Person<>,
   valid?: false
 >}
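In the failing case above the error lives on the nested address changeset while the parent's own errors list is empty, so a caller usually wants to flatten the whole tree before reporting it. Here is a minimal sketch using Ecto.Changeset.traverse_errors/2. The Example.* modules and the Example.Errors.to_map/1 helper are hypothetical names, plain embedded_schema is used in place of typed_embedded_schema so that Ecto is the only dependency, and the Mix.install version requirement is an assumption.

```elixir
# Assumes network access so Mix.install/1 can fetch Ecto.
Mix.install([{:ecto, "~> 3.10"}])

defmodule Example.Address do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :city, :string
    field :zip, :string
  end

  def changeset(address, attrs) do
    address
    |> cast(attrs, [:city, :zip])
    |> validate_required([:city, :zip])
  end
end

defmodule Example.Person do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :name, :string
    embeds_one :address, Example.Address
  end

  def changeset(person, attrs) do
    person
    |> cast(attrs, [:name])
    |> cast_embed(:address)
    |> validate_required([:name, :address])
  end

  def build(attrs) do
    struct(__MODULE__)
    |> changeset(attrs)
    |> apply_action(:build)
  end
end

defmodule Example.Errors do
  # Hypothetical helper: walks the changeset and its embeds, returning a
  # nested map of field => [messages] with %{key} placeholders filled in.
  def to_map(%Ecto.Changeset{} = changeset) do
    Ecto.Changeset.traverse_errors(changeset, fn {msg, opts} ->
      Regex.replace(~r"%{(\w+)}", msg, fn _, key ->
        opts |> Keyword.get(String.to_existing_atom(key), key) |> to_string()
      end)
    end)
  end
end

{:error, changeset} =
  Example.Person.build(%{
    "name" => "Drew",
    "address" => %{"city" => "Chicago", "zip" => 60606}
  })

errors = Example.Errors.to_map(changeset)
IO.inspect(errors)
```

The resulting map mirrors the shape of the input, which makes it straightforward to serialize back to the client that sent the bad payload.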
Removing Duplication
You’ll note in the last example we have a lot of duplicated boilerplate. The last step of our work is creating a module that extracts the duplication from our parser definitions.
defmodule Parsers.Schema do
  @callback changeset(struct(), map()) :: Ecto.Changeset.t()

  defmacro __using__(opts) do
    quote do
      @behaviour Parsers.Schema
      use TypedEctoSchema
      import Ecto.Changeset
      @primary_key false
      unquote(add_builder(opts))
    end
  end

  defmacro __before_compile__(_) do
    quote do
      def build(attrs) do
        struct(__MODULE__)
        |> changeset(attrs)
        |> apply_action(:build)
      end
    end
  end

  defp add_builder(opts) do
    if Keyword.get(opts, :builder, true) do
      quote do
        @before_compile Parsers.Schema
      end
    end
  end
end
Using Parsers.Schema we can rewrite our previous Person and Address parsers.
defmodule Parsers.Address do
  use Parsers.Schema, builder: false

  typed_embedded_schema do
    field :city, :string, enforce: true
    field :zip, :string, enforce: true
  end

  @impl Parsers.Schema
  def changeset(address, attrs \\ %{}) do
    address
    |> cast(attrs, [:city, :zip])
    |> validate_required([:city, :zip])
  end
end
defmodule Parsers.Person do
  use Parsers.Schema
  alias Parsers.Address

  typed_embedded_schema do
    field :name, :string, enforce: true
    embeds_one :address, Address, enforce: true
  end

  @impl Parsers.Schema
  def changeset(person, attrs \\ %{}) do
    person
    |> cast(attrs, [:name])
    |> cast_embed(:address)
    |> validate_required([:name, :address])
  end
end
Our Parsers.Schema module does several important things.
- Declares a behaviour, so the compiler warns us if we forget to define the changeset callback
- Tells Ecto not to include a primary_key for our embedded schema
- Adds a build function unless we explicitly exclude its generation with builder: false
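The opt-out behaviour of the generated function can be checked in isolation. The following is a dependency-free sketch of the same __using__ plus @before_compile pattern. Example.Schema, its parse/2 callback (a stand-in for changeset/2 that returns a tagged tuple rather than an Ecto.Changeset) and the two using modules are all hypothetical names chosen for illustration.

```elixir
defmodule Example.Schema do
  # Stand-in for the changeset/2 callback; returns a tagged tuple so the
  # sketch needs no dependencies.
  @callback parse(struct(), map()) :: {:ok, struct()} | {:error, term()}

  defmacro __using__(opts) do
    quote do
      @behaviour Example.Schema
      unquote(add_builder(opts))
    end
  end

  # Runs just before the using module is compiled and injects build/1.
  defmacro __before_compile__(_env) do
    quote do
      def build(attrs), do: parse(struct(__MODULE__), attrs)
    end
  end

  # Registers the hook unless the caller passes builder: false.
  defp add_builder(opts) do
    if Keyword.get(opts, :builder, true) do
      quote do
        @before_compile Example.Schema
      end
    end
  end
end

defmodule Example.Person do
  use Example.Schema
  defstruct [:name]

  @impl Example.Schema
  def parse(data, %{"name" => name}) when is_binary(name), do: {:ok, %{data | name: name}}
  def parse(_data, _attrs), do: {:error, :invalid}
end

defmodule Example.Address do
  use Example.Schema, builder: false
  defstruct [:city]

  @impl Example.Schema
  def parse(data, %{"city" => city}) when is_binary(city), do: {:ok, %{data | city: city}}
  def parse(_data, _attrs), do: {:error, :invalid}
end

# build/1 exists only on the module that kept the default builder option.
IO.inspect(function_exported?(Example.Person, :build, 1))
IO.inspect(function_exported?(Example.Address, :build, 1))
IO.inspect(Example.Person.build(%{"name" => "Drew"}))
```

Skipping the builder on embedded-only schemas such as Address keeps the public entry points limited to the top-level parsers, since embeds are reached through the parent's cast_embed call anyway.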
Parse Your Elixir Input
Using the above simple techniques, we can force our untrusted input into a known-good struct as early as possible in our application. We’ve centralized the logic for parsing our external input and all functions called later can trust that they will receive well-structured input – at least to the extent possible in a dynamically typed programming language.