Some type about Sum types (and more)
2017-03-22 19:00
A computer stores bits, and 8 bits make a byte.
Some bytes are not like other bytes – bytes can mean different things.
Some bytes mean an integer, some bytes mean a list; we say that they have different types.
A type is a group of things
An integer
is a group of integers, like 1, 2, 3, and so on, and so forth.
A boolean
is a group that has 2 members, True and False.
But we’re getting ahead of ourselves, so let’s start from 0.
A void
type has no members. It is void.
A unit
type has only 1 member, and the member is called unit
. You see it as ()
in many places.
What type has 2 members?
Perhaps a boolean
!
True
and False
are both members of boolean
. True
is not False
, and False
is not True
. But they are both boolean
.
We can write it as such
type boolean = True | False
This reads as: “The type boolean has 2 members, it can either be True or False”.
(In OCaml, the type boolean
is actually spelled bool
.)
Sum types
Suppose you’re deciding if you should continue reading.
type decision =
| Decided of boolean
| Undecided
We can make a decision
- do (
Decided True
), or - do not (
Decided False
), or - be
Undecided
(there is no try).
We observe that the type decision
has 3 members.
In the Decided
case, there are 2 members.
The number of members is the same as the number of members in a boolean
, because the Decided
case is made up entirely of boolean
.
In the Undecided
case, there is 1 member, similar to Unit
.
How did we get 3? A decision
can be Decided
(2) or Undecided
(1). 2 + 1 = 3.
See how or and +
works in similar ways.
That’s why they call it a sum type!
Product types
Let’s make a Pair
of boolean
s.
type pair = Pair of boolean * boolean;
How many members are in this type? 4!
True * True
True * False
False * True
False * False
How did we get 4?
For the first boolean, there are 2 members, True and False.
And for each of these 2 members, the second boolean can be True or False.
2 * 2 = 4
And that’s where the name product comes from.
(Notice how the OCaml syntax for defining a product type, or a tuple, is using the *
operator, which in many cases mean product)
Exponential types
You’ve made a decision, and depending on the decision you will either continue reading, or not.
Here’s one particular way to encode this decision-to-action process.
let continue = function
| Decided True -> True
| Decided False -> False
| Undecided -> True
If you have decided, you follow your decision. If you are undecided, you continue reading and see what happens.
continue
is a function, its type is decision -> boolean
.
Let’s ask the same question: how many members are there in the decision -> boolean
type?
A function of this type has to, for each member of the input type, decision
with 3 members, give a member of the output type, boolean
with 2 members.
So the total number of unique functions of this type is 2 * 2 * 2 = 8, or equivalently, 2 ^ 3!
- DT, DF, U -> True
- DT, DF, U -> False
- DT, DF -> True, U -> False
- DT, U -> True, DF -> False
- DF, U -> True, DT -> False
- DT -> True, DF, U -> False
- DF -> True, DT, U -> False
- U -> True, DF, DT -> False
So you see, function types are also called exponential types.
Dependent sum types / Sigma type
A Pair boolean integer
is a type, but a Pair boolean a'
, where a'
depends on what boolean
actually is, True
or False
, is called a dependent sum type.
Here’s an example (not actual valid OCaml code):
type pair = Pair of (b : boolean) * a'
where Pair of True * boolean
Pair of False * integer
It says that that the pair
depends on the first item in the pair, if it is True
, we have a Pair of boolean * boolean
, but if it is False
, we have a Pair of boolean * integer
.
To count how many members there are in this type pair
, we can treat it like a sum type.
The first item in the pair can have 2 possible values
- when
True
, the second item has to be aboolean
, which has 2 possible value. - when
False
, the second item has to be ainteger
, which for simplicity sake, can have 256 values.
Therefore, the total number of members is 2 + 256 = 258.
Why is this called sigma? Because you are taking the sum of all possible types across all values of the type being depended on (first item in the pair).
Dependent product types / Pi type
A decision -> boolean
is a function type, with an argument of type decision
, and a return type boolean
.
But a function decision -> a'
, where a'
depends on what decision
actually is, True
or False
, is called a dependent product type.
type continue = decision -> a'
where continue Decided True : integer
continue Decided False : boolean
continue Undecided : boolean
This says that when the type of continue
depends on the value of the argument.
When decision
is Decided True
, continue
returns an integer, when Decided False
or Undecided
, it returns a boolean.
Let’s count how many members type has. Recall how we counted the number of members a function type has previously, we multiplied the number of members in the output type n times, where n is the number of member in the input type.
In this case, since the number of members in the output type changes, we have to multiply them one by one instead of taking an exponent, to get 256 * 2 * 2 = 1024.
Why is this called pi? Because you are taking the product of all possible types across all values of the type being depended on (argument of the function).
And actually, the function space is a special case of the pi type, where the return type is constant, i.e. does not depend on the value of the argument.
A little theory behind this
We can think of types as a set of things. We say that some value is of that type when it is a member in that set.
For example, boolean
is the set of things, which contains True
and False
.
A sum type is a disjoint union, or tagged union.
A product type is a cartesian product.
A sigma is the symbol to do a sum over a sequence.
A pi is the symbol to do a product over a sequence.
A lot of this comes from, or is related, to category theory.
Notes
I chose to say that a type has 2 members to invoke the idea of thinking about types as sets. Other words that can be used include value (rejected because it means too many things).
The real reason why the words sum
, product
, etc are used probably has less to do with counting members, and more to do with category theory. But it works well enough if you are trying to get an intuition about the meaning.
References
I found a lot of relevant and helpful resources while researching on sum types:
- Manis Goregaokar. What Are Sum, Product, and Pi Types?
- Chris Taylor. The Algebra of Algebraic Data Types, Part 1 (he goes into recursive types in the next post, check it out!)
- Wikipedia page on Dependent Type
- Idris, which has dependent types, for tips on the made up syntax.
- Intuitionistic type theory