Books / Applied Functional Programming Tutorial / Chapter 3
Functions and basic data types
In this chapter
General summary of syntax
Haskell keywords
There are some reserved keywords/operators which have some special meaning in Haskell:
- operator-like:
!
,'
,"
,-
,--
,-<
,-<<
,->
,::
,;
,<-
,,
,=
,=>
,>
,?
,#
,*
,@
,\
,_
,`
,|
,~
- brackets-like:
[| |]
,{ }
,{- -}
- keywords (common):
as
,case of
,class
,data
,deriving
,do
,hiding
,if then else
,import
,infix infixr infixl
,instance
,let in
,module
,newtype
,qualified
,type
,where
- keywords (not so common):
data family
,data instance
,default
,deriving instance
,forall
,foreign
,mdo
,proc
,rec
,type family
,type instance
Type signature
As was in the previous lesson, you can specify a type of an expression directly or by its name via ::
operator-like keyword “has-type”. It is mandatory only in case of ambiguity because Haskell’s type system has type inference (it can analyze what type the expression should have). Type of expression cannot change - it is static typing.
a :: Integer
b :: String
a = 42
b = a
c = (a + 7) :: Double
Haskell does not have implicit conversion and if you do something like it is in the example above, you will get an error during compilation. The type system is very strict and helps programmers to find more bugs during compile time and not during runtime (where is more problematic to find it).
Couldn`t match type ‘Integer’ with ‘[Char]’
Expected type: String
Actual type: Integer
In the expression: a
In an equation for ‘b’: b = a
Type variable
As you could have noticed, we are able to achieve polymorphism with something strange in function types, something called type variable. Type variables must start with a lowercase letter and usually are just 1 character from the beginning of the alphabet:
identity :: a -> a
identity x = x
Such type signature tells us that identity
is a function from whatever type a
to the same type a
.
Type constraints
In some cases (function type and others) you can see typeclass constraints like here:
next :: Num a => a -> a
next x = x + 1
It says next
has a type a
to a
where a
is “from” typeclass Num
(typeclasses will be covered later, for now, it is just class or types). Such type signature can be even more complex.
foo :: (Show a, Eq a, Read b) => a -> b -> a
Function declaration
Although type can be in most cases inferred automatically by the compiler, it is a good practice to write it down at least in case of functions as a part of code documentation. Functions can be complicated and by reading its type signature you know immediately what arguments it expects and what it returns.
Declaration of a function is simple, just use the prefix notation as you would call the function and then describe what is it equal to.
fibonacci :: Word -> Integer
fibonacci 0 = 1
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)
Type declaration
There are three basic ways how to introduce your own data type:
type
synonym = you just use a different name for some existing type (for exampleString
is type synonym for[Char]
= list ofChar
)- new
data
type = declare type with type constructor (before=
) and one or more data constructors (after=
, separated by|
), you may use typeclass constraints, type variables, and recursion newtype
= new data type with exactly one data constructor with one parameter (new type is isomorphic with the “wrapped” type and compiler can do optimizations, can be used also in another way in more advanced code)
type String = [Char]
data Bool = True | False
data Tree a = Leaf a | Branch (Tree a) (Tree a)
deriving (Show, Read)
newtype Age = Age Int
myTree :: Tree Int
myTree = Branch (Leaf -7) (Branch (Leaf 7) (Leaf 10))
myAge :: Age
myAge = Age 20
Here, type String
can be replaced by [Char]
(it is a mere synonym), so typechecking is not ‘perfect’ in this sense. In case of newtype
, Age
is a different type than Int and compiler will check it.
The keyword deriving
allows you to automatically make your data type instance of some typeclass (limited set of built-in classes) to allow some common Haskell mechanisms and functions to work with your types:
Eq
- equality operators==
and/=
Ord
- comparison operators<
,<=
,>
,>=
;min
,max
, andcompare
(subclass ofEq
)Show
-show
for value toString
conversion (+ other related functions)Read
-read
forString
to value conversion (+ other related functions)Enum
- for enumerations, allows the use of..
range list syntax such as[Blue .. Green]
Bounded
- for enumerations or other bounded,minBound
andmaxBound
as the lowest and highest values that the type can take
As it was said, typeclasses are a very important means of abstraction in Haskell and will be covered later on. You will also learn how to make new typeclasses, their instances, etc.
Data types
Haskell has a strong static type system which is one of the things making it so great. As we already saw, every expression in Haskell has some type and the type cannot change during runtime (that is the difference with dynamic typing). As in other programming languages, you can use predefined data types, get more from some libraries or introduce your own.
Basic Data Types
Int
= A fixed-precision integer type with at least the range[-2^29 .. 2^29-1]
(exact range can vary based on implementation, it can be check withminBound
andmaxBound
)Integer
= Arbitrary-precision integers (e.g. theoretically unlimited)Float
= Single-precision floating point numbersDouble
= Double-precision floating point numbersWord
= Unsigned integral number (same size asInt
)Char
= Unicode character (ISO/IEC 10646)Bool
= truth value, onlyTrue
orFalse
String
= literally list of characters (type synonym for[Char]
)
Type and data constructor
Get back to creating own data types with the data
keyword. After data
is the type construction starting with a capital letter and then there might be type parameters (type variables). Then =
follows and after it, we can specify multiple data constructors separated by |
. Each data constructor again starts with a capital letter and can be followed by data types which it takes as arguments.
data MyType a b = MyTypeC1 a Int | MyTypeC2 String b | MyType3
deriving Show
x :: MyType Bool Float
x = MyTypeC1 True 10
Usually, when there is just one data constructor, the name is the same as of the type constructor (but it is not a rule).
This is one another great Haskell feature: Algebraic Data Types (ADT). These are now found in more and more new languages (e.g. Rust and very fresh Reason from Facebook).
Record types
Imagine you want to create a type for Person
which contains name
, age
, gender
, and city
. You would need to do:
data Gender = Male | Female
deriving Show
data Person = Person String Int Gender String
deriving Show
name :: Person -> String
name (Person x _ _ _) = x
age :: Person -> Int
age (Person _ x _ _) = x
gender :: Person -> Gender
gender (Person _ _ x _) = x
city :: Person -> String
city (Person _ _ _ x) = x
Not very nice, right?! And we have just 4 attributes of a person. Luckily there is syntactic sugar which makes it easier for us called record:
data Gender = Male | Female
deriving Show
data Person = Person
{ name :: String
, age :: Int
, gender :: Gender
, city :: String
} deriving Show
Now try to create a type Pet
which also contains name
and age
. You will get an error which is logical, you cannot have two functions with the same name! One option is to rename it to namePerson
and namePet
, the second is available to you only if you have GHC 8.0.1 or higher and it is with language extension DuplicateRecordFields (however the first option is more common):
{-# LANGUAGE DuplicateRecordFields #-}
{-
Seven Deadly Sins ordered by Dante Alighieri
see: https://simple.wikipedia.org/wiki/Seven_deadly_sins
-}
data Sin = Lust | Gluttony | Greed | Sloth | Wrath | Envy | Pride
deriving (Show, Read, Eq, Ord, Enum, Bounded)
data Gender = Male | Female
deriving (Show, Read, Eq) -- Why not Ord? Gender equality!
data Person = Person
{ name :: String
, age :: Int
, gender :: Gender
, city :: String
} deriving (Show, Read, Eq)
data Pet = Pet
{ name :: String
, age :: Int
} deriving (Show, Read)
olderPet :: Pet -> Pet
olderPet pet = pet { age = (age pet + 1) }
You can also see that there is a shorthand for updating the value of record - creating new edited record from previous. Now you can try some derived behaviour from typeclasses such as show
, read
, or ==
:
*Main> show Male
"Male"
*Main> read "Male"
*** Exception: Prelude.read: no parse
*Main> (read "Male") :: Gender
Male
*Main> Male == Female
False
*Main> Male == Male
True
*Main> Male < Female
<interactive>:8:1: error:
• No instance for (Ord Gender) arising from a use of ‘<’
• In the expression: Male < Female
In an equation for ‘it’: it = Male < Female
*Main> Gluttony < Wrath
True
*Main> [Gluttony .. Envy]
[Gluttony,Greed,Sloth,Wrath,Envy]
*Main> :t maxBound
maxBound :: Bounded a => a
*Main> maxBound :: Sin
Pride
*Main> minBound :: Sin
Lust
*Main> let p1 = Person { name = "Marek", age = 25, gender = Male, city = "Prague" }
*Main> let p2 = Person { name = "Marek", age = 25, gender = Male, city = "Prague" }
*Main> p1 == p2
True
*Main> let p3 = Person { name = "Marek", age = 26, gender = Male, city = "Prague" }
*Main> p1 == p3
False
*Main> show p1
"Person {name = \"Marek\", age = 25, gender = Male, city = \"Prague\"}"
This is one of the weaker parts of Haskell: it actually does NOT have records (as explained, there is just a syntactic sugar upon tuples). Newer languages from the Haskell family PureScript and Elm elaborated on this and introduced proper records.
Algebraic Data Types (ADTs)
We say that our datatypes are algebraic in Haskell because it allows us to create sum and product types (with data
), type aliases (with type
), and special types (with newtype
).
- Sum = alternation with
|
(A | B
-> A + B)
data Number = I Int | D Double -- Int + Double
- Product = combination of types (
A B
-> A * B)
data Pair = P Int Double -- Int * Double
List and tuple
There are two basic container types (has the ability to store multiple values) - tuples and lists. Of course, there are much more such as maps, sets, vectors, streams defined in some libraries or you can create your own but these are really the basic and widely used.
Tuple
Tuple has a fixed number of elements of fixed types. You always know how many elements are in the tuple and what type is it. Type of elements in a tuple can differ.
myTuple :: (Int, String, Bool, Double)
myTuple = (15, "String", True, 5.24)
myFunc :: (Int, String, Bool, Double) -> (Double, String)
myFunc (a, b, c, d) = (if d then a + d else a - d, b)
There are basic functions for tuples with two elements: fst
, snd
, and swap
.
Prelude> :type fst
fst :: (a, b) -> a
Prelude> fst (7, "Hello")
7
Prelude> :type snd
snd :: (a, b) -> b
Prelude> snd (7, "Hello")
"Hello"
Prelude> import Data.Tuple
Prelude Data.Tuple> :t swap
swap :: (a, b) -> (b, a)
Prelude Data.Tuple> swap (7, "Hello")
("Hello",7)
Good to know is, how it actually works and try to implement own tuples.
data MyTuple2 a b = XTuple2 a b
data MyTuple3 a b c = XTuple3 a b c
-- ...
myTuple :: MyTuple3 Int String Double
myTuple = XTuple3 7 "Hi" 2.25
What forms the tuple is the ,
operator keyword and used notation in the first example is just a syntactic sugar.
myTuple = (,) 7 "Hi"
List
List is different than tuples - it has variable length (because it is a recursive type) and its elements have the same type. To understand it, let’s create an alternative implementation of list data type.
data List a = Empty | NonEmpty a (List a)
That’s it! List of type a
is either Empty
or NonEmpty
which means that it has an element and then rest of the list (which can be again Empty
or NonEmpty
). Sometimes the following naming is used:
data List a = Nil | Cons a (List a)
It is because with Cons
you join element with the other list. List in Haskell is very similar, just for Nil
you use empty list []
and for joining the infix cons operator :
.
myIntList :: [Int]
myIntList = [5,8,7,1]
myIntList = 5:8:7:[1]
myIntList = 5:8:[7,1]
-- ...
-- infix cons
myIntList = 5:8:7:6:1:[]
-- prefix cons
myIntList = (:) 5 ((:) 8 ((:) 7 ((:) 1 [])))
Actually [5,8,7,6,1]
is a syntactic sugar for 5:8:7:6:1:[]
and even for the prefix. Same goes for the type, when you write [Int]
it actually means [] Int
. You can rewrite the actual list to:
-- data [a] = [] | a:[a]
data [] a = [] | (:) a ([] a)
String
String is really nothing but just a list of characters [Char]
. The only difference is that there are more functions for working especially with String
s - like putStr
, lines
, words
and more (see Data.String). For more efficient working with strings is text package providing “a time and space-efficient implementation of Unicode text” with Data.Text - two variants: Lazy and Strict. Later on, we will get back to this problem which makes a life of Haskell programmer sometimes little bit uneasy.
Simple functions
Enough of types and containers, let’s do some functions when this is a functional programming course!
Basic list functions
Since list is a very simple and widely used data structure, it is a good time to learn useful functions to work with lists. You can find a complete list in Data.List documentation. Try following examples and examine the type of functions if needed. Also, try to run some unclear cases like head
of the empty list and see what happens…
Prelude> let myList = [2,4,5,3,2,8,4,1]
Prelude> head myList
2
Prelude> tail myList
[4,5,3,2,8,4,1]
Prelude> myList ++ [4, 5]
[2,4,5,3,2,8,4,1,4,5]
Prelude> myList !! 2
5
Prelude> null myList
False
Prelude> null []
True
Prelude> length myList
8
Prelude> reverse myList
[1,4,8,2,3,5,4,2]
Prelude> take 2 myList
[2,4]
Prelude> drop 2 myList
[5,3,2,8,4,1]
Prelude> filter (<6) myList
[2,4,5,3,2,4,1]
Prelude> takeWhile (<6) myList
[2,4,5,3,2]
Prelude> dropWhile (<6) myList
[8,4,1]
Prelude> elem 5 myList
True
Prelude> elem 7 myList
False
Prelude> zip [1,2,3] [4,5,6]
[(1,4),(2,5),(3,6)]
Prelude> map (^2) myList
[4,16,25,9,4,64,16,1]
Prelude> all (<6) myList
False
Prelude> any (<6) myList
True
Prelude> sum myList
29
Prelude> or [True, False, True]
True
Prelude> and [True, False, True]
False
Prelude> foldl (+) 0 myList
29
Prelude> foldl (||) False [True, False, True]
True
Prelude> foldl (&&) False [True, False, True]
False
The last one is left fold, there is also right fold (depends on associativity), we will cover this in more detail later, while explaining so-called catamorphism. Now you can just see that it is a generalization of sum
, and
, or
, and many others.
It is a very good practice to try to implement some of these functions to understand them and their complexity. You may worry that using list is always very inefficient, luckily GHC can do some optimizations (although still in some cases you should prefer Data.Sequence or other containers - we will get back to this during the course).
Intro to pattern matching
An important concept in many (not just) functional programming languages is the pattern matching. You could already notice it before in the example with record data types. When defining a function, it is possible to match the parameters via data constructors and/or values. As we’ve shown, for lists and tuples there are data constructors (:
and ,
) which can be used in pattern matching as well.
data Age = Age Int | Unknown
ageInfo :: Age -> String
ageInfo (Age x) = "Age is " ++ (show x) ++ " years."
ageInfo Unknown = "Age is unknown"
head :: [a] -> a
head (x:xs) = x
tail :: [a] -> a
tail (x:xs) = xs
length :: [a] -> Integer
length [] = 0
length (_:xs) = 1 + length xs -- _ wildcard = don't care & won't use
fst :: (a, b) -> a
fst (x, _) = x
snd :: (a, b) -> b
snd (_, y) = y
There are three more advanced, not so common, but sometimes useful concepts: pattern naming, lazy pattern, and strict pattern. We will get back to them in the next lesson when we will cover also guards.
Recursion and tail recursion
The concept of recursion is well-known - a function that has the ability to invoke itself. That allows us to solve big problems by recursively solving their smaller part(s). The best-known example is factorial - the trivial case is the factorial of 0, which is 1. Any other factorial of natural number n is then n times factorial of n-1. In Haskell we can write exactly this definition:
factorial 0 = 1
factorial n = n * factorial n-1
During any call of subroutine (function, procedure, or other action), it is needed to store information to the call stack. Such information consists of where was the call initiated, what was the state and where it should return the value when poping from this stack. For example, with calling res = factorial 3
call stack could look like this (top on the left):
res = _
factorial 3 = 3 * _
,res = _
factorial 2 = 2 * _
,factorial 3 = 3 * _
,res = _
factorial 1 = 1 * _
,factorial 2 = 2 * _
,factorial 3 = 3 * _
,res = _
Now it reaches factorial 0 = 1
and can start popping back the result:
factorial 1 = 1 * 1
,factorial 2 = 2 * _
,factorial 3 = 3 * _
,res = _
factorial 2 = 2 * 1
,factorial 3 = 3 * _
,res = _
factorial 3 = 3 * 2
,res = _
res = 6
The result is indeed 6, but could it be more efficient? Why is it necessary to use call stack? It stores the context of interrupted functions by the recursive call, it must remember that result needs to be multiplied then and after that, it can be returned. What if there is nothing more to do after returning the value from the recursive call - nothing needed to remember? That is called tail recursion and in such case, it can optimize usage of call stack - only 1 frame will be (re)used!
Following factorial
is tail recursive with use of so-called accumulator acc
, the result is returned from the trivial case without any change.
factorial n = fac' n 1
where fac' 0 acc = acc
fac' x acc = fac' (x - 1) (x * acc)
factorial 3
fac' 3 1
fac' 2 3
fac' 1 6
fac' 0 6
6
Although Haskell’s lazy evaluation strategy and GHC optimizations make it unnecessary to write tail-recursive functions, you should be familiar with the concept as functional programmer. With Haskell, you should more focus on the readability of your code and productivity!