이 책에 대하여
Note: 이 책은 출간한지 얼마 되지 않았습니다. 만약 오류를 발견한다면 알려주세요 (한국어 맞춤법 및 번역 오류는 여기로)
이 책에서 저희는 하스켈을 이용하여 저희만의 커스텀 마크업 언어로 작성한 문서를 HTML로 바꿔주는 간단한 정적 블로그 생성기를 만들 것입니다.
저희는:
- 작은 HTML 출력 라이브러리를 구현
- 저희만의 커스텀 마크업 언어를 정의하고 파싱
- 파일을 읽어들이고 여러개의 모듈을 하나로 만들기
- 터미널에서 인수 구하기
- 테스트 코드와 문서 작성
챕터별로 우리는 달성하고자 하는 특정한 작업에 집중할 것입니다, 그리고 각 장을 통해 우리는 각 작업을 끝내기에 충분한 양의 하스켈을 배울 것입니다.
왜 이 책을 읽어야하나요?
이미 많은 하스켈 튜토리얼, 가이드, 책이 있는데 왜 이 책을 읽어야하나요?
장점
분명 더 많은 장점이 있겠지만, 여기 몇 가지 장점들이 있습니다
- 이 책은 상대적으로 짧습니다 - 많은 하스켈 책들이 몇백 장이 넘어가는 페이지를 가지고 있습니다. 이 책은 (PDF로 출력한다면) 약 150쪽의 페이지를 가지고 있습니다.
- 이 책은 프로젝트 중심입니다. 많은 하스켈 책은 기본 개념과 기능들을 깔끔하게 설명하며 하스켈을 가르칩니다. 이 책에서 우리는 프로그램을 만들고, 그 과정에서 하스켈을 배우려고 합니다. 이것은 누군가에게는 장점이 될테지만 누군가에게는 단점이 될 수 있습니다. 여기 또 다른 튜토리얼들이 있습니다. 가장 눈에 띄는 것들은 Beginning Haskell과 Haskell via Sokoban이 있습니다.
- 이 책은 디자인 패턴, 테스트 코드, 문서화와 같은 중요한 주제에 대해 다룹니다.
- 이 책은 온라인이므로 수정이 쉽습니다.
- 이 책은 공짜입니다.
단점
아마 몇 가지 더 있겠지만, 몇 가지 단점이 있습니다
- 이 책은 깊이가 얕을 수 있습니다 - 많은, 훨씬 긴 하스켈 튜토리얼들은 하스켈의 세부적인 기능들까지 깊게 들어가기에 긴 길이를 가지게 된 것입니다.
- 이 책은 다른 튜토리얼들처럼 많은 기능이나 기술들을 다루지 않을 수 있습니다. - 우리는 구현하는 과정에서 나타나는 기능들을 다루려고 노력하지만, 우리가 하는 작업에 중요하지 않은 기능을 놓칠 수도 있습니다.
- 이 책은 굉장히 새롭습니다 그리고 검증되지 않았습니다. 이 방법이 하스켈을 배우는 좋은 방법인지 누가 알겠습니까? 어쩌면 당신이 도움이 될 수 있습니다!
- 이 책은 기술 편집자가 없습니다, 따라서 책이 기대만큼 좋지는 않습니다.
다른 튜토리얼들
haskell.org/documentation에서는 많은 튜토리얼, 책, 가이드와 강의들의 목록을 볼 수 있습니다. 이 목록에서 내가 추천하는 몇 가지 대안을 찾을 수 있습니다.
Hello, world!
이 장에서는 간단한 HTML "hello world" 프로그램을 만들고 하스켈 툴체인을 사용해 프로그램을 컴파일하고 실행시켜 볼 것입니다.
만약 하스켈 툴체인을 설치하지 않았다면, haskell.org/downloads을 방문하여 하스켈 툴체인을 다운로드 받으세요
하스켈 소스 파일
하스켈 소스 파일은 정의들로 구성됩니다.
가장 일반적인 정의 유형은 다음과 같은 형식을 가지고 있습니다:
<name> = <expression>
🚨 주의:
- expression은 이름에 바인딩하지 않고는 사용할 수 없습니다.
- 이름은 무조건 소문자로 시작해야만 합니다.
- 하나의 파일 안에서 같은 이름은 사용할 수 없습니다.
main
이라는 이름의 정의를 가지고 있는 소스 파일은 실행 파일로 취급됩니다, 그리고 main
은 프로그램의 진입점으로 바인딩 됩니다.
hello.hs
라는 하스켈 소스 파일을 하나 생성하고, 아래 내용을 작성해봅시다:
main = putStrLn "<html><body>Hello, world!</body></html>"
여기서 우리는 main
이라는 새로운 이름을 정의했습니다, 그리고 이것을 putStrLn "<html><body>Hello, world!</body></html>"
이라는 expression에 바인딩 했습니다.
여기서 main
은 putStrLn
이라는 함수를 "<html><body>Hello, world!</body></html>"
이라는 문자열을 인수로 호출하고 있습니다. putStrLn
은 문자열 하나를 입력으로 받아 그 문자열을 터미널에 출력해주는 함수입니다.
주의: 하스켈에서는 함수에 인수를 전달하기 위한 괄호가 필요하지 않습니다.
위 프로그램을 실행하면 다음과 같은 문자가 출력되는 것을 확인할 수 있습니다:
<html><body>Hello, world!</body></html>
프로그램 컴파일하기
이 작은 프로그램을 실행하기 위해서는, 우리는 ghc
라는 커멘트 라인 프로그램을 사용해서 컴파일할 수 있습니다:
> ghc hello.hs
[1 of 1] Compiling Main ( hello.hs, hello.o )
Linking hello ...
hello.hs
파일로 ghc
를 호출하면 다음과 같은 아티팩트 파일들이 생성됩니다:
hello.o
- 오브젝트 파일hello.hi
- 하스켈 인터페이스 파일hello
- 네이티브 실행 파일
컴파일이 끝난 후에는, 다음과 같이 hello
파일을 실행할 수 있습니다
> ./hello
<html><body>Hello, world!</body></html>
프로그램 인터프리팅하기
위와 같이 파일을 컴파일하는 방법도 있지만, 아티팩트 파일의 컴파일 단계를 건너뛰고, 아래와 같이 커맨드 라인 프로그램 runghc
을 사용해 소스 파일을 직접 실행할 수도 있습니다.
> runghc hello.hs
<html><body>Hello, world!</body></html>
또한, 프로그램의 출력을 파일로 만들어 파이어폭스에서 확인해 볼 수도 있습니다.
> runghc hello.hs > hello.html
> firefox hello.html
위의 커맨드는 파이어폭스를 열고 Hello, world!
라고 적혀있는 웹 페이지를 보여줄 것입니다.
이 튜토리얼을 진핸하는 동안에는 runghc
를 사용하기를 추천합니다. 컴파일을 할 경우 훨씬 빠른 프로그램이 나오지만, 인터프리터는 잦은 수정이 일어나는 동안에 빠른 피드백을 제공합니다.
만약 core Haskell tools에 대해 더 알고 싶다면 이 포스트를 읽어볼 수 있습니다, 하지만 위에 적혀있는 내용으로도 앞으로의 과정을 충분히 진행할 수 있습니다.
더 많은 바인딩
우리는 putStrLn
에 바로 HTML 문자열을 보내는 것이 아닌 새로운 이름으로 전달할 수 있습니다. 아래와 같이 hello.hs
의 내용을 바꿔보세요.
main = putStrLn myhtml
myhtml = "<html><body>Hello, world!</body></html>"
주의: 바인딩을 선언하는 순서는 중요하지 않습니다.
HTML 출력 라이브러리 만들기
이번 장에서 우리는 나중에 우리의 마크업 블로그로부터 HTML 페이지를 생성해낼 작은 HTML 출력 라이브러리를 만들며 함수, 타입, 모듈을 포함한 몇 가지 기본적인 하스켈 빌딩 블록들에 대해 알아볼 것입니다.
만약 HTML이 익숙하지 않다면 이 장을 시작하기 전에 MDN의 Getting started with HTML을 보고 오는 것이 이번 장에 대한 좋은 개요가 될 수 있습니다.
유연한 HTML 콘텐츠 (functions)
우리는 HTML의 반복되는 부분(Ex: html, body 등)을 반복해서 작성하지 않고도 HTML 페이지를 작성할 수 있기를 원합니다. 우리는 함수를 통해 그런 작업을 할 수 있습니다.
함수를 만들고 인수를 넣기 위해서는, 저번에 봤던 것처럼 정의를 만들고 이름과 =
표시 사이에 인수의 이름을 적을 수 있습니다.
따라서 함수는 다음과 같은 모양을 가지게 됩니다.
<name> <arg1> <arg2> ... <argN> = <expression>
인수는 등호 오른쪽 범위(<expression>
안쪽)에서 사용할 수 있으며 함수의 이름은 <name>
이 되게 됩니다.
우리는 페이지의 콘텐츠가 될 문자열을 받아 그 문자열을 html
과 body
태그로 감싸는 함수를 작성할 것입니다.
두 개의 문자열을 합치기 위해 연산자 <>
을 사용합니다.
wrapHtml content = "<html><body>" <> content <> "</body></html>"
함수 wrapHtml
은 content
라는 인수를 받아 앞에 <html><body>
을 추가하고 뒤에 </body></html>
을 추가하여 리턴하는 함수입니다.
하스켈에서는 주로 카멜 표기법을 사용한다는 점을 유의하십시오.
이제 이전 장에서 만들었던 myhtml
의 정의를 다음과 같이 수정할 수 있습니다:
myhtml = wrapHtml "Hello, world!"
다시 말하지만, 하스켈에서는 함수를 호출할 때 괄호가 필요하지 않습니다. 함수 호출의 형식은 다음과 같습니다.
<name> <arg1> <arg2> ... <argN>
그러나, myhtml
을 사용하지 않고 main = putStrLn myhtml
에 바로 바인딩하고 싶다면 아래와 같이 괄호로 묶어 식을 나타내야만 합니다.
main = putStrLn (wrapHtml "Hello, world!")
만약 실수로 괄호를 사용하지 않는다면,
main = putStrLn wrapHtml "Hello, world!"
putStrLn
이 두 개의 인수에 적용되지만 하나만 사용한다는 GHC의 오류가 발생합니다. 이는 위의 형식이 <name> <arg1> <arg2>
형식이기 때문입니다. 여기서 앞서 정의한 바와 같이 <arg1>
및 <arg2>
는 <name>
에 대한 인수입니다.
괄호를 사용하여 올바른 순서로 표현식을 묶을 수 있습니다.
연산자 우선순위 및 고정성에 대하여
연산자 (ex:
<>
) 는 양쪽에서 하나씩 두 개의 인수를 취하는 중위 함수입니다.괄호가 없는 동일한 표현식에서 여러 연산자가 있는 경우 고정성 (왼쪽 혹은 오른쪽)과 우선순위 (0부터 10사이의 숫자)에 따라 더 밀접하게 결합되는 연산자가 결정됩니다.
우리의 경우
<>
는 오른쪽 고정성 가지고 있습니다, 따라서 하스켈은 오른쪽에 보이지 않는 괄호를 추가합니다. 예를 들어:"<html><body>" <> content <> "</body></html>"
하스켈의 관점에서는:
"<html><body>" <> (content <> "</body></html>")
우선 순위로 예를 들자면,
1 + 2 * 3
이라는 식에서+
연산자는 우선 순위 6을*
연산자는 우선 순위 7을 가지므로+
보다*
에 우선 순위를 부여합니다. 따라서 하스켈은 이 식을 다음과 같이 보게 될 것입니다.1 + (2 * 3)
우선 순위는 같지만 고정성이 다른 연산자를 혼합해 사용할 경우 에러가 발생할 수 있습니다. 이러한 경우에는 명시적으로 괄호를 사용하여 문제를 해결할 수 있습니다
연습 문제:
-
wrapHtml
의 기능을 두 가지 함수로 분리:- 콘텐츠를
html
태그로 감싸는 함수 - 콘텐츠를
body
태그로 감싸는 함수
새로운 함수의 이름은
html_
과body_
가 되야합니다. - 콘텐츠를
-
myhtml
이 위의 두 함수를 사용하도록 변경하세요. -
<head>
와<title>
태그로 비슷한 함수 2개를 만들고head_
와title_
라는 이름을 붙이세요 -
두 개의 문자열을 인수로 받아들이는 새로운 함수
makeHtml
을 작성하세요:- 문자열 하나는 제목이 되고
- 다른 하나는 바디 콘텐츠가 되어야합니다.
그리고 이전 연습에서 구현한 함수를 사용하여 HTML 문자열을 구성합니다.
함수의 출력은 아래와 같아야 합니다.
makeHtml "My page title" "My page content"
<html><head><title>My page title</title></head><body>My page content</body></html>
-
myhtml
에html_
과body_
을 직접 사용하는 대신makeHtml
을 사용하세요.
해답:
연습 문제 1
html_ content = "<html>" <> content <> "</html>"
body_ content = "<body>" <> content <> "</body>"
연습 문제 2
myhtml = html_ (body_ "Hello, world!")
연습문제 3
head_ content = "<head>" <> content <> "</head>"
title_ content = "<title>" <> content <> "</title>"
연습문제 4
makeHtml title content = html_ (head_ (title_ title) <> body_ content)
연습문제 5
myhtml = makeHtml "Hello title" "Hello, world!"
최종 완성본
-- hello.hs
main = putStrLn myhtml
myhtml = makeHtml "Hello title" "Hello, world!"
makeHtml title content = html_ (head_ (title_ title) <> body_ content)
html_ content = "<html>" <> content <> "</html>"
body_ content = "<body>" <> content <> "</body>"
head_ content = "<head>" <> content <> "</head>"
title_ content = "<title>" <> content <> "</title>"
이제 hello.hs 프로그램을 실행하고 출력을 파일로 옮겨 브라우저에서 열 수 있습니다.
runghc hello.hs > hello.html
firefox hello.html
웹 페이지에 'Hello, world!'가 표시되고 웹 페이지 제목에 'Hello title'이 표시될 것입니다.
들여쓰기
아마 당신은 "어떻게 하스켈은 정의가 끝났다는 것을 알 수 있습니까?" 정답은 하스켈은 들여쓰기를 사용해서 항목들이 그룹화되는 것을 결정합니다.
Haskell의 들여쓰기는 약간 까다로울 수 있지만 일반적으로 다른 표현식의 일부로 간주되는 코드는 해당 표현식의 시작 부분보다 더 들여써야 합니다.
두 번째 정의가 첫 번째 정의보다 들여쓰기되지 않기 때문에 두 정의가 별개라는 것을 알 수 있습니다.
들여쓰기 팁
-
들여쓰기를 위한 특정 양의 공백 (2칸, 4칸 등)을 선택하고 변경하지 마세요. 항상 탭보다는 스페이스를 사용하세요.
-
한번에 여러번 들여쓰지 마세요.
-
확실하지 않은 경우 필요에 따라 줄을 삭제하고 한 번 들여쓰기하세요.
여기 몇가지 예시가 있습니다:
main =
putStrLn "Hello, world!"
또는:
main =
putStrLn
(wrapHtml "Hello, world!")
다음의 스타일을 피하세요: 둘 이상의 들여쓰기 단계를 사용하거나, 들여쓰기 단계를 완전히 무시하는 스타일
main = putStrLn
(wrapHtml "Hello, world!")
main = putStrLn
(wrapHtml "Hello, world!")
Adding type signatures
Haskell is a statically typed programming language. That means that every expression has a type, and we check that the types are valid with regards to each other before running the program. If we discover that they are not valid, an error message will be printed and the program will not run.
An example of a type error would be if we'd pass 3 arguments to a function that takes only 2, or pass a number instead of a string.
Haskell is also type inferred, so we don't need to specify the type of expressions - Haskell can infer from the context of the expression what its type should be, and that's what we did up until now. However, specifying types is useful - it adds a layer of documentation for you or others that will look at the code later, and it helps verify to some degree that what was intended (with the type signature) is what was written (with the expression). It is generally recommended to annotate all top-level definitions with type signatures.
We use double-colon (::
) to specify the type of names. We usually
write it right above the definition of the name itself.
Here are a few examples of types we can write:
Int
- The type of integer numbersString
- The type of stringsBool
- The type of booleans()
- The type of the expression()
, also called unita -> b
- The type of a function from an expression of typea
to an expression of typeb
IO ()
- The type of an expression that represents an IO subroutine that returns()
Let's specify the type of title_
:
title_ :: String -> String
We can see in the code that the type of title_
is a function that takes
a String
and returns a String
.
Let's also specify the type of makeHtml
:
makeHtml :: String -> String -> String
Previously, we thought about makeHtml
as a function that takes
two strings and returns a string.
But actually, all functions in Haskell take exactly one argument as input
and return exactly one value as output. It's just convenient to refer
to functions like makeHtml
as functions with multiple inputs.
In our case, makeHtml
is a function that takes one string argument,
and returns a function. The function it returns takes a string argument
as well and finally returns a string.
The magic here is that ->
is right associative. Which means that when we write:
makeHtml :: String -> String -> String
Haskell parses it as:
makeHtml :: String -> (String -> String)
Consequently, the expression makeHtml "My title"
is also a function!
One that takes a string (the content, the second argument of makeHtml
)
and returns the expected HTML string with "My title" in the title.
This is called partial application.
To illustrate, let's define html_
and body_
in a different way by
defining a new function, el
.
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
el is a function that takes a tag and a content, and wraps the content with the tag.
We can now implement html_
and body_
by partially applying el
and
only provide the tag.
html_ :: String -> String
html_ = el "html"
body_ :: String -> String
body_ = el "body"
Note that we didn't need to add the argument on the left side of
equals sign because Haskell functions are "first class" - they behave
exactly like normal expressions. You can define names to them like
regular values, put them in data structures, pass them to functions,
everything you can do with regular values like Int
or String
.
The way Haskell treats names is very similar to copy paste. Anywhere
you see html_
in the code, you can replace it with el "html"
. They are
the same (this is what the equals signs say, right? That the two sides
are the same). This property, of being able to substitute the two sides of the
equals sign with one another, is called referential transparency. And
it is pretty unique to Haskell (and a few languages that are very
similar to it like PureScript and Elm)! We'll talk more about this in a later chapter.
Anonymous/lambda functions
To further drive the point that Haskell functions are first class and all functions take exactly one argument, I'll mention that the syntax we've been using up until now to define function is just syntactic sugar! We can also define anonymous functions - functions without a name, anywhere we'd like. Anonymous functions are also known as lambda functions. This is a tribute to the original, most primitive functional programming language - the lambda calculus.
We can create an anonymous function anywhere we'd expect an expression
such as "hello"
with the following syntax:
\<argument> -> <expression>
This little \
(which bears some resemblance to the lowercase Greek letter lambda 'λ')
marks the head of the lambda function,
and the arrow (->
) marks the beginning of the body of the function.
We can even chain lambda functions, making them "multiple argument functions" by
defining another lambda in the body of another, like this:
three = (\num1 -> \num2 -> num1 + num2) 1 2
Just as before, we evaluate functions by substituting the function argument with
the applied value. In the example above we substitute num1
with 1
, and get
(\num2 -> 1 + num2) 2
. Then substitute num2
with 2
and get 1 + 2
.
We'll talk more about substitution later.
So, when we write:
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
Haskell actually translates this under the hood to:
el :: String -> (String -> String)
el = \tag -> \content ->
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
Hopefully this form makes it a bit clearer why Haskell functions always take one argument, even when we have syntactic sugar that might suggest otherwise.
I'll mention one more syntactic sugar for anonymous functions: We don't actually have to write multiple argument anonymous functions this way, we can just write:
\<arg1> <arg2> ... <argN> -> <expression>
to save us some trouble. For example:
three = (\num1 num2 -> num1 + num2) 1 2
But it's worth remembering what they are under the hood.
We won't be needing anonymous/lambda functions at this point, but we'll talk more about them later and see where they can be useful.
Exercises:
-
Add types for all of the functions we created until now
-
Change the implementation of the HTML functions we built to use
el
instead -
Add a couple more functions for defining paragraphs and headings:
p_
which uses the tag<p>
for paragraphsh1_
which uses the tag<h1>
for headings
-
Replace our
Hello, world!
string with richer content, useh1_
andp_
. We can append HTML strings created byh1_
andp_
using the append operator<>
.
Solutions:
Solution for exercise #1
myhtml :: String
myhtml = makeHtml "Hello title" "Hello, world!"
makeHtml :: String -> String -> String
makeHtml title content = html_ (head_ (title_ title) <> body_ content)
html_ :: String -> String
html_ content = "<html>" <> content <> "</html>"
body_ :: String -> String
body_ content = "<body>" <> content <> "</body>"
head_ :: String -> String
head_ content = "<head>" <> content <> "</head>"
title_ :: String -> String
title_ content = "<title>" <> content <> "</title>"
Solution for exercise #2
html_ :: String -> String
html_ = el "html"
body_ :: String -> String
body_ = el "body"
head_ :: String -> String
head_ = el "head"
title_ :: String -> String
title_ = el "title"
Solution for exercise #3
p_ :: String -> String
p_ = el "p"
h1_ :: String -> String
h1_ = el "h1"
Solution for exercise #4
myhtml :: String
myhtml =
makeHtml
"Hello title"
(h1_ "Hello, world!" <> p_ "Let's learn about Haskell!")
Our final program
-- hello.hs
main :: IO ()
main = putStrLn myhtml
myhtml :: String
myhtml =
makeHtml
"Hello title"
(h1_ "Hello, world!" <> p_ "Let's learn about Haskell!")
makeHtml :: String -> String -> String
makeHtml title content = html_ (head_ (title_ title) <> body_ content)
html_ :: String -> String
html_ = el "html"
body_ :: String -> String
body_ = el "body"
head_ :: String -> String
head_ = el "head"
title_ :: String -> String
title_ = el "title"
p_ :: String -> String
p_ = el "p"
h1_ :: String -> String
h1_ = el "h1"
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
Embedded Domain Specific Languages
Right off the bat we run into a common pattern in Haskell: creating Embedded Domain Specific Languages (EDSLs for short).
Domain specific languages (DSLs) are specialized programming languages that are tailored to specific domains, in contrast to general purpose languages, which try to work well in many domains.
A few examples of DSLs are:
- make - for defining build systems
- DOT - for defining graphs
- Sed - for defining text transformations
- CSS - for defining styling
- HTML - for defining web pages
An embedded domain specific language is a little language which is embedded inside another programming language, making a program written in the EDSL a valid program in the language it was written in.
The little HTML library we've been writing can be considered an EDSL. It is used specifically for building web pages (by returning HTML strings), and is valid Haskell code!
In Haskell we frequently create and use EDSLs to express domain specific logic. We have EDSLs for concurrency, command-line options parsing, JSON and HTML, creating build systems, writing tests, and many more.
Specialized languages are useful because they can solve specific problems in a concise (and often safe) way, and by embedding we get to use the full power of the host language for our domain logic, including syntax highlighting and various tools available for the language.
The drawback of embedding domain specific languages is that we have to adhere the rules of the programming language we embed in, such as syntactic and semantic rules.
Some languages alleviate this drawback by providing meta-programming capabilities in the form of macros or other features to extend the language. And while Haskell does provide such capabilities as well, it is also expressive and concise enough that many EDSLs do not need them.
Instead, many Haskell EDSLs use a pattern called the combinator pattern: They define primitives and combinators - primitives are basic building blocks of the language, and combinators are functions that combine primitives to more complex structures.
In our HTML EDSL, our primitives are functions such as html_
and title_
that can be used to create a single HTML node, and we pass other
constructed nodes as input to these functions, and combine them to a more complex
structure with the append function <>
.
There are still a few tricks we can use to make our HTML EDSL better:
-
We can use Haskell's type system to make sure we only construct valid HTML, so for example we don't create a
<title>
node without a<head>
node or have user content that can include unescaped special characters, and throw a type error when the user tries to do something invalid. -
Our HTML EDSL can move to its own module so it can be reused in multiple modules
In the next few sections we'll take a look at how to define our own types and how to work with modules to make it harder to make errors, and a little bit about linked lists in Haskell.
Safer HTML construction with types
In this section we'll learn how to create our own distinguished types for HTML, and how they can help us avoid invalid construction of HTML strings.
There are a few ways of defining new types in Haskell, in this section
we are going to meet two ways: newtype
and type
.
newtype
A newtype
declaration is a way to define a new, distinct type for an existing set of values.
This is useful when we want to reuse existing values but give them different meaning,
and make sure we can't mix the two.
For example, we can represents seconds, minutes, grams and yens using integer values,
but we don't want to accidentally mix grams and seconds.
In our case we want to represent structured HTML using textual values, but distinguish them from everyday strings that are not valid HTML.
A newtype
declaration looks like this:
newtype <type-name> = <constructor> <existing-type>
For example in our case we can define a distinct type for Html
like this:
newtype Html = Html String
The first Html
, to the left of the equals sign, lives in the types
name space, meaning that you will only see that name to the right of a
double-colon sign (::
).
The second Html
lives in the expressions (or terms/values) name space,
meaning that you will see it where you expect expressions (we'll touch where
exactly that can be in a moment).
The two names, <type-name>
and <constructor>
, do not have to be the
same, but they often are. And note that both have to start with a
capital letter.
The right-hand side of the newtype declaration describes the shape of a
value of that type. In our case, we expect a value of
type Html
to have the constructor Html
and then an expression of
type string, for example: Html "hello"
or Html ("hello " <> "world")
.
You can think of the constructor as a function that takes the argument and returns something of our new type:
Html :: String -> Html
Note: We cannot use an expression of type Html
the same way we'd
use a String
. so "hello " <> Html "world"
would fail at type
checking.
This is useful when we want encapsulation. We can define and use existing representation and functions for our underlying type, but not mix them with other, unrelated (to our domain) types. Similar as meters and feet can both be numbers, but we don't want to accidentally add feet to meters without any conversion.
For now, let's create a couple of types for our use case. We want two separate types to represent:
- A complete Html document
- A type for html structures such as headings and paragraphs that can go inside the tag
We want them to be distinct because we don't want to mix them together.
Solution
newtype Html = Html String
newtype Structure = Structure String
Using newtype
s
In order to use the underlying type that the newtype wraps, we first need to extract it out of the type. We do this using pattern matching.
Pattern matching can be used in two ways, in case expressions and in function definitions.
-
case expressions are kind of beefed up switch expressions and look like this:
case <expression> of <pattern> -> <expression> ... <pattern> -> <expression>
The
<expression>
is the thing we want to unpack, and thepattern
is its concrete shape. For example, if we wanted to extract theString
out of the typeStructure
we defined in the exercise above, we do:getStructureString :: Structure -> String getStructureString struct = case struct of Structure str -> str
This way we can extract the
String
out ofStructure
and return it.In later chapters we'll introduce
data
declarations (which are kind of a struct + enum chimera), where we can define multiple constructors to a type. Then the multiple patterns of a case expression will make more sense. -
Alternatively, when declaring a function, we can also use pattern matching on the arguments:
func <pattern> = <expression>
For example:
getStructureString :: Structure -> String getStructureString (Structure str) = str
Using the types we created, we can change the HTML functions we've defined before, namely
html_
,body_
,p_
, etc, to operate on these types instead ofString
s.But first let's meet another operator that will make our code more concise.
One very cool thing about newtype
is that wrapping and extracting expressions doesn't actually
have a performance cost! The compiler knows to remove any wrapping and extraction
of the newtype
constructor and use the underlying type.
The new type and the constructor we defined are only there to help us distinguish between the type we created and the underlying type when we write our code, they are not needed when the code is running.
newtype
s provide us with type safety with no performance penalty!
Chaining functions
Another interesting and extremely common operator
(which is a regular library function in Haskell) is .
(pronounced compose).
This operator was made to look like the composition operator
you may know from math (∘
).
Let's look at its type and implementation:
(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)
Compose takes 3 arguments: two functions (named f
and g
here) and
a third argument named x
. It then passes the argument x
to the second
function g
, and calls the first function f
with the result of g x
.
Note that g
takes as input something of the type
a
and returns something of the type b
, and f
takes
something of the type b
, and returns something of the type c
.
Another important thing to note is that types which start with
a lowercase letter are type variables.
Think of them as similar to regular variables. Just like
content
could be any string, like "hello"
or "world"
, a type variable
can be any type: Bool
, String
, String -> String
, etc.
This abilitiy is called parametric polymorphism (other languages often call this generics).
The catch is that type variables must match in a signature, so if for
example we write a function with the type signature a -> a
, the
input type and the return type must match, but it could be
any type - we cannot know what it is. So the only way to implement a
function with that signature is:
id :: a -> a
id x = x
id
, short for the identity function, returns the exact value it received.
If we tried any other way, for example returning some made up value
like "hello"
, or try to use x
like a value of a type we know like
writing x + x
, the type checker will complain.
Also, remember that ->
is right associative? This signature is equivalent to:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Doesn't it look like a function that takes two functions and returns a third function that is the composition of the two?
We can now use this operator to change our HTML functions. Let's start
with one example: p_
.
Before, we had:
p_ :: String -> String
p_ = el "p"
And now, we can write:
p_ :: String -> Structure
p_ = Structure . el "p"
The function p_
will take an arbitrary String
which is the content
of the paragraph we wish to create, wrap it in <p>
and </p>
tags,
and then wrap it in the Structure
constructor to produce the
output type Structure
(remember: newtype constructors can be used as functions!).
Let's take a deeper look at the types:
Structure :: String -> Structure
el "p" :: String -> String
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Structure . el "p" :: String -> Structure
Let's see why the expression Structure . el "p"
type checks,
and why its type is String -> Structure
.
Type checking with pen and paper
If we want to figure out if and how exactly an expression type-checks, we can do that rather systematically. Let's look at an example where we try and type-check this expression:
p_ = Structure . el "p"
First, we write down the type of the outer-most function. In
our case this is the operator .
which has the type:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
After that, we can try to match the type of the arguments we apply to this function with the type of the arguments from the type signature.
In this case, we try to apply two arguments to .
:
Structure :: String -> Structure
el "p" :: String -> String
And luckily .
expects two arguments with the types:
b -> c
a -> b
Note: Applying a function with more arguments than it expects is a type error.
Since the .
operator takes at least the number of arguments we supply, we continue
to the next phase of type-checking: matching the types of the inputs with the types
of the expected inputs (from the type signature of the operator).
When we match two types, we are checking for equivalence between them. There are a few possible scenarios here:
- When the two types are concrete (as opposed to type variables)
and simple, like
Int
andBool
, we check if they are the same. If they are, they type check and we continue. If they aren't, they don't type check and we throw an error. - When the two types we match are more complex (for example both are functions), we try to match their inputs and outputs (in case of functions). If the inputs and outputs match, then the two types match.
- There is a special case when one of the types is a type variable - in this case we treat the matching process like an equation and we write it down somewhere. The next time we see this type variable, we replace it with its match in the equation. Think about this like assigning a type variable with a value.
In our case, we want to match (or check the equivalence of) these types:
String -> Structure
withb -> c
String -> String
witha -> b
Let's do this one by one, starting with (1) - matching String -> Structure
and b -> c
:
- Because the two types are complex, we check that they are both functions, and match their
inputs and outputs:
String
withb
, andStructure
withc
. - Because
b
is a type variable, we mark down somewhere thatb
should be equivalent toString
. We writeb ~ String
(we use~
to denote equivalence). - We match
Structure
andc
, same as before, we write down thatc ~ Structure
.
No problem so far, let's try matching String -> String
with a -> b
:
- The two types are complex, we see that both are functions so we match their inputs and outputs.
- Matching
String
witha
- we write down thata ~ String
. - Matching
String
withb
- we remember that we have already written aboutb
- looking back we see that we already noted thatb ~ String
. We need to replaceb
with the type that we wrote down before and check it against this type, so we matchString
withString
which, fortunately, type-check because they are the same.
So far so good. We've type-checked the expression and discovered the following equivalences about the type variables in it:
a ~ String
b ~ String
c ~ Structure
Now, when asking what is the type of the expression:
p_ = Structure . el "p"
We say that it is the type of .
after replacing the type variables using the equations we found
and removing the inputs we applied to it, so we started with:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Then we replaced the type variables:
(.) :: (String -> Structure) -> (String -> String) -> (String -> Structure)
And removed the two arguments when we applied the function:
Structure . el "p" :: String -> Structure
And we got the type of the expression!
Fortunately, Haskell is able to do this process for us. But when Haskell complains that our types fail to type-check and we don't understand exactly why, going through this process can help us understand where the types do not match, and then we can figure out how to solve it.
Note: If we use a parametrically polymorphic function more than once, or use different functions that have similar type variable names, the type variables don't have to match in all instances simply because they share a name. Each instance has its own unique set of type variables. For example:
id :: a -> a ord :: Char -> Int chr :: Int -> Char incrementChar :: Char -> Char incrementChar c = chr (ord (id c) + id 1)
In the snippet above, we use
id
twice (for no good reason other than for demonstration purposes). The firstid
takes aChar
as argument, and itsa
is equivalent toChar
. The secondid
takes anInt
as argument, and its distincta
is equivalent toInt
.This unfortunately only applies to functions defined at the top-level. If we'd define a local function to be passed as an argument to
incrementChar
with the same type signature asid
, the types must match in all uses. So this code:incrementChar :: (a -> a) -> Char -> Char incrementChar func c = chr (ord (func c) + func 1)
Will not type check. Try it!
Appending Structure
Before when we wanted to create richer HTML content and appended
nodes to one another, we used the append (<>
) operator.
Since we are now not using String
anymore, we need another way
to do it.
While it is possible to overload <>
using a feature in
Haskell called type classes, we will instead create a new function
and call it append_
, and cover type classes later.
append_
should take two Structure
s, and return a third Structure
,
appending the inner String
in the first Structure
to the second and wrapping the result back in Structure
.
Try implementing append_
.
Solution
append_ :: Structure -> Structure -> Structure
append_ (Structure a) (Structure b) =
Structure (a <> b)
Converting back Html
to String
After constructing a valid Html
value, we want to be able to
print it to the output so we can display it in our browser.
For that, we need a function that takes an Html
and converts it to a String
, which we can then pass to putStrLn
.
Implement the render
function.
Solution
render :: Html -> String
render html =
case html of
Html str -> str
type
Let's look at one more way to give new names to types.
A type
definition looks really similar to a newtype
definition - the only
difference is that we reference the type name directly without a constructor:
type <type-name> = <existing-type>
For example in our case we can write:
type Title = String
type
, in contrast with newtype
, is just a type name alias.
When we declare Title
as a type alias of String
,
we mean that Title
and String
are interchangeable,
and we can use one or the other whenever we want:
"hello" :: Title
"hello" :: String
Both are valid in this case.
We can sometimes use type
s to give a bit more clarity to our code,
but they are much less useful than newtype
s which allow us to
distinguish two types with the same type representation.
The rest of the owl
Try changing the code we wrote in previous chapters to use the new types we created.
Tips
We can combine
makeHtml
andhtml_
, and removebody_
head_
andtitle_
by callingel
directly inhtml_
, which can now have the typeTitle -> Structure -> Html
. This will make our HTML EDSL less flexible but more compact.Alternatively, we could create
newtype
s forHtmlHead
andHtmlBody
and pass those tohtml_
, and we might do that at later chapters, but I've chosen to keep the API a bit simple for now, we can always refactor later!
Solution
-- hello.hs
main :: IO ()
main = putStrLn (render myhtml)
myhtml :: Html
myhtml =
html_
"My title"
( append_
(h1_ "Heading")
( append_
(p_ "Paragraph #1")
(p_ "Paragraph #2")
)
)
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" title)
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p"
h1_ :: String -> Structure
h1_ = Structure . el "h1"
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
render :: Html -> String
render html =
case html of
Html str -> str
Are we safe yet?
We have made some progress - now we can't write "Hello"
where we'd expect either a paragraph or a heading, but we can still
write Structure "hello"
and get something that isn't a
paragraph or a heading. So while we made it harder for the user
to make mistakes by accident, we haven't really been able to enforce
the invariants we wanted to enforce in our library.
Next we'll see how we can make expressions such as Structure "hello"
illegal
as well using modules and smart constructors.
Preventing incorrect use with modules
In this section we will move the HTML generation library to its own module.
Modules
Each Haskell source file is a module. The module name should have the
same name as the source file and should start with a capital
letter. Sub-directories should also be part of the name and we use .
to denote a sub-directory. We'll see that in the next section.
The only exception to the rule are entry points to the program -
modules with the name 'Main' that define main
in them. Their source
file names could have any name they want.
A module declaration looks like this:
module <module-name>
( <export-list>
)
where
The export list can be omitted if you want to export everything defined in the module, but we don't. We will list exactly the functions and type we want to export. This will give us control on how people can use our tiny library.
We will create a new source file named Html.hs
and add the following
module declaration code at the top of the file:
module Html
( Html
, Title
, Structure
, html_
, p_
, h1_
, append_
, render
)
where
Note that we do not export:
-
The constructors for our new types, only the types themselves. If we wanted to export the constructors as well we would've written
Html(Html)
orHtml(..)
. This way the user cannot create their ownStructure
by writingStructure "Hello"
. -
Internal functions used by the library, such as
el
andgetStructureString
.
And we will also move the HTML related functions from our hello.hs
file
to this new Html.hs
file:
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" title)
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p"
h1_ :: String -> Structure
h1_ = Structure . el "h1"
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
render :: Html -> String
render html =
case html of
Html str -> str
Now, anyone importing our module (using the import
statement
below module declarations but above any other
declaration), will only be able to import what we export.
Add the following code at the top of the hello.hs
file:
import Html
The hello.hs
file should now look like this:
-- hello.hs
import Html
main :: IO ()
main = putStrLn (render myhtml)
myhtml :: Html
myhtml =
html_
"My title"
( append_
(h1_ "Heading")
( append_
(p_ "Paragraph #1")
(p_ "Paragraph #2")
)
)
And the Html.hs
file should look like this:
-- Html.hs
module Html
( Html
, Title
, Structure
, html_
, p_
, h1_
, append_
, render
)
where
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" title)
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p"
h1_ :: String -> Structure
h1_ = Structure . el "h1"
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
render :: Html -> String
render html =
case html of
Html str -> str
Escaping characters
Now that Html
has its own source file and module, and creating
HTML code can be done only via the functions we exported,
we can also handle user input that may contain characters
that may conflict with our meta language, HTML,
such as <
and >
which are used for creating HTML tags.
We can convert these characters into different strings that HTML can handle.
See Stack overflow question for a list of characters we need to escape.
Let's create a new function called escape
:
escape :: String -> String
escape =
let
escapeChar c =
case c of
'<' -> "<"
'>' -> ">"
'&' -> "&"
'"' -> """
'\'' -> "'"
_ -> [c]
in
concat . map escapeChar
In escape
we see a few new things:
-
Let expressions: we can define local names using this syntax:
let <name> = <expression> in <expression>
This will make
<name>
available as a variablein
the second<expression>
. -
Pattern matching with multiple patterns: we match on different characters and convert them to a string. Note that
_
is a "catch all" pattern that will always succeed. -
Two new functions:
map
andconcat
, we'll talk about these more in depth -
That the syntax highlighting broke a bit for this snippet for some reason. Don't worry about it.
Linked lists briefly
Linked lists are very common data structures in Haskell, so common that they have their own special syntax:
- The list types are denoted with brackets and inside them is the type of the element. For example:
[Int]
- a list of integers[Char]
- a list of characters[String]
- a list of strings[[String]]
- a list of a list of strings[a]
- a list of any single type (all elements must be of the same type)
- An empty list is written like this:
[]
- Prepending an element to a list is done with the operator
:
(pronounced cons) which is right-associative (like->
). For example:1 : []
, or1 : 2 : 3 : []
. - The above lists can also be written like this:
[1]
and[1, 2, 3]
.
Also, Strings are linked lists of characters - String is defined as:
type String = [Char]
, so we can use them the same way we use lists.
Do note, however, that linked lists, despite their convenience, are often not the right tool for the job. They are not particularly space efficient and are slow for appending, random access and more. That also makes
String
a lot less efficient than what it could be. And I generally recommend using a different string type,Text
, instead, which is available in an external package. We will talk about lists,Text
, and other data structures in the future!
We can implement our own operations on lists by using pattern matching and recursion. And we'll touch on this subject later when talking about ADTs.
For now, we will use the various functions found in the Data.List module. Specifically, map and concat.
map
Using map
we can apply a function to each of the elements in a list. Its type signature is:
map :: (a -> b) -> [a] -> [b]
For example:
map not [False, True, False] == [True, False, True]
Or as can be seen in our escape
function, this can help us escape each character:
map escapeChar ['<','h','1','>'] == ["<","h","1",">"]
However, note that the escapeChar
has the type Char -> String
,
so the result type of map escapeChar ['<','h','1','>']
is [String]
,
and what we really want is a String
and not [String]
.
This is where concat
enters the picture to help us flatten the list.
concat
concat
has the type:
concat :: [[a]] -> [a]
It flattens a list of list of something into a list of something.
In our case it will flatten [String]
into String
, remember that
String
is a type alias for [Char]
, so we actually have
[[Char]] -> [Char]
.
GHCi
One way we can quickly see our code in action is using the interactive development environment GHCi.
Running ghci
will open an interactive prompt where Haskell expressions can be written and
evaluated. This is called a "Read-Evaluate-Print Loop" (for short - REPL).
For example:
ghci> 1 + 1
2
ghci> putStrLn "Hello, world!"
Hello, world!
We can define new names:
ghci> double x = x + x
ghci> double 2
4
We can write multi-line code by surrounding it with :{
and :}
:
ghci> :{
| escape :: String -> String
| escape =
| let
| escapeChar c =
| case c of
| '<' -> "<"
| '>' -> ">"
| '&' -> "&"
| '"' -> """
| '\'' -> "'"
| _ -> [c]
| in
| concat . map escapeChar
| :}
ghci> escape "<html>"
"<html>"
We can import Haskell source files using the :load
command (:l
for short):
ghci> :load Html.hs
[1 of 1] Compiling Html ( Html.hs, interpreted )
Ok, one module loaded.
ghci> render (html_ "<title>" (p_ "<body>"))
"<html><head><title><title></title></head><body><p><body></p></body></html>"
As well as import library modules:
ghci> import Data.Bits
ghci> shiftL 32 1
64
ghci> clearBit 33 0
32
We can even ask the type of an expression using the :type
command
(:t
for short):
λ> :type escape
escape :: String -> String
To exit ghci
, use the :quit
command (or :q
for short)
ghci> :quit
Leaving GHCi.
GHCi is a very useful tool for quick experiments and exploration.
We've seen a couple of examples of that above - passing the string "<html>"
to our
escape
function returns the string "<html>"
, which can be rendered by
a browser as <html>
instead of an HTML tag.
If you are having a hard time figuring out what a particular function does, consider testing it in GHCi - pass it different inputs, and see if it matches your expectations. Concrete examples of running code can aid a lot in understanding it!
If you'd like to learn more about GHCi, you can find a more thorough introduction in the GHC user guide.
Escaping
The user of our library can currently only supply strings in a few places:
- Page title
- Paragraphs
- Headings
We can apply our escape function at these places before doing anything else with it. That way all HTML constructions are safe.
Try adding the escaping function in those places.
Solution
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" (escape title))
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p" . escape
h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape
Our revised Html.hs
-- Html.hs
module Html
( Html
, Title
, Structure
, html_
, p_
, h1_
, append_
, render
)
where
-- * Types
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
-- * EDSL
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" (escape title))
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p" . escape
h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
-- * Render
render :: Html -> String
render html =
case html of
Html str -> str
-- * Utilities
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
escape :: String -> String
escape =
let
escapeChar c =
case c of
'<' -> "<"
'>' -> ">"
'&' -> "&"
'"' -> """
'\'' -> "'"
_ -> [c]
in
concat . map escapeChar
Try constructing an invalid HTML in hello.hs
to see if this works or not!
Now we can use our tiny HTML library safely. But what if the user wants to use our library with a valid use case we didn't think about, for example adding unordered lists? We are completely blocking them from extending our library. We'll talk about this next.
Exposing internal functionality (Internal modules)
We have now built a very small but convenient and safe way to write HTML code in Haskell. This is something that we could (potentially) publish as a library and share with the world by uploading it to a package repository such as Hackage. Users who are interested in our library could use a package manager to include our library in their project and build their own HTML pages with it.
It is important to note that users are building their project against
the API that we expose to them, and the package manager doesn't generally
provide access to the source code, so they can't, for example,
modify the Html
module (that we expose) in their project directly
without jumping through some hoops.
Because we wanted our Html
EDSL to be safe, we hid the internal
implementation from the user, and the only way to interact with the
library is via the API we provide.
This provides the safety we wanted to provide, but in this case it also blocks the user from extending our library in their own project with things we haven't implemented yet, such as lists or code blocks.
When a user runs into trouble with a library (such as missing features) the best course of action usually is to open an issue in the repository or submit a pull request, but sometimes the user needs things to work now.
We admit that we are not perfect and can't think of all use cases for our library. Sometimes the restrictions we add are too great and may limit the usage of advanced users that know how things work under the hood and need certain functionality in order to use our library.
Internal modules
For that we can expose internal modules to provide some flexibility for advanced users. Internal modules are not a language concept but rather a (fairly common) design pattern (or idiom) in Haskell.
Internal modules are simply modules named <something>.Internal
,
which export all of the functionality and implementation details in that module.
Instead of writing the implementation in (for example) the Html
module,
we write it in the Html.Internal
module, which will export everything.
Then we will import that module in the Html
module, and write an explicit export list
to only export the API we'd like to export (as before).
Internal
modules are considered unstable and risky to use by convention.
If you end up using one yourself when using an external Haskell library,
make sure to open a ticket in the library's repository after the storm has passed!
Let's make the changes
We will create a new directory named Html
and inside it a new file
named Internal.hs
. The name of this module should be Html.Internal
.
This module will contain all of the code we previously had in the Html
module, but we will change the module declaration in Html.Internal
and omit the export list:
-- Html/Internal.hs
module Html.Internal where
...
And now in Html.hs
, we will remove the code that we moved to Html/Internal.hs
and in its stead we'll import the internal module:
-- Html.hs
module Html
( Html
, Title
, Structure
, html_
, p_
, h1_
, append_
, render
)
where
import Html.Internal
Now, users of our library can still import Html
and safely use our library,
but if they run into trouble and have a dire need to implement unordered lists
to work with our library, they could always work with Html.Internal
instead.
Our revised Html.hs and Html/Internal.hs
-- Html.hs
module Html
( Html
, Title
, Structure
, html_
, p_
, h1_
, append_
, render
)
where
import Html.Internal
-- Html/Internal.hs
module Html.Internal where
-- * Types
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
-- * EDSL
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" (escape title))
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p" . escape
h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
-- * Render
render :: Html -> String
render html =
case html of
Html str -> str
-- * Utilities
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
escape :: String -> String
escape =
let
escapeChar c =
case c of
'<' -> "<"
'>' -> ">"
'&' -> "&"
'"' -> """
'\'' -> "'"
_ -> [c]
in
concat . map escapeChar
Summary
For our particular project, Internal
modules aren't necessary.
Because our project and the source code for the HTML EDSL are
part of the same project, and we have access to the Html
module directly, we can always go and edit it if we want
(and we are going to do that throughout the book).
However, if we were planning to release our HTML EDSL as a library
for other developers to use, it would be nice
to also expose the internal implementation as an Internal
module. Just so we can save some trouble for potential users!
We will see how to create a package from our source code in a later chapter.
Exercises
We need a few more features for our HTML library to be useful for
our blog software. Add the following features to our Html.Internal
module
and expose them from Html
.
1. Unordered lists
These lists have the form:
<ul>
<li>item 1</li>
<li>item 2</li>
<li>...</li>
</ul>
We want in our library a new function:
ul_ :: [Structure] -> Structure
So that users can write this:
ul_
[ p_ "item 1"
, p_ "item 2"
, p_ "item 3"
]
and get this:
<ul>
<li><p>item 1</p></li>
<li><p>item 2</p></li>
<li><p>item 3</p></li>
</ul>
2. Ordered lists
Very similar to unordered lists, but instead of <ul>
we use <ol>
3. Code blocks
Very similar to <p>
, but use the <pre>
tag. Call this function code_
.
Solutions
Unordered lists
ul_ :: [Structure] -> Structure
ul_ =
Structure . el "ul" . concat . map (el "li" . getStructureString)
Ordered lists
ol_ :: [Structure] -> Structure
ol_ =
Structure . el "ol" . concat . map (el "li" . getStructureString)
Note: the two functions above could be unified.
Code blocks
code_ :: String -> Structure
code_ = Structure . el "pre" . escape
Summary
In this chapter we built a very minimal HTML EDSL. We will later use this library to convert our custom markup formatted text to HTML.
We've also learned about:
- Defining and using functions
- Types and type signatures
- Embedded domain specific languages
- Chaining functions using the
.
operator - Preventing incorrect use with
newtype
s - Defining modules and the
Internal
module pattern - Encapsulation using
newtype
s and modules
Here's our complete program up to this point:
-- hello.hs
import Html
main :: IO ()
main = putStrLn (render myhtml)
myhtml :: Html
myhtml =
html_
"My title"
( append_
(h1_ "Heading")
( append_
(p_ "Paragraph #1")
(p_ "Paragraph #2")
)
)
-- Html.hs
module Html
( Html
, Title
, Structure
, html_
, h1_
, p_
, ul_
, ol_
, code_
, append_
, render
)
where
import Html.Internal
-- Html/Internal.hs
module Html.Internal where
-- * Types
newtype Html
= Html String
newtype Structure
= Structure String
type Title
= String
-- * EDSL
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" (escape title))
<> el "body" (getStructureString content)
)
)
p_ :: String -> Structure
p_ = Structure . el "p" . escape
h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape
ul_ :: [Structure] -> Structure
ul_ =
Structure . el "ul" . concat . map (el "li" . getStructureString)
ol_ :: [Structure] -> Structure
ol_ =
Structure . el "ol" . concat . map (el "li" . getStructureString)
code_ :: String -> Structure
code_ = Structure . el "pre" . escape
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
-- * Render
render :: Html -> String
render html =
case html of
Html str -> str
-- * Utilities
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
getStructureString :: Structure -> String
getStructureString content =
case content of
Structure str -> str
escape :: String -> String
escape =
let
escapeChar c =
case c of
'<' -> "<"
'>' -> ">"
'&' -> "&"
'"' -> """
'\'' -> "'"
_ -> [c]
in
concat . map escapeChar
You can also browse the code as a tree.
Custom markup language
In this chapter we will define our own simple markup language and parse documents written in this language into Haskell data structures.
Our markup language will contain the following features:
- Headings: prefix by a number of
*
characters - Paragraphs: a group of lines without empty lines in between
- Unordered lists: a group of lines each prefixed with
-
- Ordered lists: a group of lines each prefixed with
#
- Code blocks: a group of lines each prefixed with
>
Here's a sample document:
* Compiling programs with ghc
Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.
Create a new Haskell source file named hello.hs, and write
the following code in it:
> main = putStrLn "Hello, Haskell!"
Now, we can compile the program by invoking ghc with the file name:
> ➜ ghc hello.hs
> [1 of 1] Compiling Main ( hello.hs, hello.o )
> Linking hello ...
GHC created the following files:
- hello.hi - Haskell interface file
- hello.o - Object file, the output of the compiler before linking
- hello (or hello.exe on Microsoft Windows) - A native runnable executable.
GHC will produce an executable when the source file satisfies both conditions:
# Defines the main function in the source file
# Defines the module name to be Main, or does not have a module declaration
Otherwise, it will only produce the .o and .hi files.
which we will, eventually, convert into this (modulo formatting) HTML:
<h1>Compiling programs with ghc</h1>
<p>Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.
</p>
<p>Create a new Haskell source file named hello.hs, and write
the following code in it:
</p>
<pre>main = putStrLn "Hello, Haskell!"
</pre>
<p>Now, we can compile the program by invoking ghc with the file name:</p>
<pre>
➜ ghc hello.hs
[1 of 1] Compiling Main ( hello.hs, hello.o )
Linking hello ...
</pre>
<p>GHC created the following files:
</p>
<ul>
<li>hello.hi - Haskell interface file</li>
<li>hello.o - Object file, the output of the compiler before linking</li>
<li>hello (or hello.exe on Microsoft Windows) - A native runnable executable.</li>
</ul>
<p>GHC will produce an executable when the source file satisfies both conditions:
</p>
<ol>
<li>Defines the main function in the source file</li>
<li>Defines the module name to be Main, or does not have a module declaration</li>
</ol>
<p>Otherwise, it will only produce the .o and .hi files.
</p>
Representing the markup language as a Haskell data type
One of the clear differentiators between Haskell (also other ML-family of languages) and most mainstream languages is the ability to represent data precisely and succinctly.
So how do we represent our markup language using Haskell?
Previously, in our HTML builder library, we used newtype
s to differentiate
between HTML documents, structures and titles, but we didn't really need to
differentiate between different kinds of structures such as paragraphs and headings,
not without parsing the data at least.
In this case, we have a list of structures, and each structure could be one of a few specific options (a paragraph, a heading, a list, etc), and we want to be able to know which structure is which so we can easily convert it into the equivalent HTML representation.
For that, we have data
definitions. data
gives us the ability to
create custom types by grouping multiple types together and having
alternative structures. Think of them as combination of both structs and enums.
data
declarations look like this:
data <Type-name> <type-args>
= <Data-constructor1> <types>
| <Data-constructor2> <types>
| ...
It looks really similar to newtype
, but there are two important
differences:
- In the
<types>
part we can write many types (LikeInt
,String
, orBool
). Fornewtype
s we can only write one. - We can have alternative structures using
|
,newtype
s have no alternatives.
This is because newtype
is used to provide a type safe alias, and data
is used to build a new composite type that can potentially have alternatives.
Let's see a few of examples of data types:
-
Bool
data Bool = True | False
We created a new data type named
Bool
with the possible valuesTrue
orFalse
. In this case we only have constructor alternatives and none of the constructors carry additional values. This is similar to enums in other languages. -
Person
data Person = Person String Int -- where the first is the name and the second is -- the age
We created a new data type named
Person
. Values of the typePerson
look like this:Person <some-string> <some-int>
For example:
Person "Gil" 32
In this case we create a composite of multiple types, without alternatives. This is similar to structs in other language, but structs give each field a name, and here we distinguish them by position.
Alternatively, Haskell has syntactic sugar for naming fields called records. The above definition can also be written like this:
data Person = Person { name :: String , age :: Int }
Values of this type can be written exactly as before,
Person "Gil" 32
Or with this syntax:
Person { name = "Gil", age = 32 }
Haskell will also generate functions that can be used to extract the fields from the composite type:
name :: Person -> String age :: Person -> Int
Which can be used like this:
ghci> age (Person { name = "Gil", age = 32 }) 32
We even have special syntax for updating specific fields in a record. Of course, we do not update records in place - we generate a new value instead.
ghci> gil = Person { name = "Gil", age = 32 } ghci> age (gil { age = 33 }) 33 ghci> age gil 32
Unfortunately, having specialized functions for each field also means that if we defined a different data type with the field
age
, the functions which GHC needs to generate will clash.The easiest way to solve this is to give fields unique names, for example by adding a prefix:
data Person = Person { pName :: String , pAge :: Int }
Another way is by using extensions to the Haskell language, which we will cover in later chapters.
-
Tuple
data Tuple a b = Tuple a b
This is pretty similar to
Person
, but we can plug any type we want for this definition. For example:Tuple "Clicked" True :: Tuple String Bool Tuple 'a' 'z' :: Tuple Char Char
This type has special syntax in Haskell:
("Clicked", True) :: (String, Bool) ('a', 'z') :: (Char, Char)
This
Tuple
definition is polymorphic, we define the structure but are able to plug different types into the structure to get concrete types. You can think ofTuple
as a template for a data type waiting to be filled, or as a function waiting for types as input in order to return a data type. We can even take a look at the "type" signature ofTuple
inghci
using the:kind
command.ghci> data Tuple a b = Tuple a b ghci> :kind Tuple Tuple :: * -> * -> *
Quick detour: Kinds
The
:kind
command is called as such because the "type" of a type is called a kind. Kinds can be one of two things, either a*
which means a saturated (or concrete) type, such asInt
orPerson
, or an->
of two kinds, which is, as you might have guessed, a type function, taking kind and returning a kind.Note that only types that have the kind
*
can have values. So for example whileTuple Int
is a valid Haskell concept that has the kind* -> *
, and we can write code that will work "generically" for all types that have a certain kind (e.g.* -> *
), we cannot construct a value that will have the kind* -> *
. All values have types, and all types that have values have the kind*
.We will talk more about kinds later, for now let's focus on types!
-
Either
data Either a b = Left a | Right b
Similar to Tuple but instead of having only one constructor, we have two. This means that we can choose which side we want. Here are a couple of values of type
Either String Int
:Left "Hello" Right 17
This type is useful for modeling errors. Either we succeeded and got what we wanted (The
Right
constructor with the value), or we didn't and got an error instead (TheLeft
constructor with a string or a custom error type).
In our program we use data
types to model the different kinds of content types
in our markup language. We tag each structure using the data constructor
and provide the rest of the information (the paragraph text, the list items, etc)
in the <types>
section of the data declaration for each constructor:
type Document
= [Structure]
data Structure
= Heading Natural String
| Paragraph String
| UnorderedList [String]
| OrderedList [String]
| CodeBlock [String]
Note: Natural
is defined in the base
package but not exported from Prelude
.
Find out which module to import Natural
by using Hoogle.
Exercises
Represent the following markup documents as values of Document
:
-
Hello, world!
-
* Welcome To this tutorial about Haskell.
-
Remember that multiple lines with no separation are grouped together to a single paragraph but list items remain separate. # Item 1 of a list # Item 2 of the same list
-
* Compiling programs with ghc Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries. Create a new Haskell source file named hello.hs, and write the following code in it: > main = putStrLn "Hello, Haskell!" Now, we can compile the program by invoking ghc with the file name: > ➜ ghc hello.hs > [1 of 1] Compiling Main ( hello.hs, hello.o ) > Linking hello ... GHC created the following files: - hello.hi - Haskell interface file - hello.o - Object file, the output of the compiler before linking - hello (or hello.exe on Microsoft Windows) - A native runnable executable. GHC will produce an executable when the source file satisfies both conditions: # Defines the main function in the source file # Defines the module name to be Main, or does not have a module declaration Otherwise, it will only produce the .o and .hi files.
Solutions:
Solution 1
example1 :: Document
example1 =
[ Paragraph "Hello, world!"
]
Solution 2
example2 :: Document
example2 =
[ Heading 1 "Welcome"
, Paragraph "To this tutorial about Haskell."
]
Solution 3
example3 :: Document
example3 =
[ Paragraph "Remember that multiple lines with no separation are grouped together to a single paragraph but list items remain separate."
, OrderedList
[ "Item 1 of a list"
, "Item 2 of the same list"
]
]
Solution 4
example4 :: Document
example4 =
[ Heading 1 "Compiling programs with ghc"
, Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries."
, Paragraph "Create a new Haskell source file named hello.hs, and write the following code in it:"
, CodeBlock
[ "main = putStrLn \"Hello, Haskell!\""
]
, Paragraph "Now, we can compile the program by invoking ghc with the file name:"
, CodeBlock
[ "➜ ghc hello.hs"
, "[1 of 1] Compiling Main ( hello.hs, hello.o )"
, "Linking hello ..."
]
, Paragraph "GHC created the following files:"
, UnorderedList
[ "hello.hi - Haskell interface file"
, "hello.o - Object file, the output of the compiler before linking"
, "hello (or hello.exe on Microsoft Windows) - A native runnable executable."
]
, Paragraph "GHC will produce an executable when the source file satisfies both conditions:"
, OrderedList
[ "Defines the main function in the source file"
, "Defines the module name to be Main, or does not have a module declaration"
]
, Paragraph "Otherwise, it will only produce the .o and .hi files."
]
Add a new module named Markup
and add the data type definition to it.
Note that in this case we do want to export the constructors of Structure
.
Solution
-- Markup.hs
module Markup
( Document
, Structure(..)
)
where
import Numeric.Natural
type Document
= [Structure]
data Structure
= Heading Natural String
| Paragraph String
| UnorderedList [String]
| OrderedList [String]
| CodeBlock [String]
Translating directly?
You might ask "Why do we even need to represent the markup as a type? Why don't we convert it into HTML as soon as we parse it instead?". That's a good question and a valid strategy. The reason we first represent it as a Haskell type is for flexibility and modularity.
If the parsing code is coupled with HTML generation, we lose the ability to pre-process the markup document. For example we might want to take only a small part of the document (for summary) and present it, or create a table of content from headings. Or maybe we'd like to add other targets and not just HTML - maybe markdown format or a GUI reader?
Parsing to an "abstract data type" (ADT) representation (one that does not contain the details of the language, for example '#' for ordered lists) gives us the freedom to do so much more than just conversion to HTML that it's usually worth it in my opinion unless you really need to optimize the process.
Parsing markup part 01 (Recursion)
Let's have a look at how to parse a multi-lined string of markup text
written by a user and convert it to the Document
type we defined
in the previous chapter.
Our strategy is to take the string of markup text, and:
- Split it to a list where each element represents a separate line, and
- Go over the list line by line and process it, remembering information from previous lines if necessary
So the first thing we want to do is to process the string line by line.
We can do that by converting the string to a list of string.
Fortunately the Haskell
Prelude
module from the Haskell standard library
base
exposes the function
lines
that does exactly what we want. The Prelude
module is exposed in every
Haskell file by default so we don't need to import it.
For the line processing part, let's start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter.
A common solution in imperative programs would be to iterate over the lines using some loop construct and accumulate lines that should be grouped together into some intermediate mutable variable. When we reach an empty line, we insert the content of that variable into another mutable variable that accumulates the results.
Our approach in Haskell isn't so different, except that we do not use loops or mutable variables. Instead, we use recursion.
Recursion and accumulating information
Instead of loops, in Haskell we use recursion to model iteration.
Consider the following contrived example: let's say that
we want to write an algorithm for adding two natural numbers together,
and we don't have a standard operation to do that (+), but we do
have two operations we could use on each number: increment
and decrement
.
A solution we could come up with is to slowly "pass" one number to the other number iteratively, by incrementing one, and decrementing the other. And we do that until the number we decrement reaches 0.
For example for 3
and 2
:
- We start with
3
and2
, and we increment3
and decrement2
- On the next step we now have
4
and1
, we increment4
and decrement1
- On the next step we now have
5
and0
, since the second number is0
we declare5
as the result.
This can be written imperatively using a loop:
function add(n, m) {
while (m /= 0) {
n = increment(n);
m = decrement(m);
}
return n;
}
We can write the same algorithm in Haskell without mutation using recursion:
add n m =
if m /= 0
then add (increment n) (decrement m)
else n
In Haskell, in order to emulate iteration with mutable state, we call the function again with the values we want the variables to have in the next iteration.
Evaluation of recursion
Recursion commonly has a bad reputation for being slow and possibly unsafe compared to loops. This is because in imperative languages, calling a function often requires creating a new call stack.
However, functional languages (and Haskell in particular) play by different
rules and implement a feature called tail call elimination - when the result of a function call
is the result of the function (this is called tail position), we can just drop the current
stack frame and then allocate one for the function we call, so we don't require N
stack frames
for N
iterations.
This is of course only one way to do tail call elimination and other
strategies exist, such as translating code like our recursive add
above to the iteration version.
Laziness
Haskell plays by slightly different rules because it uses a lazy evaluation strategy instead of the much more common strict evaluation strategy. An evaluation strategy refers to "when do we evaluate a computation". In a strict language the answer is simple: we evaluate the arguments of a function before entering a function.
So for example the evaluation of add (increment 3) (decrement 2)
using strict evaluation
will look like this:
- Evaluate
increment 3
to4
- Evaluate
decrement 2
to1
- Evaluate
add 4 1
Or, alternatively (depending on the language) we reverse (1) and (2) and evaluate the arguments from right-to-left instead of left-to-right.
On the other hand, with lazy evaluation, we only evaluate computation when we need it, which is when it is part of a computation that will have some effect on the outside world, for example when writing a computation to standard output or sending it over the network.
So unless this computation is required, it won't be evaluated. For example:
main =
if add (increment 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
In the case above, we need the result of add (increment 2) (decrement 3)
in order to know which message to write,
so it will be evaluated. But:
main =
let
five = add (increment 2) (decrement 3)
in
putStrLn "Not required"
In the case above we don't actually need five
, so we don't evaluate it!
But then if we know we need add (increment 2) (decrement 3)
,
do we use strict evaluation now? The answer is no - because we might not need
to evaluate the arguments to complete the computation. For example in this case:
const a b = a
main =
if const (increment 2) (decrement 3) == 3
then putStrLn "Yes."
else putStrLn "No."
const
ignores the second argument and returns the first, so we don't actually need
to calculate decrement 3
in order to provide an answer to the computation and in
turn output an answer to the screen.
With the lazy evaluation strategy we will evaluate expressions when we need to (when they are required
in order to do something for the user), and we evaluate from the outside in - first
we enter functions, and then we evaluate the arguments when we need to (usually when the thing
we want to evaluate appears in some control flow such as the condition of an if
expression
or a pattern in pattern matching).
I've written a more in-depth blog post about how this works in Haskell: Substitution and Equational Reasoning.
Please read it and try to evaluate the following program by hand:
import Prelude hiding (const) -- feel free to ignore this line
increment n = n + 1
decrement n = n - 1
const a b = a
add n m =
if m /= 0
then add (increment n) (decrement m)
else n
main =
if const (add 3 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
Remember that evaluation always begins from main
.
Solution
evaluating main
if const (add 3 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
expanding const
if add 3 2 == 5
then putStrLn "Yes."
else putStrLn "No."
expanding add
if (if 2 /= 0 then add (increment 3) (decrement 2) else 3) == 5
then putStrLn "Yes."
else putStrLn "No."
evaluating the control flow 2 /= 0
if (if True then add (increment 3) (decrement 2) else 3) == 5
then putStrLn "Yes."
else putStrLn "No."
Choosing the then
branch
if (add (increment 3) (decrement 2)) == 5
then putStrLn "Yes."
else putStrLn "No."
expanding add
if
( if decrement 2 /= 0
then add
(increment (increment 3))
(decrement (decrement 2))
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluating decrement 2
in the control flow (notice how both places change!)
if
( if 1 /= 0
then add
(increment (increment 3))
(decrement 1)
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluating the control flow 1 /= 0
if
( if True
then add
(increment (increment 3))
(decrement 1)
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Choosing the then
branch
if
( add
(increment (increment 3))
(decrement 1)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Expanding add
if
( if decrement 1 /= 0
then add
(increment (increment (increment 3)))
(decrement (decrement 1))
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluating control flow decrement 1
if
( if 0 /= 0
then add
(increment (increment (increment 3)))
(decrement 0)
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluating control flow 0 /= 0
if
( if False
then add
(increment (increment (increment 3)))
(decrement 0)
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
Choosing the else
branch
if
(increment (increment 3)) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluate control flow increment (increment 3)
if
(increment 3 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluate in control flow increment 3
if
(3 + 1 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluate in control flow 3 + 1
if
(4 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluate in control flow 4 + 1
if
5 == 5
then putStrLn "Yes."
else putStrLn "No."
Evaluate in control flow 5 == 5
if
True
then putStrLn "Yes."
else putStrLn "No."
Choosing the then
branch
putStrLn "Yes."
Which when run will print Yes.
to the screen.
General recursion
In general, when trying to solve problems recursively, it is useful to think about the problem in three parts:
- Finding the base case (the most simple cases - the ones we already know how to answer)
- Figuring out how to reduce the problem to something simpler (so it gets closer to the base case)
- Mitigating the difference between the reduced version and the solution we need to provide
The reduce and mitigate steps together are usually called the recursive step.
Let's take a look at another example problem: generating a list of a particular size with a specific value in place of every element.
In Haskell, this function would have the following signature:
replicate :: Int -> a -> [a]
Here are a few usage examples of replicate
:
ghci> replicate 4 True
[True,True,True,True]
ghci> replicate 0 True
[]
ghci> replicate (-13) True
[]
How would we implement this function recursively? How would describe it in three steps above?
- Base case: the cases we already know how to generate are the cases where the length of the list is zero (or less) - we just return an empty list.
- Reduce: while we might not know how to generate a list of size
N
(whereN
is positive), if we knew the solution forN-1
we could: - Mitigate: Add another element to the solution for
N-1
using the:
(cons) operator.
Try to write this in Haskell!
Solution
replicate :: Int -> a -> [a]
replicate n x =
if n <= 0 -- recognizing the base case
then
[] -- the solution for the base case
else
x : replicate (n - 1) x
-- --- -------------------
-- ^ ^
-- | |
-- | +-------- reduction
-- |
-- +--- mitigation
Mutual recursion
When solving functions recursively we usually call the same function again, but that doesn't have to be the case. It is possible to reduce our problem to something simpler that requires an answer from a different function. If, in turn, that function will (or another function in that call chain) call our function again, we have a mutual recursive solution.
For example, let's write two functions, one that checks whether a natural number is even or not, and one that checks whether a number is odd or not only by decrementing it.
even :: Int -> Bool
odd :: Int -> Bool
Let's start with even
, how should we solve this recursively?
- Base case: We know the answer for
0
- it isTrue
. - Reduction: We might not know the answer for a general
N
, but we could check whetherN - 1
is odd, - Mitigation: if
N - 1
is odd, thenN
is even! if it isn't odd, thenN
isn't even.
What about odd
?
- Base case: We know the answer for
0
- it isFalse
. - Reduction: We might not know the answer for a general
N
, but we could check whetherN - 1
is even, - Mitigation: if
N - 1
is even, thenN
is odd! if it isn't even, thenN
isn't odd.
Try writing this in Haskell!
Solution
even :: Int -> Bool
even n =
if n == 0
then
True
else
odd (n - 1)
odd :: Int -> Bool
odd n =
if n == 0
then
False
else
even (n - 1)
Partial functions
because we didn't handle negative cases in the example above, our functions will loop forever when a negative value is passed as input. A function that does not return a result for some value (either by not terminating or by throwing an error) is called a partial function (because it only returns a result of a part of the possible inputs).
Partial functions are generally considered bad practice because they can have unexpected behaviour at runtime, so we want to avoid using partial functions as well as avoid writing partial functions.
The best way to avoid writing partial functions is by covering all inputs!
In the situation above, it is definitely possible to handle negative numbers
as well, so we should do that! Or, instead, we could require that our functions
accept a Natural
instead of an Int
, and then the type system would've stopped
us from using these functions with values that we did not handle.
There are cases where we can't possibly cover all inputs, in these cases it is important to re-examine the code and see if we could further restrict the inputs using types to mitigate these issues.
For example, the head :: [a] -> a
function from Prelude
promises
to return the first element (the head) of a list, but we know that lists
could possibly be empty, so how can this function deliver on its promise?
Unfortunately, it can't. But there exists a different function that can:
head :: NonEmpty a -> a
from the
Data.List.NonEmpty
module! The trick here is that this other head
does not take a general list
as input, it takes a different type entirely, one that promises to have
at least one element, and therefore can deliver on its promise!
We could also potentially use smart constructors with newtype
and enforce some sort
of restrictions in the type system, as we saw in earlier chapters.
But this solution can sometimes be less ergonomic to use.
An alternative approach is to use data
types to encode the absence of a proper result,
for example, using Maybe
, as we'll see in a future chapter.
Parsing markup?
Let's get back to the task at hand.
As stated previously, our strategy for parsing the markup text is:
- Split the string to a list where each element is a separate line
(which we can do with
lines
), and - Go over the list line by line and process it, remembering information from previous lines if necessary
Remember that we want to start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter:
parse :: String -> Document
parse = parseLines [] . lines -- (1)
parseLines :: [String] -> [String] -> Document
parseLines currentParagraph txts =
let
paragraph = Paragraph (unlines (reverse currentParagraph)) -- (2), (3)
in
case txts of -- (4)
[] -> [paragraph]
currentLine : rest ->
if trim currentLine == ""
then
paragraph : parseLines [] rest -- (5)
else
parseLines (currentLine : currentParagraph) rest -- (6)
trim :: String -> String
trim = unwords . words
Things to note:
-
We pass a list that contains the currently grouped paragraph (paragraphs are separated by an empty line)
-
Because of laziness,
paragraph
is not computed until it's needed, so we don't have to worry about the performance implications in the case the we are still grouping lines -
Why do we reverse
currentParagraph
? (See point (6)) -
We saw case expressions used to deconstruct
newtype
s andChar
s, but we can also pattern match on lists and other ADTs as well! In this case we match against two patterns, an empty list ([]
), and a "cons cell" - a list with at least one element (currentLine : rest
). In the body of the "cons" pattern, we bind the first element to the namecurrentLine
, and the rest of the elements to the namerest
.We will talk about how all of this works really soon!
-
When we run into an empty line we add the accumulated paragraph to the resulting list (A
Document
is a list of structures) and start the function again with the rest of the input. -
We pass the new lines to be grouped in a paragraph in reverse order because of performance characteristics - because of the nature of singly-linked lists, prepending an element is fast, and appending is slow. Prepending only requires us to create a new cons (
:
) cell to hold a pointer to the value and a pointer to the list, but appending requires us to traverse the list to its end and rebuild the cons cells - the last one will contain the last value of the list and a pointer to the list to append, the next will contain the value before the last value of the list and a pointer to the list which contains the last element and the appended list, and so on.
This code above will group together paragraphs in a structure, but how do we view our result? In the next chapter we will take a short detour and talk about type classes, and how they can help us in this scenario.
Displaying the parsing results (type classes)
We want to be able to print a textual representation of values
of our Document
type. There are a few ways to do that:
- Write our own function of type
Document -> String
which we could then print, or - Have Haskell write one for us
Haskell provides us with a mechanism that can automatically generate the implementation of a
type class function called show
, that will convert our type to String
.
The type of the function show
looks like this:
show :: Show a => a -> String
This is something new we haven't seen before. Between ::
and =>
you see what is called a type class constraint on the type a
. What
we say in this signature, is that the function show
can work on any
type that is a member of the type class Show
.
Type classes is a feature in Haskell that allows us to declare a common
interface for different types. In our case, Haskell's standard library
defines the type class Show
in the following way (this is a simplified
version but good enough for our purposes):
class Show a where
show :: a -> String
A type class declaration describes a common interface for Haskell types.
show
is an overloaded function that will work for any type that is an instance
of the type class Show
.
We can define an instance of a type class manually like this:
instance Show Bool where
show x =
case x of
True -> "True"
False -> "False"
Defining an instance means providing an implementation for the interface of a specific type.
When we call the function show
on a data type, the compiler will search the type's Show
instance,
and use the implementation provided in the instance declaration.
ghci> show True
"True"
ghci> show 187
"187"
ghci> show "Hello"
"\"Hello\""
As can be seen above, the show
function converts a value to its textual representation.
That is why "Hello"
includes the quotes as well. The Show
type class is usually
used for debugging purposes.
Deriving instances
It is also possible to automatically generate implementations of a few selected
type classes. Fortunately, Show
is one of them.
If all the types in the definition of our data type already implement
an instance of Show
, we can automatically derive it by adding deriving Show
at the
end of the data definition.
data Structure
= Heading Natural String
| Paragraph String
| UnorderedList [String]
| OrderedList [String]
| CodeBlock [String]
deriving Show
Now we can use the function show :: Show a => a -> String
for any
type that implements an instance of the Show
type class. For example, with print:
print :: Show a => a -> IO ()
print = putStrLn . show
We can first convert our type to String
and then write it to the
standard output.
And because lists also implement Show
for any element type that has
a Show
instance, we can now print Document
s, because they are just
aliases for [Structure]
. Try it!
There are many type classes Haskellers use everyday. A couple more are
Eq
for equality and Ord
for ordering. These are also special type classes
that can be derived automatically.
Laws
Type classes often come with "rules" or "laws" that instances should satisfy, the purpose of these laws is to provide predictable behaviour across instances, so that when we run into a new instance we can be confident that it will behave in an expected way, and we can write code that works generically for all instances of a type class while expecting them to adhere to these rules.
As an example, let's look at the Semigroup
type class:
class Semigroup a where
(<>) :: a -> a -> a
This type class provides a common interface for types with an operation <>
that can combine two values into one in some way.
This type class also mentions that this <>
operation should be associative,
meaning that these two sides should evaluate to the same result:
x <> (y <> z) = (x <> y) <> z
An example of a lawful instance of Semigroup
is lists with the append operation (++
):
instance Semigroup [a] where
(<>) = (++)
Unfortunately the Haskell type system cannot "prove" that instances satisfy these laws, but as a community we often shun unlawful instances.
Many data types (together with their respective operations) can
form a Semigroup
, and instances
don't even have to look similar or have a common analogy/metaphor
(and this is true for many other type classes as well).
Type classes are often just interfaces with laws (or expected behaviours if you will). Approaching them with this mindset can be very liberating!
To put it differently, type classes can be used to create abstractions - interfaces with laws/expected behaviours where we don't actually care about the concrete details of the underlying type, just that it implements a certain API and behaves in a certain way.
Regarding Semigroup
, we have previously
created a function that looks like <>
for our Html
EDSL!
We can add a Semigroup
instance for our Structure
data type
and have a nicer API!
Exercise: Please do this and remove the append_
function from the API.
Solution
Replace this:
append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
With this:
instance Semigroup Structure where
(<>) c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
And remove the export of append_
in Html.hs
. You won't need to further export anything
as type class instances are exported automatically.
You will also need to replace the usage of append_
with <>
in hello.hs
.
Parsing markup part 02 (Pattern matching)
Maybe
Previously on partial functions, we mentioned that one way to avoid
writing partial functions is to encode the absence of a result using Maybe
:
data Maybe a
= Nothing
| Just a
Maybe
is a data type from the standard library (named base)
for adding an additional value to a type: the absence of a value.
For example, Maybe Bool
has three values,
two with the Just
constructor to represent regular boolean values
(Just True
and Just False
) and another value, Nothing
to represent
the absence of a boolean value.
We can use this to encode the result of head
, a function that promises to return
the first element of a list, without creating a partial function:
safeHead :: [a] -> Maybe a
This way, when the list is empty, we can return Nothing
, and when it has at least
one element, we can return Just <first element>
. This function can be found in
the Data.Maybe
module under the name
listToMaybe.
In order to consume values of type Maybe <something>
, and other types created with
data
, we can use pattern matching.
Pattern Matching
We've already seen pattern matching a few times. It is an incredibly versatile feature of Haskell, we can use it to do two main things:
- Deconstruct complex values
- Control flow
As we've seen when discussing
newtypes,
we can use case expressions and function definitions to deconstruct a newtype
.
Same for data
types as well:
-- | A data type representing colors
data Color
= RGB Word8 Word8 Word8
getBluePart :: Color -> Word8
getBluePart color =
case color of
RGB _ _ blue -> blue
In getBluePart
we deconstruct a composite value into its part and extract the third component
representing the blue value in a color represented by red, green and blue components (RGB).
Note that blue
is the name we give to the third component so it will be bound
to the right of the arrow that comes after the pattern. This is similar to
a function argument. Also note that _
matches any value without binding it to a name.
We can also try to match a value with more than one pattern:
data Brightness
= Dark
| Bright
data EightColor
= Black
| Red
| Green
| Yellow
| Blue
| Magenta
| Cyan
| White
data AnsiColor
= AnsiColor Brightness EightColor
ansiColorToVGA :: AnsiColor -> Color
ansiColorToVGA ansicolor =
case ansicolor of
AnsiColor Dark Black ->
RGB 0 0 0
AnsiColor Bright Black ->
RGB 85 85 85
AnsiColor Dark Red ->
RGB 170 0 0
AnsiColor Bright Red ->
RGB 255 85 85
-- and so on
It's important to notice a few things here:
- Patterns can be nested, notice how we deconstructed
ansicolor
on multiple levels - We try to match patterns from the top down, it is possible for patterns to overlap with one another and the top one will win
- If the value we try to match does not match any of the patterns listed, an error will be thrown at runtime
We can ask GHC to notify us when we accidentally write overlapping patterns,
or when we haven't listed enough patterns to match all possible values,
by passing the flag -Wall
to ghc
or runghc
.
My recommendation is to always use -Wall
!
As an aside, while it is possible to use pattern matching in function definitions by defining a function multiple times, I personally don't like that feature very much and I would encourage you to avoid it, but if you want to use it instead of case expressions, it is possible.
Pattern matching on linked lists
Because linked lists have their own special syntax, we also have special syntax for their pattern match. We can use the same special syntax for creating lists when we pattern match on lists, replacing the elements of the list with patterns. For example:
safeHead :: [a] -> Maybe a
safeHead list =
case list of
-- Empty list
[] -> Nothing
-- Cons cell pattern, will match any list with at least one element
x : _ -> Just x
exactlyTwo :: [a] -> Maybe (a, a)
exactlyTwo list =
case list of
-- Will match a list with exactly two elements
[x, y] -> Just (x, y)
-- Will match any other pattern
_ -> Nothing
-- This will also work
exactlyTwoVersion2 :: [a] -> Maybe (a, a)
exactlyTwoVersion2 list =
case list of
-- Will match a list with exactly two elements
x : y : [] -> Just (x, y)
-- Will match any other pattern
_ -> Nothing
Exercises:
- Create a function
isBright :: AnsiColor -> Bool
that checks whether a color is bright - Use this table to write
ansiToUbuntu
. - Create a function
isEmpty :: [a] -> Bool
that useslistToMaybe
to check whether a list is empty - Create a function
isEmpty :: [a] -> Bool
that doesn't uselistToMaybe
to check whether a list is empty
Solutions:
Solution for (1)
isBright :: AnsiColor -> Bool
isBright ansiColor =
case ansiColor of
AnsiColor Bright _ -> True
AnsiColor Dark _ -> False
Solution for (2)
ansiToUbuntu :: AnsiColor -> Color
ansiToUbuntu ansiColor =
case ansiColor of
AnsiColor brightness color ->
case brightness of
Dark ->
case color of
Black -> RGB 0 0 0
Red -> RGB 194 54 33
Green -> RGB 37 188 36
Yellow -> RGB 173 173 39
Blue -> RGB 73 46 225
Magenta -> RGB 211 56 211
Cyan -> RGB 51 187 200
White -> RGB 203 204 205
Bright ->
case color of
Black -> RGB 129 131 131
Red -> RGB 252 57 31
Green -> RGB 49 231 34
Yellow -> RGB 234 236 35
Blue -> RGB 88 51 255
Magenta -> RGB 249 53 248
Cyan -> RGB 20 240 240
White -> RGB 233 235 235
Since pattern matching goes arbitrarily deep as we saw before, we could instead pattern match all the way through in one case expression:
ansiToUbuntu :: AnsiColor -> Color
ansiToUbuntu ansiColor =
case ansiColor of
AnsiColor Dark Black -> RGB 0 0 0
AnsiColor Dark Red -> RGB 194 54 33
AnsiColor Dark Green -> RGB 37 188 36
AnsiColor Dark Yellow -> RGB 173 173 39
AnsiColor Dark Blue -> RGB 73 46 225
AnsiColor Dark Magenta -> RGB 211 56 211
AnsiColor Dark Cyan -> RGB 51 187 200
AnsiColor Dark White -> RGB 203 204 205
AnsiColor Bright Black -> RGB 129 131 131
AnsiColor Bright Red -> RGB 252 57 31
AnsiColor Bright Green -> RGB 49 231 34
AnsiColor Bright Yellow -> RGB 234 236 35
AnsiColor Bright Blue -> RGB 88 51 255
AnsiColor Bright Magenta -> RGB 249 53 248
AnsiColor Bright Cyan -> RGB 20 240 240
AnsiColor Bright White -> RGB 233 235 235
But this is a bit too much repetition of AnsiColor
, Dark
and Bright
to my taste in this case.
Solution for (3)
isEmpty :: [a] -> Bool
isEmpty list =
case listToMaybe list of
Nothing -> True
Just _ -> False
Solution for (4)
isEmpty :: [a] -> Bool
isEmpty list =
case list of
[] -> True
_ : _ -> False
Parsing with rich context
Previously we wrote a parser that separates documents into different paragraphs. With new features under our belt we can now remember the exact context we are in (whether it is a text paragraph, a list, or a code block) and act accordingly!
Let's look again at the parsing code we wrote previously:
parse :: String -> Document
parse = parseLines [] . lines
parseLines :: [String] -> [String] -> Document
parseLines currentParagraph txts =
let
paragraph = Paragraph (unlines (reverse currentParagraph))
in
case txts of
[] -> [paragraph]
currentLine : rest ->
if trim currentLine == ""
then
paragraph : parseLines [] rest
else
parseLines (currentLine : currentParagraph) rest
trim :: String -> String
trim = unwords . words
Previously our context, currentParagraph
, was used to group adjacent lines in an accumulative list.
Next, instead of using a [String]
type to denote adjacent lines, we can instead use a Structure
to denote the context.
One issue we might have though with representing context with the Structure
type,
is that when we start parsing we don't have any context.
But we have learned of a way to represent the absence of a value with Maybe
! So our new context type can be Maybe Structure
instead.
Let's rewrite our code above with our new context type:
parse :: String -> Document
parse = parseLines Nothing . lines -- (1)
parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
case txts of
[] -> maybeToList context -- (2)
-- Paragraph case
currentLine : rest ->
let
line = trim currentLine
in
if line == ""
then
maybe id (:) context (parseLines Nothing rest) -- (3)
else
case context of
Just (Paragraph paragraph) ->
parseLines (Just (Paragraph (unwords [paragraph, line]))) rest -- (4), (5)
_ ->
maybe id (:) context (parseLines (Just (Paragraph line)) rest)
trim :: String -> String
trim = unwords . words
-
We can now pass
Nothing
when we don't have a context -
Unsure what
maybeToList
does? Hoogle it! -
maybe is a function that works similarly to pattern matching on a
Maybe
: the third argument tomaybe
is the value we pattern match on, the second argument is a function to apply to the value found in aJust
case, and the first argument is the value to return in case the value we pattern match on isNothing
. This way to encode pattern matching using functions is actually fairly common.Check out the types of
id
,(:)
andmaybe id (:)
in GHCi! -
Hey! Didn't we say that appending
String
s/lists is slow (which is whatunwords
does)? Yes, it is. Because in ourStructure
data type, a paragraph is defined asParagraph String
and notParagraph [String]
, we can't use our trick of building a list of lines and then reverse it in the end.So what do we do? There are many ways to handle that, one simple way is to create a different type with the right shape:
data Context = CtxHeading Natural String | CtxParagraph [String] | CtxUnorderedList [String] | CtxOrderedList [String] | CtxCodeBlock [String]
Since creating new types in Haskell is cheap, this is a very viable solution.
In this case I'm going with the approach of not worrying about it too much, because it's a very local piece of code that can easily be fixed later if needed.
-
Anyway, if you've used
-Wall
like I've suggested, you'd get a warning from GHC saying that the "pattern matches are non-exhaustive". This is because we did not cover all cases. So let's cover more cases:
parse :: String -> Document
parse = parseLines Nothing . lines
parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
case txts of
-- done case
[] -> maybeToList context
-- Heading 1 case
('*' : ' ' : line) : rest ->
maybe id (:) context (Heading 1 (trim line) : parseLines Nothing rest)
-- Unordered list case
('-' : ' ' : line) : rest ->
case context of
Just (UnorderedList list) ->
parseLines (Just (UnorderedList (list <> [trim line]))) rest
_ ->
maybe id (:) context (parseLines (Just (UnorderedList [trim line])) rest)
-- Paragraph case
currentLine : rest ->
let
line = trim currentLine
in
if line == ""
then
maybe id (:) context (parseLines Nothing rest)
else
case context of
Just (Paragraph paragraph) ->
parseLines (Just (Paragraph (unwords [paragraph, line]))) rest
_ ->
maybe id (:) context (parseLines (Just (Paragraph line)) rest)
trim :: String -> String
trim = unwords . words
Exercise: Add the CodeBlock
and OrderedList
cases.
Final module
-- Markup.hs
module Markup
( Document
, Structure(..)
, parse
)
where
import Numeric.Natural
import Data.Maybe (maybeToList)
type Document
= [Structure]
data Structure
= Heading Natural String
| Paragraph String
| UnorderedList [String]
| OrderedList [String]
| CodeBlock [String]
deriving (Eq, Show) -- (1)
parse :: String -> Document
parse = parseLines Nothing . lines
parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
case txts of
-- done case
[] -> maybeToList context
-- Heading 1 case
('*' : ' ' : line) : rest ->
maybe id (:) context (Heading 1 (trim line) : parseLines Nothing rest)
-- Unordered list case
('-' : ' ' : line) : rest ->
case context of
Just (UnorderedList list) ->
parseLines (Just (UnorderedList (list <> [trim line]))) rest
_ ->
maybe id (:) context (parseLines (Just (UnorderedList [trim line])) rest)
-- Ordered list case
('#' : ' ' : line) : rest ->
case context of
Just (OrderedList list) ->
parseLines (Just (OrderedList (list <> [trim line]))) rest
_ ->
maybe id (:) context (parseLines (Just (OrderedList [trim line])) rest)
-- Code block case
('>' : ' ' : line) : rest ->
case context of
Just (CodeBlock code) ->
parseLines (Just (CodeBlock (code <> [line]))) rest
_ ->
maybe id (:) context (parseLines (Just (CodeBlock [line])) rest)
-- Paragraph case
currentLine : rest ->
let
line = trim currentLine
in
if line == ""
then
maybe id (:) context (parseLines Nothing rest)
else
case context of
Just (Paragraph paragraph) ->
parseLines (Just (Paragraph (unwords [paragraph, line]))) rest
_ ->
maybe id (:) context (parseLines (Just (Paragraph line)) rest)
trim :: String -> String
trim = unwords . words
How do we know our parser works correctly?
In an earlier chapter, we parsed a few examples of our markup language by hand.
Now, we can try to test our parser by comparing our solutions to our parser.
By deriving Eq
for our Structure
data type
(marked with (1) in "final module" above),
we can compare solutions with the ==
(equals) operator.
Try it in GHCi! You can read a text file in GHCi using the following syntax:
ghci> txt <- readFile "/tmp/sample.txt"
And then compare with the hand written example values from the solutions (after adding them to the module and loading them in GHCi):
ghci> parse txt == example4
In a later chapter, we'll write automated tests for our parser using a testing framework. But before that, I'd like to glue things together so we'll be able to:
- Read markup text from a file
- Parse the text
- Convert the result to our HTML EDSL
- Generate HTML code
And also discuss how to work with IO in Haskell while we're at it.
You can view the git commit of the changes we've made and the code up until now.
Gluing things together
In this chapter we are going to glue the pieces that we built together and build an actual blog generator. We will:
- Read markup text from a file
- Parse the text to a
Document
- Convert the result to our
Html
EDSL - Generate HTML code
- Write it to file
While doing so, we will learn:
- How to work with IO
- How to import external libraries to process whole directories and create a simple command-line interface
Converting Markup to HTML
One key part is missing before we can do glue everything together, and that is
to convert our Markup
data types to Html
.
We'll start by creating a new module and import both the Markup
and the Html
modules.
module Convert where
import qualified Markup
import qualified Html
Qualified Imports
This time, we've imported the modules qualified. Qualified imports means that instead of exposing the names that we've defined in the imported module to the general module name space, they now have to be prefixed with the module name.
For example, parse
becomes Markup.parse
.
If we would've imported Html.Internal
qualified, we'd have to write
Html.Internal.el
which is a bit long.
We can also give the module a new name with the as
keyword:
import qualified Html.Internal as HI
And write HI.el
instead.
I like using qualified imports because readers do not have to guess where a
name comes from. Some modules are even designed to be imported qualified.
For example, many container APIs such as maps, sets, and vectors have very similar
API. If we want to use multiple containers in a single module we pretty much have
to use qualified imports so that when we write a function such as singleton
,
which creates a container with a single value, GHC will know which singleton
function we are referring to.
Some people prefer to use import lists instead of qualified imports, because qualified names can be a bit verbose and noisy. I will often prefer qualified imports to import lists, but feel free to try both solutions and see which fits you better. For more information about imports, see this wiki article.
Converting Markup.Structure
to Html.Structure
Converting a markup structure to an HTML structure is mostly straightforward at this point, we need to pattern match on the markup structure and use the relevant HTML API.
convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
case structure of
Markup.Heading 1 txt ->
Html.h1_ txt
Markup.Paragraph p ->
Html.p_ p
Markup.UnorderedList list ->
Html.ul_ $ map Html.p_ list
Markup.OrderedList list ->
Html.ol_ $ map Html.p_ list
Markup.CodeBlock list ->
Html.code_ (unlines list)
Notice that running this code with -Wall
will reveal that the pattern matching
is non-exhaustive. This is because we don't currently have a way to build
headings that are not h1
. There are a few ways to handle this:
- Ignore the warning - this will likely fail at runtime one day and the user will be sad
- Pattern match other cases and add a nice error with the
error
function - it has the same disadvantage above, but will also no longer notify of the unhandled cases at compile time. - Pattern match and do the wrong thing - user is still sad
- Encode errors in the type system using
Either
, we'll see how to do this in later chapters - Restrict the input - change
Markup.Heading
to not include a number but rather specific supported headings. This is a reasonable approach. - Implement an HTML function supporting arbitrary headings. Should be straightforward to do.
Exercise: Implement h_ :: Natural -> String -> Structure
which we'll use to define arbitrary headings (such as <h1>
, <h2>
, and so on).
Solution
import Numeric.Natural
h_ :: Natural -> String -> Structure
h_ n = Structure . el ("h" <> show n) . escape
Don't forget to export it from Html.hs
!
Exercise: Fix convertStructure
using h_
.
Solution
convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
case structure of
Markup.Heading n txt ->
Html.h_ n txt
Markup.Paragraph p ->
Html.p_ p
Markup.UnorderedList list ->
Html.ul_ $ map Html.p_ list
Markup.OrderedList list ->
Html.ol_ $ map Html.p_ list
Markup.CodeBlock list ->
Html.code_ (unlines list)
Document -> Html
In order to create an Html
document, we need to use the html_
function.
This function expects two things: a Title
, and a Structure
.
For a title we could just supply it from outside using the file name.
In order to convert our markup Document
(which is a list of markup Structure
)
to an HTML Structure
, we need to convert each markup Structure
and then
concatenate them together.
We already know how to convert each markup Structure
, we can use the
convertStructure
function we wrote and map
. This will provide
us with the following function:
map convertStructure :: Markup.Document -> [Html.Structure]
To concatenate all of the Html.Structure
, we could try to write a recursive
function. However we will quickly run into an issue
with the base case, what to do when the list is empty?
We could just provide dummy Html.Structure
that represents an empty
HTML structure.
Let's add this to Html.Internal
:
empty_ :: Structure
empty_ = Structure ""
Now we can write our recursive function. Try it!
Solution
concatStructure :: [Structure] -> Structure
concatStructure list =
case list of
[] -> empty_
x : xs -> x <> concatStructure xs
Remember the <>
function we implemented as an instance of the Semigroup
type class? We mentioned that Semigroup
is an abstraction for things
that implements (<>) :: a -> a -> a
, where <>
is associative
(a <> (b <> c) = (a <> b) <> c
).
It turns out that having an instance of Semigroup
and also having a value that represents
an "empty" value is a fairly common pattern. For example a string can be concatenated,
and the empty string can serve as an "empty" value.
And this is actually a well known abstraction called monoid.
Monoids
Actually, "empty" isn't a very good description of what we want, and isn't very useful as an abstraction. Instead, we can describe it as an "identity" element, which satisfies the following laws:
x <> <identity> = x
<identity> <> x = x
In other words, if we try to use this "empty" - this identity value,
as one argument to <>
, we will always get the other argument back.
For String
, the empty string, ""
, satisfies this:
"" <> "world" = "world"
"hello" <> "" = "hello"
This is of course true for any value we'd write and not just "world" and "hello".
Actually, if we move out of the Haskell world for a second, even integers
with +
as the associative binary operations +
(in place of <>
)
and 0
in place of the identity member form a monoid:
17 + 0 = 17
0 + 99 = 99
So integers together with the +
operation form a semigroup, and
together with 0
form a monoid.
We learn new things from this:
- A monoid is a more specific abstraction over semigroup, it builds on it by adding a new condition (the existence of an identity member)
- This abstraction can be useful! We can write a general
concatStructure
that could work for any monoid
And indeed, there exists a type class in base
called Monoid
which has
Semigroup
as a super class.
class Semigroup a => Monoid a where
mempty :: a
Note: this is actually a simplified version. The actual is a bit more complicated because of backwards compatibility and performance reasons.
Semigroup
was actually introduced in Haskell afterMonoid
!
We could add an instance of Monoid
for our markup Structure
data type:
instance Monoid Structure where
mempty = empty_
And now, instead of using our own concatStructure
, we can use the library function
mconcat :: Monoid a => [a] -> a
Which could theoretically be implemented as:
mconcat :: Monoid a => [a] -> a
mconcat list =
case list of
[] -> mempty
x : xs -> x <> mconcat xs
Notice that because Semigroup
is a super class of Monoid
,
we can still use the <>
function from the Semigroup
class
without adding the Semigroup a
constraint to the left side of =>
.
By adding the Monoid a
constraint we implicitly add a Semigroup a
constraint as well!
This mconcat
function is very similar to the concatStructure
function,
but this one works for any Monoid
, including Structure
!
Abstractions help us identify common patterns and reuse code!
Side note: integers with
+
and0
aren't actually an instance ofMonoid
in Haskell. This is because integers can also form a monoid with*
and1
! But there can only be one instance per type. Instead, two othernewtype
s exist that provide that functionality, Sum and Product. See how they can be used inghci
:ghci> import Data.Monoid ghci> Product 2 <> Product 3 -- note, Product is a data constructor Product {getProduct = 6} ghci> getProduct (Product 2 <> Product 3) 6 ghci> getProduct $ mconcat $ map Product [1..5] 120
Another abstraction?
We've used map
and then mconcat
twice now. Surely there has to be a function
that unifies this pattern. And indeed, it is called
foldMap
,
and it works not only for lists, but also for any data structure that can be "folded",
or "reduced", into a summary value. This abstraction and type class is called Foldable.
For a simpler understanding of Foldable
, we can look at fold
:
fold :: (Foldable t, Monoid m) => t m -> m
-- compare with
mconcat :: Monoid m => [m] -> m
mconcat
is just a specialized version of fold
for lists.
And fold
can be a used for any pair of a data structure that implements
Foldable
and a payload type that implements Monoid
. This
could be []
with Structure
, or Maybe
with Product Int
, or
your new shiny binary tree with String
as the payload type. But note that
the Foldable
type must be of kind * -> *
. So for example Html
cannot be a Foldable
.
foldMap
is a function that allows us to apply a function to the
payload type of the Foldable
type right before combining them
with the <>
function.
foldMap :: (Foldable t, Monoid m) -> (a -> m) -> t a -> m
-- compare to a specialized version with:
-- - t ~ []
-- - m ~ Html.Structure
-- - a ~ Markup.Structure
foldMap
:: (Markup.Structure -> Html.Structure)
-> [Markup.Structure]
-> Html.Structure
True to its name, it really "maps" before it "folds". You might pause here
and think "this 'map' we are talking about isn't specific for lists, maybe
that's another abstraction?", yes. It is actually a very important and
fundamental abstraction called Functor
.
But I think we had enough abstractions for this chapter.
We'll cover it in a later chapter!
Finishing our conversion module
Let's finish our code by writing convert
:
convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure
Now we have a full implementation and are able to convert markup documents to HTML:
-- Convert.hs
module Convert where
import qualified Markup
import qualified Html
convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure
convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
case structure of
Markup.Heading n txt ->
Html.h_ n txt
Markup.Paragraph p ->
Html.p_ p
Markup.UnorderedList list ->
Html.ul_ $ map Html.p_ list
Markup.OrderedList list ->
Html.ol_ $ map Html.p_ list
Markup.CodeBlock list ->
Html.code_ (unlines list)
Summary
We learned about:
- Qualified imports
- Ways to handle errors
- The
Monoid
type class and abstraction - The
Foldable
type class and abstraction
Next, we are going to glue our functionality together and learn about I/O in Haskell!
You can view the git commit of the changes we've made and the code up until now.
Working with IO
In previous chapters we were able to build a parser from a text string to a Haskell representation of our markup language, and we built an EDSL for easy writing of HTML code. However, our program is still not useful to other users because we did not make this functionality accessible via some sort of user interface.
In our program, we'd like to take user input and then convert it to HTML. There are many ways to design this kind of interface, for example:
- Get text input via the standard input and output HTML via the standard output
- Receive two file names as command-line arguments, read the contents of the first one, and write the output to the second one
- Ask for fancier command-line arguments parsing and prefix the file names with flags indicating what they are
- Some fancy GUI interface
- Combination of all of the above
To make this interesting, we will start with the following interface:
- If the user calls the program without arguments, we will read from the standard input, and write to the standard output
- If the user calls the program with two arguments, the first one will be the input file name, and the second one will be the output file name
- If the output file already exists, we'll ask the user if they want to overwrite the file
- On any other kind of input, we'll print a generic message explaining the proper usage
In a later chapter, we will add a fancier command-line interface using a library, and also read whole directories in addition to single files.
But first, we need to learn about I/O in Haskell, what makes it special, and why it's different from other programming languages.
Purely functional
Originally, Haskell was designed as an open standard functional programming language with non-strict semantics, to serve as a unifying language for future research in functional language design.
In GHC Haskell, we use a lazy evaluation strategy to implement non-strict semantics (We've talked about laziness before).
The requirement for non-strict semantics raises interesting challenges: How do we design a language that can do more than just evaluate expressions, how do we model interaction with the outside world, how do we do I/O?
The challenge with doing I/O operations in a language with a lazy evaluation strategy is that as programs grow larger, the order of evaluation becomes less trivial to figure out. Consider this hypothetical code example (which won't actually type-check in Haskell, we'll see why soon):
addWithInput :: Int -> Int
addWithInput n = readIntFromStdin + n
main =
let
result1 = addWithInput 1
result2 = addWithInput 2
in
print (result2 - result1)
This hypothetical program will read 2 integers from the standard input, and then will subtract the second one (+2) from the first one (+1), or so we would expect if this was a strict language. In a strict language we expect the order of operations to happen from the top-down.
But in a lazy language we don't evaluate an expression until
it is needed, and so neither result1
nor result2
are evaluated
until we wish to print the result of subtracting one from the other.
And then when we try to evaluate -
, it evaluates the two arguments
from left to right, so we first evaluate result2
.
Evaluating result2
, with substitution, means to replace occurrences of n
with the input 2
, and then evaluate the top level function (+
), which is a
primitive function. We then evaluate its arguments, readIntFromStdin
and then n
; at this point we are reading the first integer from the stdin.
After calculating the result, we can move to evaluate result1
, which
will read the second integer from stdin. This is the
complete opposite of what we wanted!
Issues like these make lazy evaluation hard to work with in the presence of side effects - when the evaluation of an expression can affect or be affected by the outside world, this includes reading/writing from mutable memory or performing I/O operations.
We call functions that has side-effects such as addWithInput
impure functions.
And an unfortunate consequence of impure functions is that
they can return different results even when they take the same input.
The presence of impure functions makes it harder for us to reason about lazy evaluation, and also messes up our ability to use equational reasoning to understand programs.
Therefore, in Haskell, it was decided to only allow pure functions and expressions - ones that have no side effects. Pure functions will always return the same output (given the same input) and evaluating pure expressions is deterministic.
But now, how can we do I/O operations? There are many possible solutions
- for Haskell it was decided to design a first class interface
with an accompanied type called
IO
.IO
's interface will force a distinction from non-I/O expressions, and will also require that in order to combine multipleIO
operations, we will have to specify the order the operations.
IO
IO
is an opaque type, like our Html
type in which we hide its internal
representation from the user behind an interface. But in this case IO
is a
built-in type that is hidden by the Haskell language rather than a module.
Similar to Maybe
, IO
has a payload type which represents the
result of an IO
operation/action/computation.
When there isn't a meaningful result, we use the unit type,
()
(which only has one value: ()
) to represent that.
Here are a few IO
operations and functions that return IO
operations:
putStrLn :: String -> IO ()
getLine :: IO String
getArgs :: IO [String]
lookupEnv :: String -> IO (Maybe String)
writeFile :: FilePath -> String -> IO ()
Notice that each function returns an IO <something>
, but what does that mean?
The meaning behind IO a
is that it is a description of a program (or subroutine)
that when executed will produce some value of type a
, and may do some I/O effects
during execution.
Executing an IO a
is different from evaluating it.
Evaluating an IO a
expression is pure - the evaluation will always reduce to
the same description of a program. This helps us keep purity and equational reasoning!
The Haskell runtime will execute the entry point to the program
(the expression main
, that must have the type IO ()
) in order for our IO operation
to also run - it has to be combined into the main
expression - let's see what that means.
Combining IO expressions
Just like our Html.Structure
type, the IO interface provides combinators for composing
small IO
operations into bigger ones. This interface also makes sure that the order
of operations is well defined!
Note that, just like the <>
we've defined for Html.Structure
, the combinators for IO
are implemented as type-class instances rather than specialized variants
(for example our append_
function was a specialized version of <>
tailored only
for Structure
).
In this section I will introduce specialized type signatures rather than generalized ones, because I think it'll be easier to digest, but we'll talk about the generalized versions later.
>>=
Our first combinator is >>=
(pronounced bind), and is the most useful of the bunch:
(>>=) :: IO a -> (a -> IO b) -> IO b
This combinator takes two arguments, the first is an IO operation, and the second is
a function that takes as input the result of the first IO operation and returns
a new IO b
which is the final result.
Here are a few examples using the functions we described above:
-
Echo
getLine >>= (\line -> putStrLn line)
We are reading a line from the standard input on the left of
>>=
, we receive the input to the right of>>=
as an argument to the lambda function, and then write it to the standard output in the body of the lambda function.>>=
's role here is to pass the result of the IO operation on the left to the function returning an IO operation on the right.Notice how
>>=
defines an order of operations - from left to right.The type of each sub expression here is:
getLine :: IO String putStrLn :: String -> IO () (>>=) :: IO String -> (String -> IO ()) -> IO () line :: String
- Question: what is the type of the whole expression?
Answer
IO ()
Also note that this example can be written in a more concise manner in point free style
getLine >>= putStrLn
. - Question: what is the type of the whole expression?
-
Appending two inputs
getLine >>= (\honorific -> getLine >>= (\name -> putStrLn ("Hello " ++ honorific ++ " " ++ name)))
This subroutine combines multiple operations together, it reads two lines from the standard input and prints a greeting. Note that:
- Using
>>=
defines the order of operation from left to right - Because of the scoping rules in Haskell,
honorific
will be in scope in the body of the function for which it is its input, including the most inner function
This is a bit hard to read, but we can remove the parenthesis and add indentation to make it a bit easier to read:
getLine >>= \honorific -> getLine >>= \name -> putStrLn ("Hello " ++ honorific ++ " " ++ name)
- Using
Let's see a few more combinators!
*> and >>
(*>) :: IO a -> IO b -> IO b
(>>) :: IO a -> IO b -> IO b
*>
and >>
have the same type signature for IO
and mean the same thing.
In fact, *>
is a slightly more generalized version of >>
and can always
be used instead of >>
, which only still exists to avoid breaking backward
compatibility.
*>
for IO
means run the first IO operation, discard the result
then run the second operation. It can be implemented using >>=
:
a *> b = a >>= \_ -> b
This combinator is useful when we want to run several IO
operations one after
the other that might not return anything meaningful, such as putStrLn
:
putStrLn "hello" *> putStrLn "world"
pure and return
pure :: a -> IO a
like *>
and >>
, pure
is a more general version of return
. pure
also has the
advantage of not having a resemblance to an unrelated keyword in other languages.
Remember that we said IO a
is description of a program
that when executed will produce some value of type a
and may do some I/O effects
during execution?
With pure
, we can build an IO a
that does no I/O, and will produce a
specific value of type a
, the one we supply to pure
!
This function is useful when we want to do some uneffectful computation that depends on IO
.
For example:
confirm :: IO Bool
confirm =
putStrLn "Are you sure? (y/n)" *>
getLine >>= \answer ->
case answer of
"y" -> pure True
"n" -> pure False
_ ->
putStrLn "Invalid response. use y or n" *>
confirm
Trying to return just True
or False
here wouldn't work because of the
type of >>=
:
(>>=) :: IO a -> (a -> IO b) -> IO b
The right side of >>=
in our code example (\answer -> case ...
) must
be of type String -> IO Bool
. This is because:
getLine :: IO String
, so thea
in the type signature of>>=
should be the same asString
in this instance, andconfirm :: IO Bool
, sob
should beBool
fmap and <$>
fmap :: (a -> b) -> IO a -> IO b
<$>
is the infix version of fmap
. Use it at your discretion.
What if we want a function that reads a line from stdin
and returns it with !
at the end? We could use a combination
of >>=
and pure
:
getLine >>= \line -> pure (line ++ "!")
The pattern is unified to the fmap
function:
fmap (\line -> line ++ "!") getLine
fmap
applies a function to the value to be returned
from the IO
operation, also known as "mapping" over it.
(By the way, Have you noticed the similarities between fmap
and map :: (a -> b) -> [a] -> [b]
?)
Summary
Here's a list of IO
combinators we ran into:
-- chaining IO operations: passing the *result* of the left IO operation
-- as an argument to the function on the right.
-- Pronounced "bind".
(>>=) :: IO a -> (a -> IO b) -> IO b
-- sequence two IO operations, discarding the payload of the first.
(*>) :: IO a -> IO b -> IO b
-- "lift" a value into IO context, does not add any I/O effects.
pure :: a -> IO a
-- "map" (or apply a function) over the payload value of an IO operation.
fmap :: (a -> b) -> IO a -> IO b
IO is first class
The beauty of IO
is that it's a completely first-class construct in the language,
and is not really different from Maybe
, Either
or Structure
. We can pass it to
functions, put it in a container, etc. Remember that it represents a description
of a program, and without combining it into main
in some way won't actually
do anything. It is just a value!
Here's an example of a function that takes IO actions as input:
whenIO :: IO Bool -> IO () -> IO ()
whenIO cond action =
cond >>= \result ->
if result
then action
else pure ()
And how it can be used:
main :: IO ()
main =
putStrLn "This program will tell you a secret" *>
whenIO confirm (putStrLn "IO is actually pretty awesome") *>
putStrLn "Bye"
Notice how putStrLn "IO is actually pretty awesome"
isn't executed
right away, but only if it is what whenIO
returns, and in turn is combined
with *>
as part of the main
expression.
Getting out of IO?
What we've seen above has great consequences to the Haskell language.
In our Html
type, we had a function render :: Html -> String
that could turn an Html
to a string value.
In Haskell, there is no way to implement a function such as execute :: IO a -> a
that preserves purity and equational reasoning!
Also, IO
is opaque. It does not let us examine it. So we are really bound
to what the Haskell API for IO
allows us to do.
This means that we need to think about using IO differently!
In Haskell, once we get into IO
, there is no getting out.
The only thing we can do is to build bigger IO computations by combining
it with more IO computations.
We also can't use IO a
in place of an a
. For example,
we can't write getLine ++ "!"
because ++
expects both
sides to be String
, but getLine
's type is IO String
. The types do not match!
We have to use fmap
and the return type must be IO String
, like we've seen before.
In Haskell we like to keep IO
usage minimal, and we like to push it to the edges
of the program. This pattern is often called Functional core, imperative shell.
Functional core, imperative shell
In our blog generator program, we want to read a file, parse it, convert it to HTML, and then print the result to the console.
In many programming languages, we might interleave reading from the file with parsing,
and writing to the file with the HTML conversion. But we don't mix these here.
Parsing operates on a String
value rather than some file handle,
and Html
is being converted to a String
rather than being written to the screen directly.
This approach of separating IO
and pushing it to the edge of the program gives us
a lot of flexibility. These functions without IO
are easier to test and examine
(because they are guaranteed to have deterministic evaluation!),
and they are more modular and can work in many contexts (reading from stdin,
reading from network socket, writing to an HTTP connection, and more).
This pattern is often a good approach for building Haskell programs, especially batch programs.
Building a blog generator
We'd like to start building a blog generator, and we want to have the following interface:
- If the user calls the program without arguments, we will read from the standard input, and write to the standard output
- If the user calls the program with two arguments, the first one will be the input file name, and the second one will be the output file name
- If the output file already exists, we'll ask the user if they want to overwrite the file
- On any other kind of input, we'll print a generic message explaining the proper usage
We are going to need a few functions:
getArgs :: IO [String] -- Get the program arguments
getContents :: IO String -- Read all of the content from stdin
readFile :: FilePath -> IO String -- Read all of the content from a file
writeFile :: FilePath -> String -> IO () -- Write a string into a file
doesFileExist :: FilePath -> IO Bool -- Checks whether a file exists
And the following imports:
import System.Directory (doesFileExist)
import System.Environment (getArgs)
We don't need to add the following import because Prelude
already imports
these functions for us:
-- imported by Prelude
import System.IO (getContents, readFile, writeFile)
-
Implement a function
process :: Title -> String -> String
which will parse a document to markup, convert it to HTML, and then render the HTML to a string.Answer
process :: Html.Title -> String -> String process title = Html.render . convert title . Markup.parse
-
Try implementing the "imperative shell" for our blog generator program. Start with
main
, pattern match on the result ofgetArgs
, and decide what to do. Look back at the examples above for inspiration.Answer
-- Main.hs module Main where import qualified Markup import qualified Html import Convert (convert) import System.Directory (doesFileExist) import System.Environment (getArgs) main :: IO () main = getArgs >>= \args -> case args of -- No program arguments: reading from stdin and writing to stdout [] -> getContents >>= \content -> putStrLn (process "Empty title" content) -- With input and output file paths as program arguments [input, output] -> readFile input >>= \content -> doesFileExist output >>= \exists -> let writeResult = writeFile output (process input content) in if exists then whenIO confirm writeResult else writeResult -- Any other kind of program arguments _ -> putStrLn "Usage: runghc Main.hs [-- <input-file> <output-file>]" process :: Html.Title -> String -> String process title = Html.render . convert title . Markup.parse confirm :: IO Bool confirm = putStrLn "Are you sure? (y/n)" *> getLine >>= \answer -> case answer of "y" -> pure True "n" -> pure False _ -> putStrLn "Invalid response. use y or n" *> confirm whenIO :: IO Bool -> IO () -> IO () whenIO cond action = cond >>= \result -> if result then action else pure ()
Do notation
While using >>=
to chain IO
actions is manageable, Haskell provides
an even more convenient syntactic sugar called do notation
which emulates imperative programming.
A do block starts with the do
keyword, and continues with one or more
"statements" which can be one of the following:
- An expression of type
IO ()
, such as:putStrLn "Hello"
if True then putStrLn "Yes" else putStrLn "No"
- A let block, such as
let x = 1
- or multiple let declarations:
Note that we do not write thelet x = 1 y = 2
in
here.
- A binding
<variable> <- <expresion>
, such asline <- getLine
And the last "statement" must be an expression of type IO <something>
-
this will be the result type of the do block.
These constructs are desugared (translated) by the Haskell compiler to:
<expression> *>
,let ... in
and<expression> >>= \<variable>
respectively.
For example:
greeting :: IO ()
greeting = do
putStrLn "Tell me your name."
let greet name = "Hello, " ++ name ++ "!"
name <- getLine
putStrLn (greet name)
Is just syntactic sugar for:
greeting :: IO ()
greeting =
putStrLn "Tell me your name." *>
let
greet name = "Hello, " ++ name ++ "!"
in
getLine >>= \name ->
putStrLn (greet name)
It's important to note the difference between let
and <-
(bind).
let
is used to give a new name to an expression which will be in scope
for subsequent lines, and <-
is used to bind the result a
in an IO a
action to a new name which will be in scope for subsequent lines.
code | operator | type of the left side | type of the right side | comment |
---|---|---|---|---|
let gretting = "hello" |
= |
String |
String |
Both sides are interchangeable |
let mygetline = getLine |
= |
IO String |
IO String |
We just create a new name for getLine |
name <- getLine |
<- |
String |
IO String |
Technically <- is not an operator, but just a syntactic sugar for >>= + lambda, where we bind the result of the computation to a variable |
Do notation is very very common and is often preferable to using >>=
directly.
-
Exercise: Translate the examples in this chapter to do notation.
-
Exercise: Translate our glue code for the blog generator to do notation.
Solution
-- Main.hs module Main where import qualified Markup import qualified Html import Convert (convert) import System.Directory (doesFileExist) import System.Environment (getArgs) main :: IO () main = do args <- getArgs case args of -- No program arguments: reading from stdin and writing to stdout [] -> do content <- getContents putStrLn (process "Empty title" content) -- With input and output file paths as program arguments [input, output] -> do content <- readFile input exists <- doesFileExist output let writeResult = writeFile output (process input content) if exists then whenIO confirm writeResult else writeResult -- Any other kind of program arguments _ -> putStrLn "Usage: runghc Main.hs [-- <input-file> <output-file>]" process :: Html.Title -> String -> String process title = Html.render . convert title . Markup.parse confirm :: IO Bool confirm = do putStrLn "Are you sure? (y/n)" answer <- getLine case answer of "y" -> pure True "n" -> pure False _ -> do putStrLn "Invalid response. use y or n" confirm whenIO :: IO Bool -> IO () -> IO () whenIO cond action = do result <- cond if result then action else pure ()
Summary
In this chapter we discussed what "purely functional" means, where the initial motivation for being purely functional comes from, and how Haskell's I/O interface allows us to create descriptions of programs without breaking purity.
We have also achieved a major milestone. With this chapter, we have implemented enough pieces to finally run our program on a single document and get an HTML rendered result!
However, our command-line interface is still sub-par. We want to render a blog with multiple articles, create an index page, and more. We still have more to do to be able to call our program a blog generator.
Let's keep going!
You can view the git commit of the changes we've made and the code up until now.
Defining a project description
Up until now we've only used base
and the libraries
included
with GHC. Because of that we didn't really need to do anything fancier
than runghc
to run our program. However, we want to start using
external libraries which are not included with GHC in our programs.
External packages can be downloaded from Hackage - Haskell's central package archive, Stackage - a subset of Hackage packages that are known to work together, or even from remote git repositories. Usually Haskellers use a package manager to download and manage packages for different projects. The most popular package managers for Haskell are cabal and stack.
A major difference between the two is their philosophy.
cabal
tries to be a more minimalist tool that handles building Haskell projects,
doing package management using the whole of Hackage, and uses complicated algorithms
to make sure packages work together.
stack
tries to be a more maximalistic tool that handles installing the right GHC
for each project, provides integration with external tools like hoogle,
and lets the user choose which 'set' of packages (including their versions) they want to use.
If you've installed Haskell using GHCup, you most likely have cabal
installed.
If you've installed Haskell using stack, well, you have stack
installed.
Check the haskell.org downloads page if that's not the case.
Creating a project
Using external packages can be done in multiple ways. For quick experimentation, we can just ask stack or cabal to build or even run our program with external packages. But as programs get larger, use more dependencies, and require more functionality, it is better to create a project description for our programs and libraries.
Project description is done in a cabal file. We can ask cabal or stack
to generate one for us using cabal init --libandexe
or stack new
,
along with many other files, but we will likely need to edit the file by hand
later. For now let's just paste an initial example in hs-blog.cabal
and edit it.
cabal-version: 2.4
name: name should match with <name>.cabal
version: version should use PvP
synopsis: Synopsis will appear in the hackage package listing and search
description: The description will appear at the top of a library
homepage: Homepage url
bug-reports: issue-tracker url
license: License name
license-file: License file
author: Author name
maintainer: Maintainer email
category: Hackage categories, separated by commas
extra-doc-files:
README.md
common common-settings
default-language: Haskell2010
ghc-options:
-Wall
library
import: common-settings
hs-source-dirs: src
build-depends:
base
, directory
exposed-modules:
HsBlog
HsBlog.Convert
HsBlog.Html
HsBlog.Html.Internal
HsBlog.Markup
-- other-modules:
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
build-depends:
base
, <package-name>
ghc-options:
-O
Let's break it down to a few parts, the package metadata, common settings, library and executable.
Package metadata
The first part should be fairly straightforward from the comments, maybe except for:
cabal-version
: Defines which cabal versions can build this project. We've specified 2.4 and above. More info on different versions.name
: The name of your library and package. Must match with the .cabal filename. Usually starts with a lowercase. Check if your package name is already taken on Hackage.version
: Some Haskell packages use semver, most use PvP.license
: Most Haskell packages use BSD-3-Clause. Neil Mitchell blogged about this. You can find more licenses if you'd like at choosealicense.com.extra-doc-files
: Include extra doc files here, such asREADME
orCHANGELOG
.
Let's fill this with the metadata of our project:
cabal-version: 2.4
name: hs-blog
version: 0.1.0.0
synopsis: A custom blog generator from markup files
description: This package provides a static blog generator
from a custom markup format to HTML.
It defines a parser for this custom markup format
as well as an html pretty printer EDSL.
It is used as the example project in the online book
'Learn Haskell Blog Generator'. See the README for
more details.
homepage: https://github.com/soupi/learn-haskell-blog-generator
bug-reports: https://github.com/soupi/learn-haskell-blog-generator/issues
license: BSD-3-Clause
license-file: LICENSE.txt
author: Gil Mizrahi
maintainer: gilmi@posteo.net
category: Learning, Web
extra-doc-files:
README.md
Common settings
Cabal package descriptions can include multiple "targets": libraries, executables, and test suites. Since Cabal 2.2, we can use common stanzas to group settings to be shared between different targets, so we don't have to repeat them for each target.
In our case we've created a new common stanza (or block) called common-settings
and
defined the default language (Haskell has two standards, 98 and 2010),
and instructed GHC to compile with -Wall
.
common common-settings
default-language: Haskell2010
ghc-options:
-Wall
Later, in our targets' descriptions, we can add import: common-settings
,
and all of these settings will be automatically added.
Library
In a library
target, we define:
- The settings with which to build the library (in this case we just import
common-settings
) - The directory in which the source files can be found
- The packages we require to build the library
- The modules exposed from the library and can be used by others
- The modules not exposed from the library and which cannot be used by others;
these could be any module you don't wish to export, such as an internal utility
functions module.
In our case we don't have anything like this, so we commented out the
other-modules
label.
Note that it is common to specify version bounds for packages.
Version bounds specify which package versions this library works with.
These can also be generated using cabal with the cabal gen-bounds
command.
library
import: common-settings
hs-source-dirs: src
build-depends:
base
, directory
exposed-modules:
HsBlog
HsBlog.Convert
HsBlog.Html
HsBlog.Html.Internal
HsBlog.Markup
-- other-modules:
Also note that we've added an additional hierarchy for our modules and defined
a different source directory. This means we will need to move the files around
a bit and change the module
name in each file and the import
statements. This is to avoid
conflict with other packages that a user might import.
Do this now.
Solution
-
Main.hs
->src/HsBlog.hs
module HsBlog ( main , process ) where import qualified HsBlog.Markup as Markup import qualified HsBlog.Html as Html import HsBlog.Convert (convert)
-
Convert.hs
->src/HsBlog/Convert.hs
module HsBlog.Convert where import qualified HsBlog.Markup as Markup import qualified HsBlog.Html as Html
-
Html.hs
->src/HsBlog/Html.hs
module HsBlog.Html ... import HsBlog.Html.Internal
-
Html/Internal.hs
->src/HsBlog/Html/Internal.hs
module HsBlog.Html.Internal where
-
Markup.hs
->src/HsBlog/Markup.hs
module HsBlog.Markup
Executable
We have separated our code into two sections: a library and an executable, why?
First, libraries can be used by others. If we publish our code and someone wants to use it and build upon it, they can. Executables can't be imported by other projects. Second, we can write unit tests for libraries. It is usually benefitical to write most, if not all, of our logic as a library, and provide a thin executable over it.
Executables' descriptions are very similar to libraries, here we define:
- The name of the executable
- Where the source directory for this application is
- Which file is the 'Main' file
- Import our library, which is named
hs-blog
- Additional flags for GHC, e.g.,
-O
to compile with optimizations
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
build-depends:
base
, hs-blog
ghc-options:
-O
We can write many executables descriptions. In this case we only have one.
Exercise: Add a new file: app/Main.hs
which imports HsBlog
and runs main
.
Solution
-- app/Main.hs
module Main where
import qualified HsBlog
main :: IO ()
main = HsBlog.main
Test-suites
test-suite
defines a target for running package tests. We will get back to it
in a later chapter.
Our complete .cabal file
cabal-version: 2.4
name: hs-blog
version: 0.1.0.0
synopsis: A custom blog generator from markup files
description: This package provides a static blog generator
from a custom markup format to HTML.
It defines a parser for this custom markup format
as well as an html pretty printer EDSL.
It is used as the example project in the online book
'Learn Haskell Blog Generator'. See the README for
more details.
homepage: https://github.com/soupi/learn-haskell-blog-generator
bug-reports: https://github.com/soupi/learn-haskell-blog-generator/issues
license: BSD-3-Clause
license-file: LICENSE.txt
author: Gil Mizrahi
maintainer: gilmi@posteo.net
category: Learning, Web
extra-doc-files:
README.md
common common-settings
default-language: Haskell2010
ghc-options:
-Wall
library
import: common-settings
hs-source-dirs: src
build-depends:
base
, directory
exposed-modules:
HsBlog
HsBlog.Convert
HsBlog.Html
HsBlog.Html.Internal
HsBlog.Markup
-- other-modules:
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
build-depends:
base
, hs-blog
ghc-options:
-O
We'll also add a README.md
file and a LICENSE.txt
file:
README.md
Just write whatever you want here:
# hs-blog
One day it will be a static blog generator.
[Read the book](https://lhbg-book.link).
LICENSE.txt
This is BSD-3-Clause with me as the author. Please write your own name for your projects :)
BSD 3-Clause License
Copyright (c) 2021-2022, Gil Mizrahi
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cabal.project
and stack.yaml
The cabal.project and
stack.yaml
files are used by cabal
and stack
respectively to add additional information on how
to build the package. While cabal.project
isn't necessary to use cabal
, stack.yaml
is necessary in order to use stack
, so we will cover it briefly.
There are two important fields a stack.yaml
file must have:
resolver
: Describes which snapshot to use for packages and ghc version. We will choose the latest (at time of writing) on thelts
branch:lts-18.22
. Visit this link to find out which packages this snapshot includes, what their versions are, and which GHC version is used with this snapshot.packages
: Describes the location of packages we plan to build. In our case we have only one and it can be found in the current directory.
We'll add stack.yaml
to our project directory:
resolver: lts-18.22
packages:
- .
For additional options and configurations, please consult the relevant user guides.
Usage
Now, instead of manually running runghc Main.hs
, we will use either stack
or cabal
to build and run our program and package (I mostly use stack, but it's up to you).
For cabal:
Building the project - on the first run, cabal will download the package dependencies and use the GHC on PATH to build the project.
Cabal caches packages between projects, so if a new project uses the same packages
with the same versions (and the same flag settings) they will not need to be reinstalled.
cabal
commands are usually prefixed with v2-
to note that we want to use the new
build system implementation.
In older version of cabal, packages could be installed either globally, or in sandboxes. In each sandbox (and globally) there could only be one version of a package installed, and users would usually create different sandboxes for different projects, without caching packages between projects.
With the new build system implementation, multiple versions of the same package can be installed globally, and for each project cabal will (try to) choose a specific version for each package dependency such that they all work together, without needing sandboxing. This change helps us increase sharing of built packages while avoiding conflicts and manual handling of sandboxes.
A few important commands we should be familiar with:
cabal v2-update
v2-update
fetches information from remote package repositories (specifically Hackage unless specified otherwise)
and updates the local package index which includes various information about available packages such as
their names, versions and dependencies.
cabal v2-update
is usually the first command to run before fetching package dependencies.
cabal v2-build
v2-build
compiles the various targets (such as library
and executable
s).
It will also fetch and install the package dependencies when they're not already installed.
cabal v2-run hs-blog-gen -- <program arguments>
v2-run
Can be used to compile and then run a target (in our case our executable
which we named hs-blog-gen
).
We separate arguments passed to cabal
and arguments passed to our target program with --
.
cabal v2-repl hs-blog
v2-repl
runs ghci
in the context of the target (in our case our library
which we named hs-blog
) -
it will load the target's package dependencies and modules to be available in ghci
.
cabal v2-clean
v2-clean
Deletes the build artifacts that we built.
There are more interesting commands we could use, such as cabal v2-freeze
to generate
a file which records the packages versions and flags we used to build this project,
and cabal v2-sdist
to bundle the project source to a package tarball which can be
uploaded to Hackage. If you'd like to learn more visit the
Cabal user guide.
For stack:
Building the project - on the first run, stack will install the right GHC for this project
which is specified by the resolver
field in the stack.yaml
file,
download the package dependencies, and compile the project.
Stack caches these installations between projects that use the same resolver, so future projects with the same resolver and future runs of this project won't require reinstallation. This approach is kind of a middle ground between full packages sharing and sandboxes.
Let's look at the (somewhat) equivalent commands for Stack:
stack build
build
will compile the project as described above - installing GHC and package dependencies if they are not
installed.
stack exec hs-blog-gen -- <program arguments>
exec
will run the executable passing the program arguments to our executable.
stack ghci hs-blog
ghci
runs ghci
in the context of our library hs-blog
- loading the library modules
and packages.
stack clean
clean
cleans up build artifacts.
The Stack user guide contains more information about how stack works and how to use it effectively.
Build artifacts
Both stack and cabal create build artifacts that we will not want to track using
our version control. These build artifacts are found in the dist
, dist-newstyle
and .stack-work
directories. We can add these to a .gitignore
file
(or similar for other version control programs) to ignore them:
dist
dist-newstyle
.stack-work
Finding packages
Finding packages isn't a very straightforward process at the moment. People have written on how they choose packages, recommendation lists, and more.
My suggestion is:
- Search for a tutorial on something you'd like to do, and see which packages come up
- Use the download amount on Hackage as an indication of package popularity
- Use Stackage package synopses to locate a relevant package
- Check social network channels for recommendations, but know that sometimes people tend to recommend inappropriate solutions and packages that might be too complicated or still experimental
It's also important to note the amount of dependencies a package has. Adding many dependencies will affect compilation time and code size. And it can sometimes be a good thing to consider when comparing packages, or considering whether a package is needed at all.
Summary
We've created a package description for our library and used stack
and/or cabal
to build our program. In future chapters we'll start adding external packages,
we'll only have to add them to the build-depends
section in the cabal file and
our package manager will download and install the required package for us!
We've made some change to our project directory, and it should now look like this:
.
├── app
│ └── Main.hs
├── hs-blog.cabal
├── LICENSE.txt
├── README.md
├── src
│ ├── HsBlog
│ │ ├── Convert.hs
│ │ ├── Html
│ │ │ └── Internal.hs
│ │ ├── Html.hs
│ │ └── Markup.hs
│ └── HsBlog.hs
└── stack.yaml
4 directories, 10 files
Note that this package format could be release on Hackage for other Haskell developers to use!
You can view the git commit of the changes we've made and the code up until now.
Fancy options parsing
We'd like to define a nicer interface for our program. While we could manage something
ourselves with getArgs
and pattern matching, it is easier to get good results using a library.
We are going to use a package called
optparse-applicative.
optparse-applicative
provides us with an EDSL (yes, another one) to build
command arguments parsers. Things like commands, switches, and flags can be built
and composed together to make a parser for command-line arguments without actually
writing operations on strings as we did when we wrote our Markup parser, and will
provide other benefits such as automatic generation of usage lines, help screens,
error reporting, and more.
While optparse-applicative
's dependency footprint isn't very large,
it is likely that a user of our library wouldn't need command-line parsing
in this particular case, so it makes sense to add this dependency to the executable
section
(rather than the library
section) in the .cabal
file:
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
build-depends:
base
+ , optparse-applicative
, hs-blog
ghc-options:
-O
Building a command-line parser
The optparse-applicative package has pretty decent documentation, but we will cover a few important things to pay attention to in this chapter.
In general, there are four important things we need to do:
-
Define our model - we want to define an ADT that describes the various options and commands for our program.
-
Define a parser that will produce a value of our model type when run
-
Run the parser on our program arguments input
-
Pattern match on the model and call the right operations according to the options
Define a model
Let's envision our command-line interface for a second, what should it look like?
We want to be able to convert a single file or input stream to either a file or an output stream, or we want to process a whole directory and create a new directory. We can model it in an ADT like this:
data Options
= ConvertSingle SingleInput SingleOutput
| ConvertDir FilePath FilePath
deriving Show
data SingleInput
= Stdin
| InputFile FilePath
deriving Show
data SingleOutput
= Stdout
| OutputFile FilePath
deriving Show
Note that we could technically also use
Maybe FilePath
to encode bothSingleInput
andSingleOutput
, but then we would have to remember whatNothing
means in each context. By creating a new type with properly named constructors for each option we make it easier for readers of the code to understand the meaning of our code.
In terms of interface, we could decide that when a user would like to convert
a single input source, they would use the convert
command, and supply the optional flags
--input FILEPATH
and --output FILEPATH
to read or write from a file.
When the user does not supply one or both flag, we will read or write from
the standard input/output accordingly.
If the user would like to convert a directory, they can use the convert-dir
command and supply the two mandatory flags --input FILEPATH
and
--output FILEPATH
.
Build a parser
This is the most interesting part of the process. How do we build a parser that fits our model?
The optparse-applicative
library introduces a new type called Parser
.
Parser
, similar to Maybe
and IO
, has the kind * -> *
- when it
is supplied with a saturated (or concrete) type such as Int
, Bool
or
Options
, it can become a saturated type (one that has values).
A Parser a
represents a specification of a command-line options parser
that produces a value of type a
when the command-line arguments are
successfully parsed.
This is similar to how IO a
represents a description of a program
that can produce a value of type a
. The main difference between these
two types is that while we can't convert an IO a
to an a
(we just chain IO operations and have the Haskell runtime execute them),
we can convert a Parser a
to a function that takes a list of strings
representing the program arguments and produces an a
if it manages
to parse the arguments.
As we've seen with the previous EDSLs, this library uses the combinator pattern as well. We need to consider the basic primitives for building a parser, and the methods of composing small parsers into bigger parsers.
Let's see an example for a small parser:
inp :: Parser FilePath
inp =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
out :: Parser FilePath
out =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
strOption
is a parser builder. It is a function that takes a combined
option modifiers as an argument, and returns a parser that will parse a string.
We can specify the type to be FilePath
because FilePath
is an
alias to String
. The parser builder describes how to parse the value,
and the modifiers describe its properties, such as the flag name,
the shorthand of the flag name, and how it would be described in the usage
and help messages.
Actually
strOption
can return any string type that implements the interfaceIsString
. There are a few such types, for exampleText
, a much more efficient Unicode text type from thetext
package. It is more efficient thanString
because whileString
is implemented as a linked list ofChar
,Text
is implemented as an array of bytes.Text
is usually what we should use for text values instead ofString
. We haven't been using it up until now because it is slightly less ergonomic to use thanString
. But it is often the preferred type to use for text!
As you can see, modifiers can be composed using the <>
function,
which means modifiers implement an instance of the Semigroup
type class!
With such an interface we don't have to supply all the modifier options, but only the relevant ones. So if we don't want to have a shortened flag name, we don't have to add it.
Functor
For the data type we've defined, having Parser FilePath
takes us
a good step in the right direction, but it is not exactly what we need
for a ConvertSingle
. We need a Parser SingleInput
and a
Parser SingleOutput
. If we had a FilePath
, we could convert
it into SingleInput
by using the InputFile
constructor.
Remember, InputFile
is also a function:
InputFile :: FilePath -> SingleInput
OutputFile :: FilePath -> SingleOutput
However, to convert a parser, we need functions with these types:
f :: Parser FilePath -> Parser SingleInput
g :: Parser FilePath -> Parser SingleOutput
Fortunately, the Parser
interface provides us with a function to "lift"
a function like FilePath -> SingleInput
to work on parsers, making
it a function with the type Parser FilePath -> Parser SingleInput
.
Of course, this function will work for any input and output,
so if we have a function with the type a -> b
, we can pass it to
that function and get a new function of the type Parser a -> Parser b
.
This function is called fmap
:
fmap :: (a -> b) -> Parser a -> Parser b
-- Or with its infix version
(<$>) :: (a -> b) -> Parser a -> Parser b
We've seen fmap
before in the interface of other types:
fmap :: (a -> b) -> [a] -> [b]
fmap :: (a -> b) -> IO a -> IO b
fmap
is a type class function like <>
and show
. It belongs
to the type class Functor
:
class Functor f where
fmap :: (a -> b) -> f a -> f b
And it has the following laws:
-- 1. Identity law:
-- if we don't change the values, nothing should change
fmap id = id
-- 2. Composition law:
-- Composing the lifted functions is the same a composing
-- them after fmap
fmap (f . g) == fmap f . fmap g
Any type f
that can implement fmap
and follow these laws can be a valid
instance of Functor
.
Notice how
f
has a kind* -> *
, we can infer the kind off
by looking at the other types in the type signature offmap
:
a
andb
have the kind*
because they are used as arguments/return types of functionsf a
has the kind*
because it is used as an argument to a function, thereforef
has the kind* -> *
Let's choose a data type and see if we can implement a Functor
instance.
We need to choose a data type that has the kind * -> *
. Maybe
fits the bill.
We need to implement a function fmap :: (a -> b) -> Maybe a -> Maybe b
.
Here's one very simple (and wrong) implementation:
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX = Nothing
Check it yourself! It compiles successfully! But unfortunately it does not
satisfy the first law. fmap id = id
means that
mapMaybe id (Just x) == Just x
, however from the definition we can
clearly see that mapMaybe id (Just x) == Nothing
.
This is a good example of how Haskell doesn't help us make sure the laws
are satisfied, and why they are important. Unlawful Functor
instances
will behave differently from what we'd expect a Functor
to behave.
Let's try again!
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX =
case maybeX of
Nothing -> Nothing
Just x -> Just (func x)
This mapMaybe
will satisfy the functor laws. This can be proved
by doing algebra - if we can do substitution and reach the other side of the
equation in each law, then the law holds.
Functor is a very important type class, and many types implement this interface.
As we know, IO
, Maybe
, []
and Parser
all have the kind * -> *
,
and all allows us to map over their "payload" type.
Often people try to look for analogies and metaphors to what a type class mean, but type classes with funny names like
Functor
don't usually have an analogy or a metaphor that fits them in all cases. It is easier to give up on the metaphor and think about it as it is - an interface with laws.
We can use fmap
on Parser
to make a parser that returns FilePath
to
return a SingleInput
or SingleOutput
instead:
pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
where
parser =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser -- fmap and <$> are the same
where
parser =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
Applicative
Now that we have two parsers,
pInputFile :: Parser SingleInput
and pOutputFile :: Parser SingleOutput
,
we want to combine them as Options
. Again, if we only had
SingleInput
and SingleOutput
, we could use the constructor ConvertSingle
:
ConvertSingle :: SingleInput -> SingleOutput -> Options
Can we do a similar trick to the one we saw before with fmap
?
Does a function exist that can lift a binary function to work
on Parser
s instead? One with this type signature:
???
:: (SingleInput -> SingleOutput -> Options)
-> (Parser SingleInput -> Parser SingleOutput -> Parser Options)
Yes. This function is called liftA2
and it is from the Applicative
type class. Applicative
(also known as applicative functor) has three
primary functions:
class Functor f => Applicative f where
pure :: a -> f a
liftA2 :: (a -> b -> c) -> f a -> f b -> f c
(<*>) :: f (a -> b) -> f a -> f b
Applicative
is another very popular type class with many instances.
Just like any Monoid
is a Semigroup
, any Applicative
is a Functor
. This means that any type that wants to implement
the Applicative
interface should also implement the Functor
interface.
Beyond what a regular functor can do, which is to lift a function over
a certain f
, applicative functors allow us to apply a function to
multiple instances of a certain f
, as well as "lift" any value of type a
into an f a
.
You should already be familiar with pure
, we've seen it when we
talked about IO
. For IO
, pure
lets us create an IO
action
with a specific return value without doing IO.
With pure
for Parser
, we can create a Parser
that when run
will return a specific value as output without doing any parsing.
liftA2
and <*>
are two functions that can be implemented in
terms of one another. <*>
is actually the more useful one between
the two. Because when combined with fmap
(or rather the infix version <$>
),
it can be used to apply a function with many arguments, instead of just two.
To combine our two parsers to one, we can use either liftA2
or
a combination of <$>
and <*>
:
-- with liftA2
pConvertSingle :: Parser Options
pConvertSingle =
liftA2 ConvertSingle pInputFile pOutputFile
-- with <$> and <*>
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pInputFile <*> pOutputFile
Note that both <$>
and <*>
associate to the left,
so we have invisible parenthesis that look like this:
pConvertSingle :: Parser Options
pConvertSingle =
(ConvertSingle <$> pInputFile) <*> pOutputFile
Let's take a deeper look at the types of the sub-expressions we have here, to prove that this type-checks:
pConvertSingle :: Parser Options
pInputFile :: Parser SingleInput
pOutputFile :: Parser SingleOutput
ConvertSingle :: SingleInput -> SingleOutput -> Options
(<$>) :: (a -> b) -> Parser a -> Parser b
-- Specifically, here `a` is `SingleInput`
-- and `b` is `SingleOutput -> Options`,
ConvertSingle <$> pInputFile :: Parser (SingleOutput -> Options)
(<*>) :: Parser (a -> b) -> Parser a -> Parser b
-- Specifically, here `a -> b` is `SingleOutput -> Options`
-- so `a` is `SingleOutput` and `b` is `Options`
-- So we get:
(ConvertSingle <$> pInputFile) <*> pOutputFile :: Parser Options
With <$>
and <*>
we can chain as many parsers (or any applicative really)
as we want. This is because of two things: currying and parametric polymorphism.
Because functions in Haskell take exactly one argument and return exactly one,
any multiple argument function can be represented as a -> b
.
You can find the laws for the applicative functors in this article called Typeclassopedia, which talks about various useful type classes and their laws.
Applicative functor is a very important concept and will appear in various
parser interfaces (not just for command-line arguments, but also JSON
parsers and general parsers), I/O, concurrency, non-determinism, and more.
The reason this library is called optparse-applicative is because
it uses the Applicative
interface as the main API for
constructing parsers.
Exercise: create a similar interface for the ConvertDir
constructor of Options
.
Solution
pInputDir :: Parser FilePath
pInputDir =
strOption
( long "input"
<> short 'i'
<> metavar "DIRECTORY"
<> help "Input directory"
)
pOutputDir :: Parser FilePath
pOutputDir =
strOption
( long "output"
<> short 'o'
<> metavar "DIRECTORY"
<> help "Output directory"
)
pConvertDir :: Parser Options
pConvertDir =
ConvertDir <$> pInputDir <*> pOutputDir
Alternative
One thing we forgot about is that each input and output for
ConvertSingle
could also potentially use the standard input and output instead.
Up until now we only offered one option: reading from or writing to a file
by specifying the flags --input
and --output
.
However, we'd like to make these flags optional, and when they are
not specified, use the alternative standard i/o. We can do that by using
the function optional
from Control.Applicative
:
optional :: Alternative f => f a -> f (Maybe a)
optional
works on types which implement instances of the
Alternative
type class:
class Applicative f => Alternative f where
(<|>) :: f a -> f a -> f a
empty :: f a
Alternative
looks very similar to the Monoid
type class,
but it works on applicative functors. This type class isn't
very common and is mostly used for parsing libraries as far as I know.
It provides us with an interface to combine two Parser
s -
if the first one fails to parse, try the other.
It also provides other useful functions such as optional
,
which will help us with our case:
pSingleInput :: Parser SingleInput
pSingleInput =
fromMaybe Stdin <$> optional pInputFile
pSingleOutput :: Parser SingleOutput
pSingleOutput =
fromMaybe Stdout <$> optional pOutputFile
Note that with fromMaybe :: a -> Maybe a -> a
we can extract
the a
out of the Maybe
by supplying a value for the Nothing
case.
Now we can use these more appropriate functions in pConvertSingle
instead:
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pSingleInput <*> pSingleOutput
Commands and subparsers
We currently have two possible operations in our interface,
convert a single source, or convert a directory. A nice interface for
selecting the right operation would be via commands.
If the user would like to convert a single source, they can use
convert
, for a directory, convert-dir
.
We can create a a parser with commands with the subparser
and command
functions:
subparser :: Mod CommandFields a -> Parser a
command :: String -> ParserInfo a -> Mod CommandFields a
subparser
takes command modifiers (which can be constructed
with the command
function) as input, and produces a Parser
.
command
takes the command name (in our case "convert" or "convert-dir")
and a ParserInfo a
, and produces a command modifier. As we've seen
before these modifiers have a Monoid
instance and they can be
composed, meaning that we can append multiple commands to serve as alternatives.
A ParserInfo a
can be constructed with the info
function:
info :: Parser a -> InfoMod a -> ParserInfo a
This function wraps a Parser
with some additional information
such as a helper message, description, and more, so that the program
itself and each sub command can print some additional information.
Let's see how to construct a ParserInfo
:
pConvertSingleInfo :: ParserInfo Options
pConvertSingleInfo =
info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
Note that helper
adds a helper output screen in case the parser fails.
Let's also build a command:
pConvertSingleCommand :: Mod CommandFields Options
pConvertSingleCommand =
command "convert" pConvertSingleInfo
Try creating a Parser Options
combining the two options with subparser
.
Solution
pOptions :: Parser Options
pOptions =
subparser
( command
"convert"
( info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
)
<> command
"convert-dir"
( info
(helper <*> pConvertDir)
(progDesc "Convert a directory of markup files to html")
)
)
ParserInfo
Since we finished building a parser, we should wrap it up in a ParserInfo
and add some information to it to make it ready to run.
opts :: ParserInfo Options
opts =
info (helper <*> pOptions)
( fullDesc
<> header "hs-blog-gen - a static blog generator"
<> progDesc "Convert markup files or directories to html"
)
Running a parser
optparse-applicative
provides a non-IO
interface to parse arguments,
but the most convenient way to use it is to let it take care of fetching
program arguments, try to parse them, and throw errors and help messages in case
it fails. This can be done with the function execParser :: ParserInfo a -> IO a
.
We can place all this options parsing stuff in a new module
and then import it from app/Main.hs
. Let's do that.
Here's what we have up until now:
app/OptParse.hs
-- | Command-line options parsing
module OptParse
( Options(..)
, SingleInput(..)
, SingleOutput(..)
, parse
)
where
import Data.Maybe (fromMaybe)
import Options.Applicative
------------------------------------------------
-- * Our command-line options model
-- | Model
data Options
= ConvertSingle SingleInput SingleOutput
| ConvertDir FilePath FilePath
deriving Show
-- | A single input source
data SingleInput
= Stdin
| InputFile FilePath
deriving Show
-- | A single output sink
data SingleOutput
= Stdout
| OutputFile FilePath
deriving Show
------------------------------------------------
-- * Parser
-- | Parse command-line options
parse :: IO Options
parse = execParser opts
opts :: ParserInfo Options
opts =
info (pOptions <**> helper)
( fullDesc
<> header "hs-blog-gen - a static blog generator"
<> progDesc "Convert markup files or directories to html"
)
-- | Parser for all options
pOptions :: Parser Options
pOptions =
subparser
( command
"convert"
( info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
)
<> command
"convert-dir"
( info
(helper <*> pConvertDir)
(progDesc "Convert a directory of markup files to html")
)
)
------------------------------------------------
-- * Single source to sink conversion parser
-- | Parser for single source to sink option
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pSingleInput <*> pSingleOutput
-- | Parser for single input source
pSingleInput :: Parser SingleInput
pSingleInput =
fromMaybe Stdin <$> optional pInputFile
-- | Parser for single output sink
pSingleOutput :: Parser SingleOutput
pSingleOutput =
fromMaybe Stdout <$> optional pOutputFile
-- | Input file parser
pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
where
parser =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
-- | Output file parser
pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser
where
parser =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
------------------------------------------------
-- * Directory conversion parser
pConvertDir :: Parser Options
pConvertDir =
ConvertDir <$> pInputDir <*> pOutputDir
-- | Parser for input directory
pInputDir :: Parser FilePath
pInputDir =
strOption
( long "input"
<> short 'i'
<> metavar "DIRECTORY"
<> help "Input directory"
)
-- | Parser for output directory
pOutputDir :: Parser FilePath
pOutputDir =
strOption
( long "output"
<> short 'o'
<> metavar "DIRECTORY"
<> help "Output directory"
)
Pattern matching on Options
After running the command-line arguments parser, we can pattern match
on our model and call the right functions. Currently, our program
does not expose this kind of API. So let's go to our src/HsBlog.hs
module and change the API. We can delete main
from that file and
add two new functions instead:
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertDirectory :: FilePath -> FilePath -> IO ()
Handle
is an I/O abstraction over file system objects, including stdin
and stdout
.
Before, we used writeFile
and getContents
- these functions either
get a FilePath
to open and work on, or they assume the Handle
is the standard I/O.
We can use the explicit versions that take a Handle
from System.IO
instead:
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
content <- hGetContents input
hPutStrLn output (process title content)
We will leave convertDirectory
unimplemented for now and implement it in the next chapter.
In app/Main.hs
, we will need to pattern match on the Options
and
prepare to call the right functions from HsBlog
.
Let's look at our full app/Main.hs
and src/HsBlog.hs
:
app/Main.hs
-- | Entry point for the hs-blog-gen program
module Main where
import OptParse
import qualified HsBlog
import System.Exit (exitFailure)
import System.Directory (doesFileExist)
import System.IO
main :: IO ()
main = do
options <- parse
case options of
ConvertDir input output ->
HsBlog.convertDirectory input output
ConvertSingle input output -> do
(title, inputHandle) <-
case input of
Stdin ->
pure ("", stdin)
InputFile file ->
(,) file <$> openFile file ReadMode
outputHandle <-
case output of
Stdout -> pure stdout
OutputFile file -> do
exists <- doesFileExist file
shouldOpenFile <-
if exists
then confirm
else pure True
if shouldOpenFile
then
openFile file WriteMode
else
exitFailure
HsBlog.convertSingle title inputHandle outputHandle
hClose inputHandle
hClose outputHandle
------------------------------------------------
-- * Utilities
-- | Confirm user action
confirm :: IO Bool
confirm =
putStrLn "Are you sure? (y/n)" *>
getLine >>= \answer ->
case answer of
"y" -> pure True
"n" -> pure False
_ -> putStrLn "Invalid response. use y or n" *>
confirm
src/HsBlog.hs
-- HsBlog.hs
module HsBlog
( convertSingle
, convertDirectory
, process
)
where
import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert)
import System.IO
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
content <- hGetContents input
hPutStrLn output (process title content)
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory = error "Not implemented"
process :: Html.Title -> String -> String
process title = Html.render . convert title . Markup.parse
We need to make a few small changes to the .cabal
file.
First, we need to add the dependency directory
to the executable
,
because we use the library System.Directory
in Main
.
Second, we need to list OptParse
in the list of modules in
the executable
.
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
+ other-modules:
+ OptParse
build-depends:
base
+ , directory
, optparse-applicative
, hs-blog
ghc-options:
-O
Summary
We've learned about a new fancy library called optparse-applicative
and used it to create a fancier command-line interface in a declarative way.
See the result of running hs-blog-gen --help
:
hs-blog-gen - a static blog generator
Usage: hs-blog-gen COMMAND
Convert markup files or directories to html
Available options:
-h,--help Show this help text
Available commands:
convert Convert a single markup source to html
convert-dir Convert a directory of markup files to html
Along the way we've learned two powerful new abstractions, Functor
and Applicative
, as well as revisited an abstraction
called Monoid
. With this library we've seen another example
of the usefulness of these abstractions for constructing APIs and EDSLs.
We will continue to meet these abstractions in the rest of the book.
Bonus exercise: Add another flag named --replace
to indicate that
if the output file or directory already exists, it's okay to replace them.
You can view the git commit of the changes we've made and the code up until now.
Handling errors and multiple files
We have left an unimplemented function last chapter, and there are a few more things left for us to do to actually call our program a static blog generator. We still need to process multiple files in a directory and create an index landing page with links to other pages.
Links in HTML
Our HTML EDSL currently does not support links or other content modifiers such as bold and italics. We should add these so we can use them when creating an index.
Up until now we've passed String
to Structure
creating functions such as p_
and h_
. Instead, we could create and pass them a new type, Content
, which
can be regular text, links, images, and so on.
Exercise: implement what we've just discussed. Follow the compiler errors and refactor what needs refactoring.
Solution
src/Html/Internal.hs
module HsBlog.Html.Internal where
import Numeric.Natural
-- * Types
newtype Html
= Html String
newtype Structure
= Structure String
newtype Content
= Content String
type Title
= String
-- * EDSL
html_ :: Title -> Structure -> Html
html_ title content =
Html
( el "html"
( el "head" (el "title" (escape title))
<> el "body" (getStructureString content)
)
)
-- * Structure
p_ :: Content -> Structure
p_ = Structure . el "p" . getContentString
h_ :: Natural -> Content -> Structure
h_ n = Structure . el ("h" <> show n) . getContentString
ul_ :: [Structure] -> Structure
ul_ =
Structure . el "ul" . concat . map (el "li" . getStructureString)
ol_ :: [Structure] -> Structure
ol_ =
Structure . el "ol" . concat . map (el "li" . getStructureString)
code_ :: String -> Structure
code_ = Structure . el "pre" . escape
instance Semigroup Structure where
(<>) c1 c2 =
Structure (getStructureString c1 <> getStructureString c2)
instance Monoid Structure where
mempty = Structure ""
-- * Content
txt_ :: String -> Content
txt_ = Content . escape
link_ :: FilePath -> Content -> Content
link_ path content =
Content $
elAttr
"a"
("href=\"" <> escape path <> "\"")
(getContentString content)
img_ :: FilePath -> Content
img_ path =
Content $ "<img src=\"" <> escape path <> "\">"
b_ :: Content -> Content
b_ content =
Content $ el "b" (getContentString content)
i_ :: Content -> Content
i_ content =
Content $ el "i" (getContentString content)
instance Semigroup Content where
(<>) c1 c2 =
Content (getContentString c1 <> getContentString c2)
instance Monoid Content where
mempty = Content ""
-- * Render
render :: Html -> String
render html =
case html of
Html str -> str
-- * Utilities
el :: String -> String -> String
el tag content =
"<" <> tag <> ">" <> content <> "</" <> tag <> ">"
elAttr :: String -> String -> String -> String
elAttr tag attrs content =
"<" <> tag <> " " <> attrs <> ">" <> content <> "</" <> tag <> ">"
getStructureString :: Structure -> String
getStructureString structure =
case structure of
Structure str -> str
getContentString :: Content -> String
getContentString content =
case content of
Content str -> str
escape :: String -> String
escape =
let
escapeChar c =
case c of
'<' -> "<"
'>' -> ">"
'&' -> "&"
'"' -> """
'\'' -> "'"
_ -> [c]
in
concat . map escapeChar
src/Html.hs
module HsBlog.Html
( Html
, Title
, Structure
, html_
, p_
, h_
, ul_
, ol_
, code_
, Content
, txt_
, img_
, link_
, b_
, i_
, render
)
where
import HsBlog.Html.Internal
src/Convert.hs
module HsBlog.Convert where
import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure
convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
case structure of
Markup.Heading n txt ->
Html.h_ n $ Html.txt_ txt
Markup.Paragraph p ->
Html.p_ $ Html.txt_ p
Markup.UnorderedList list ->
Html.ul_ $ map (Html.p_ . Html.txt_) list
Markup.OrderedList list ->
Html.ol_ $ map (Html.p_ . Html.txt_) list
Markup.CodeBlock list ->
Html.code_ (unlines list)
You can view the git commit of the changes we've made and the code up until now.
Creating an index page
With our extended HTML EDSL, we can now create an index page with links to the other pages.
To create an index page, we need a list of files with their target destinations,
as well as their Markup
(so we can extract information to include in our index page,
such as the first heading and paragraph). Our output should be an Html
page.
We need to implement the following function:
buildIndex :: [(FilePath, Markup)] -> Html
Solution
buildIndex :: [(FilePath, Markup.Document)] -> Html.Html
buildIndex files =
let
previews =
map
( \(file, doc) ->
case doc of
Markup.Heading 1 heading : article ->
Html.h_ 3 (Html.link_ file (Html.txt_ heading))
<> foldMap convertStructure (take 3 article)
<> Html.p_ (Html.link_ file (Html.txt_ "..."))
_ ->
Html.h_ 3 (Html.link_ file (Html.txt_ file))
)
files
in
Html.html_
"Blog"
( Html.h_ 1 (Html.link_ "index.html" (Html.txt_ "Blog"))
<> Html.h_ 2 (Html.txt_ "Posts")
<> mconcat previews
)
Processing directories
Our general strategy for processing whole directories is going to be:
- Create the output directory
- Grab all file names in a directory
- Filter them according to their extension, we want to process
txt
file and copy other files without modification - We want to parse each text file, build an index of the result, convert the files to HTML, and write everything to the target directory
While our parsing function can't really fail, trying to read or write a file to the file-system can fail in several ways. It would be nice if our static blog generator was robust enough that it wouldn't fail completely if one single file gave it some trouble. This is a good opportunity to learn about error handling in Haskell, both in uneffectful code and for I/O code.
In the next few chapters we'll survey the landscape of error handling in Haskell before figuring out the right approach for our use case.
Handling errors with Either
There are quite a few ways to indicate and handle errors in Haskell. We are going to look at one solution: using the type Either. Either is defined like this:
data Either a b
= Left a
| Right b
Simply put, a value of type Either a b
can contain either a value of type a
,
or a value of type b
.
We can tell them apart from the constructor used.
Left True :: Either Bool b
Right 'a' :: Either a Char
With this type, we can use the
Left
constructor to indicate failure with some error value attached,
and the Right
constructor with one type to represent success with the
expected result.
Since Either
is polymorphic, we can use any two types to represent
failure and success. It is often useful to describe the failure modes
using an ADT.
For example, let's say that we want to parse a Char
as a decimal digit
to an Int
. This operation could fail if the Character is not a digit.
We can represent this error as a data type:
data ParseDigitError
= NotADigit Char
deriving Show
And our parsing function can have the type:
parseDigit :: Char -> Either ParseDigitError Int
Now when we implement our parsing function we can return Left
on an error
describing the problem, and Right
with the parsed value on successful parsing:
parseDigit :: Char -> Either ParseDigitError Int
parseDigit c =
case c of
'0' -> Right 0
'1' -> Right 1
'2' -> Right 2
'3' -> Right 3
'4' -> Right 4
'5' -> Right 5
'6' -> Right 6
'7' -> Right 7
'8' -> Right 8
'9' -> Right 9
_ -> Left (NotADigit c)
Either a
is also an instance of Functor
and Applicative
,
so we have some combinators to work with if we want to combine these
kind of computations.
For example, if we had three characters and we wanted to try and parse each of them and then find the maximum between them, we could use the applicative interface:
max3chars :: Char -> Char -> Char -> Either ParseDigitError Int
max3chars x y z =
(\a b c -> max a (max b c))
<$> parseDigit x
<*> parseDigit y
<*> parseDigit z
The Functor
and Applicative
interfaces of Either a
allow us to
apply functions to the payload values and delay the error handling to a
later phase. Semantically, the first Either in order that returns a Left
will be the return value. We can see how this works in the implementation
of the applicative instance:
instance Applicative (Either e) where
pure = Right
Left e <*> _ = Left e
Right f <*> r = fmap f r
At some point, someone will actually want to inspect the result
and see if we get an error (with the Left
constructor) or the expected value
(with the Right
constructor) and they can do that by pattern matching the result.
Applicative + Traversable
The Applicative
interface of Either
is very powerful, and can be combined
with another abstraction called
Traversable
-
for data structures that can be traversed from left to right, like a linked list or a binary tree.
With these, we can combine an unspecified amount of values such as Either ParseDigitError Int
,
as long as they are all in a data structure that implements Traversable
.
Let's see an example:
ghci> :t "1234567"
"1234567" :: String
-- remember, a String is an alias for a list of Char
ghci> :info String
type String :: *
type String = [Char]
-- Defined in ‘GHC.Base’
ghci> :t map parseDigit "1234567"
map parseDigit mystring :: [Either ParseDigitError Int]
ghci> map parseDigit "1234567"
[Right 1,Right 2,Right 3,Right 4,Right 5,Right 6,Right 7]
ghci> :t sequenceA
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- Substitute `t` with `[]`, and `f` with `Either Error` for a specialized version
ghci> sequenceA (map parseDigit mystring)
Right [1,2,3,4,5,6,7]
ghci> map parseDigit "1a2"
[Right 1,Left (NotADigit 'a'),Right 2]
ghci> sequenceA (map parseDigit "1a2")
Left (NotADigit 'a')
The pattern of doing map
and then sequenceA
is another function called traverse
:
ghci> :t traverse
traverse
:: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
ghci> traverse parseDigit "1234567"
Right [1,2,3,4,5,6,7]
ghci> traverse parseDigit "1a2"
Left (NotADigit 'a')
We can use traverse
on any two types where one implements the Applicative
interface, like Either a
or IO
, and the other implements the Traversable
interface,
like []
(linked lists) and
Map k
(also known as a dictionary in other languages - a mapping from keys to values).
For example using IO
and Map
. Note that we can construct a Map
data structure
from a list of tuples using the
fromList
function - the first value in the tuple is the key, and the second is the type.
ghci> import qualified Data.Map as M -- from the containers package
ghci> file1 = ("output/file1.html", "input/file1.txt")
ghci> file2 = ("output/file2.html", "input/file2.txt")
ghci> file3 = ("output/file3.html", "input/file3.txt")
ghci> files = M.fromList [file1, file2, file3]
ghci> :t files :: M.Map FilePath FilePath -- FilePath is an alias of String
files :: M.Map FilePath FilePath :: M.Map FilePath FilePath
ghci> readFiles = traverse readFile
ghci> :t readFiles
readFiles :: Traversable t => t FilePath -> IO (t String)
ghci> readFiles files
fromList [("output/file1.html","I'm the content of file1.txt\n"),("output/file2.html","I'm the content of file2.txt\n"),("output/file3.html","I'm the content of file3.txt\n")]
ghci> :t readFiles files
readFiles files :: IO (Map String String)
Above, we created a function readFiles
that will take a mapping from output file path
to input file path and returns an IO operation that when run will read the input files
and replace their contents right there in the map! Surely this will be useful later.
Multiple errors
Note, since Either
has the kind * -> * -> *
(it takes two type
parameters) Either
cannot be an instance of Functor
or Applicative
:
instances of these type classes must have the
kind * -> *
.
Remember that when we look at a type class function signature like:
fmap :: Functor f => (a -> b) -> f a -> f b
And we want to implement it for a specific type (in place of the f
),
we need to be able to substitute the f
with the target type. If we'd try
to do it with Either
we would get:
fmap :: (a -> b) -> Either a -> Either b
And neither Either a
or Either b
are saturated, so this won't type check.
For the same reason if we'll try to substitute f
with, say, Int
, we'll get:
fmap :: (a -> b) -> Int a -> Int b
Which also doesn't make sense.
While we can't use Either
, we can use Either e
, which has the kind
* -> *
. Now let's try substituting f
with Either e
in this signature:
liftA2 :: Applicative => (a -> b -> c) -> f a -> f b -> f c
And we'll get:
liftA2 :: (a -> b -> c) -> Either e a -> Either e b -> Either e c
What this teaches us is that we can only use the applicative interface to
combine two Either
s with the same type for the Left
constructor.
So what can we do if we have two functions that can return different errors? There are a few approaches, the most prominent ones are:
- Make them return the same error type. Write an ADT that holds all possible
error descriptions. This can work in some cases but isn't always ideal.
For example a user calling
parseDigit
shouldn't be forced to handle a possible case that the input might be an empty string. - Use a specialized error type for each type, and when they are composed together,
map the error type of each function to a more general error type. This can
be done with the function
first
from theBifunctor
type class.
Monadic interface
The applicative interface allows us to lift a function to work on multiple
Either
values (or other applicative functor instances such as IO
and Parser
).
But more often than not, we'd like to use a value from one computation
that might return an error in another computation that might return an error.
For example, a compiler such has GHC operates in stages, such as lexical analysis, parsing, type-checking, and so on. Each stage depends on the output of the stage before it, and each stage might fail. We can write the types for these functions:
tokenize :: String -> Either Error [Token]
parse :: [Token] -> Either Error AST
typecheck :: AST -> Either Error TypedAST
We want to compose these functions so that they work in a chain. The output of tokenize
goes to parse
, and the output of parse
goes to typecheck
.
We know that we can lift a function over an Either
(and other functors),
we can also lift a function that returns an Either
:
-- reminder the type of fmap
fmap :: Functor f => (a -> b) -> f a -> f b
-- specialized for `Either Error`
fmap :: (a -> b) -> Either Error a -> Either Error b
-- here, `a` is [Token] and `b` is `Either Error AST`:
> fmap parse (tokenize string) :: Either Error (Either Error AST)
While this code compiles, it isn't great, because we are building
layers of Either Error
and we can't use this trick again with
typecheck
! typecheck
expects an AST
, but if we try to fmap it
on fmap parse (tokenize string)
, the a
will be Either Error AST
instead.
What we would really like is to flatten this structure instead of nesting it.
If we look at the kind of values Either Error (Either Error AST)
could have,
it looks something like this:
Left <error>
Right (Left error)
Right (Right <ast>)
Exercise: What if we just used pattern matching for this instead? How would this look like?
Solution
case tokenize string of
Left err ->
Left err
Right tokens ->
case parse tokens of
Left err ->
Left err
Right ast ->
typecheck ast
If we run into an error in a stage, we return that error and stop. If we succeed, we use the value on the next stage.
Flattening this structure for Either
is very similar to that last part - the body
of the Right tokens
case:
flatten :: Either e (Either e a) -> Either e a
flatten e =
case e of
Left l -> Left l
Right x -> x
Because we have this function, we can now use it on the output of
fmap parse (tokenize string) :: Either Error (Either Error AST)
from before:
> flatten (fmap parse (tokenize string)) :: Either Error AST
And now we can use this function again to compose with typecheck
:
> flatten (fmap typecheck (flatten (fmap parse (tokenize string)))) :: Either Error TypedAST
This flatten
+ fmap
combination looks like a recurring pattern which
we can combine into a function:
flatMap :: (a -> Either e b) -> Either a -> Either b
flatMap func val = flatten (fmap func val)
And now we can write the code this way:
> flatMap typecheck (flatMap parse (tokenize string)) :: Either Error TypedAST
-- Or using backticks syntax to convert the function to infix form:
> typecheck `flatMap` parse `flatMap` tokenize string
-- Or create a custom infix operator: (=<<) = flatMap
> typeCheck =<< parse =<< tokenize string
This function, flatten
(and flatMap
as well), have different names in Haskell.
They are called
join
and =<<
(pronounced "reverse bind"),
and they are the essence of another incredibly useful abstraction in Haskell.
If we have a type that can implement:
- The
Functor
interface, specifically thefmap
function - The
Applicative
interface, most importantly thepure
function - This
join
function
They can implement an instance of the Monad
type class.
With functors, we were able to "lift" a function to work over the type implementing the functor type class:
fmap :: (a -> b) -> f a -> f b
With applicative functors we were able to "lift" a function of multiple arguments over multiple values of a type implementing the applicative functor type class, and also lift a value into that type:
pure :: a -> f a
liftA2 :: (a -> b -> c) -> f a -> f b -> f c
With monads we can now flatten (or, "join" in Haskell terminology) types that implement
the Monad
interface:
join :: m (m a) -> m a
-- this is =<< with the arguments reversed, pronounced "bind"
(>>=) :: m a -> (a -> m b) -> m b
With >>=
we can write our compilation pipeline from before in a left-to-right
manner, which seems to be more popular for monads:
> tokenize string >>= parse >>= typecheck
We have already met this function before when we talked about IO
. Yes,
IO
also implements the Monad
interface. The monadic interface for IO
helped us with creating a proper ordering of effects.
The essence of the Monad
interface is the join
/>>=
functions, and as we've seen
we can implement >>=
in terms of join
, we can also implement join
in terms
of >>=
(try it!).
The monadic interface can mean very different things for different types. For IO
this
is ordering of effects, for Either
it is early cutoff,
for Logic
this means backtracking computation, etc.
Again, don't worry about analogies and metaphors, focus on the API and the laws.
Hey, did you check the monad laws? left identity, right identity and associativity? We've already discussed a type class with exactly these laws - the
Monoid
type class. Maybe this is related to the famous quote about monads being just monoids in something something...
Do notation?
Remember the do notation? Turns out it works for any type that is
an instance of Monad
. How cool is that? Instead of writing:
pipeline :: String -> Either Error TypedAST
pipeline string =
tokenize string >>= \tokens ->
parse tokens >>= \ast ->
typecheck ast
We can write:
pipeline :: String -> Either Error TypedAST
pipeline string = do
tokens <- tokenize string
ast <- parse tokens
typecheck ast
And it will work! Still, in this particular case tokenize string >>= parse >>= typecheck
is so concise it can only be beaten by using
>=>
>=> :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
-- compare with function composition:
(.) :: (a -> b) -> (b -> c) -> a -> c
pipeline = tokenize >=> parse >=> typecheck
Hakell's ability to create very concise code using abstractions is great once one is familiar with the abstractions. Knowing the monad abstraction, we are now already familiar with the core composition API of many libraries - for example:
- Concurrent and asynchronous programming
- Web programming
- Testing
- Emulating stateful computation
- sharing environment between computations
- and many more.
Summary
Using Either
for error handling is useful for two reasons:
- We encode possible errors using types, and we force users to acknowledge and handle them, thus making our code more resilient to crashes and bad behaviours.
- The
Functor
,Applicative
, andMonad
interfaces provide us with mechanisms for composing functions that might fail (almost) effortlessly - reducing boilerplate while maintaining strong guarantees about our code, and delaying the need to handle errors until it is appropriate.
Either with IO?
When we create IO
actions that may require I/O we risk running into all kinds of errors.
For example, when we use writeFile
, we could run out of disk space in the middle of writing,
or the file might be write protected. While these scenarios aren't super common, they are definitely
possible.
We could've potentially encoded Haskell functions like readFile
and writeFile
as IO
operations
that return Either
, for example:
readFile :: FilePath -> IO (Either ReadFileError String)
writeFile :: FilePath -> String -> IO (Either WriteFileError ())
However there are a couple of issues here, the first is that composing IO
actions
becomes more difficult. Previously we could write:
readFile "input.txt" >>= writeFile "output.html"
But now the types no longer match - readFile
will return an Either ReadFileError String
when executed,
but writeFile
wants to take a String
as input. We are forced to handle the error
before calling writeFile
.
Composing IO + Either using ExceptT
One way to handle this is by using monad transformers. Monad transformers provide a way to stack monad capabilities on top of one another. They are called transformers because they take a type that has an instance of monad as input, and return a new type that implements the monad interface, stacking a new capability on top of it.
For example, if we want to compose values of a type that is equivalent to IO (Either Error a)
,
using the monadic interface (the function >>=
), we can use a monad transformer
called ExceptT
and stack it over IO
.
Let's see how ExceptT
is defined:
newtype ExceptT e m a = ExceptT (m (Either e a))
Remember, a newtype
is a new name for an existing type. And if we substitute
e
with Error
and m
with IO
we get exactly IO (Either Error a)
as we wanted.
And we can convert an ExceptT Error IO a
into IO (Either Error a)
using
the function runExceptT
:
runExceptT :: ExceptT e m a -> m (Either e a)
ExceptT
implements the monadic interface in a way that combines the capabilities of
Either
, and whatever m
it takes. Because ExceptT e m
has a Monad
instance,
a specialized version of >>=
would look like this:
-- Generalized version
(>>=) :: Monad m => m a -> (a -> m b) -> m b
-- Specialized version, replace `m` with `ExceptT e m`
(>>=) :: ExceptT e m a -> (a -> ExceptT e m b) -> ExceptT e m b
Unsure how this works? Try to implement >>=
for IO (Either Error a)
:
bindExceptT :: IO (Either Error a) -> (a -> IO (Either Error b)) -> IO (Either Error b)
Solution
bindExceptT :: IO (Either Error a) -> (a -> IO (Either Error b)) -> IO (Either Error b)
bindExceptT mx f = do
x <- mx -- `x` has the type `Either Error a`
case x of
Left err -> pure (Left err)
Right y -> f y
Note that we didn't actually use the implementation details of Error
or IO
,
Error
isn't mentioned at all, and for IO
we only used the monadic interface with
the do notation. We could write the same function with a more generalized type signature:
bindExceptT :: Monad m => m (Either e a) -> (a -> m (Either e b)) -> m (Either e b)
bindExceptT mx f = do
x <- mx -- `x` has the type `Either e a`
case x of
Left err -> pure (Left err)
Right y -> f y
And because newtype ExceptT e m a = ExceptT (m (Either e a))
we can just
pack and unpack that ExceptT
constructor and get:
bindExceptT :: Monad m => ExceptT e m a -> (a -> ExceptT e m b) -> ExceptT e m b
bindExceptT mx f = ExceptT $ do
-- `runExceptT mx` has the type `m (Either e a)`
-- `x` has the type `Either e a`
x <- runExceptT mx
case x of
Left err -> pure (Left err)
Right y -> f y
Note that when stacking monad transformers, the order in which we stack them matters. With
ExceptT Error IO a
, we have anIO
operation that when run will returnEither
an error or a value.
ExceptT
can enjoy both worlds - we can return error values using the function throwError
:
throwError :: e -> ExceptT e m a
and we can "lift" functions that return a value of the underlying monadic type m
to return
a value of ExceptT e m a
instead:
lift :: m a -> ExceptT e m a
for example:
getLine :: IO String
lift getLine :: ExceptT e IO String
Actually,
lift
is also a type class function fromMonadTrans
, the type class of monad transformers. So technicallylift getLine :: MonadTrans t => t IO String
, but we are specializing for concreteness.
Now, if we had:
readFile :: FilePath -> ExceptT IOError IO String
writeFile :: FilePath -> String -> ExceptT IOError IO ()
We could compose them again without issue:
readFile "input.txt" >>= writeFile "ouptut.html"
But remember - the error type e
(in both the case Either
and Except
)
must be the same between composed functions! This means that the type representing
errors for both readFile
and writeFile
must be the same - that would also
force anyone using these functions to handle these errors - should a user who
called writeFile
be required to handle a "file not found" error? Should a user
who called readFile
be required to handle an "out of disk space" error?
There are many many more possible IO errors! "network unreachable", "out of memory",
"cancelled thread", we cannot require a user to handle all these errors, or
even cover them all in a data type.
So what do we do?
We give up on this approach for IO code, and use a different one: Exceptions, as we'll see in the next chapter.
Note - when we stack
ExceptT
on top of a different type calledIdentity
that also implements theMonad
interface, we get a type that is exactly likeEither
calledExcept
(without theT
at the end). You might sometimes want to useExcept
instead ofEither
because it has a more appropriate name and better API for error handling thanEither
.
Exceptions
The Control.Exception
module provides us with the ability to
throw
exceptions from IO
code,
catch
Haskell exceptions in IO
code, and even convert them to IO (Either ...)
with the function try
:
throwIO :: Exception e => e -> IO a
catch
:: Exception e
=> IO a -- The computation to run
-> (e -> IO a) -- Handler to invoke if an exception is raised
-> IO a
try :: Exception e => IO a -> IO (Either e a)
The important part of these type signatures is the
Exception
type class. By making a type an instance of the Exception
type class, we can throw it
and catch it in IO
code:
{-# language LambdaCase #-}
import Control.Exception
import System.IO
data MyException
= ErrZero
| ErrOdd Int
deriving Show
instance Exception MyException
sayDiv2 :: Int -> IO ()
sayDiv2 n
| n == 0 = throwIO ErrZero
| n `mod` 2 /= 0 = throwIO (ErrOdd n)
| otherwise = print (n `div` 2)
main :: IO ()
main =
catch
( do
putStrLn "Going to print a number now."
sayDiv2 7
putStrLn "Did you like it?"
)
( \case
ErrZero ->
hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
ErrOdd n ->
hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
)
Note: we are using two new things here: guards, and the
LambdaCase
language extension.
Guards as seen in
sayDiv2
are just a nicer syntax aroundif-then-else
expressions. Using guards we can have multipleif
branches and finally use theelse
branch by usingotherwise
. After each guard (|
) there's a condition, after the condition there's a=
and then the expression (the part afterthen
in anif
expression).LambdaCase as seen in
catch
, is just a syntactic sugar to save a few characters, instead of writing\e -> case e of
, we can write\case
. It requires enabling theLambdaCase
extension.Language extensions
Haskell is a standardized language. However, GHC provides extensions to the language - additional features that aren't covered in the 98 or 2010 standards of Haskell. Features such as syntactic extensions (like LambdaCase above), extensions to the type checker, and more.
These extensions can be added by adding
{-# language <extension-name> #-}
(thelanguage
part is case insensitive) to the top of a Haskell source file, or they can be set globally for an entire project by specifying them in the default-extensions section in the.cabal file
.The list of language extensions can be found in the GHC manual, feel free to browse it, but don't worry about trying to memorize all the extensions.
This example, of course, is an example that would work much better using Either
and separating
the division and printing à la 'functional core, imperative shell'. But as an example it works.
We have created a custom exception and handled it specifically outside an IO
block.
However, we have not handled exceptions that might be raised by putStrLn
.
What if, for example, for some reason we close the stdout
handle before this block:
main :: IO ()
main = do
hClose stdout
catch
( do
putStrLn "Going to print a number now."
sayDiv2 7
putStrLn "Did you like it?"
)
( \case
ErrZero ->
hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
ErrOdd n ->
hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
)
Our program will crash with an error:
ghc: <stdout>: hFlush: illegal operation (handle is closed)
First, how do we know which exception we should handle? Some functions' documentation
include this, but unfortunately putStrLn
's does not. We could guess from the
list of instances
the Exception
type class has; I think
IOException
fits. Now, how can we handle this case as well? We can chain catches:
-- need to add these at the top
{-# language ScopedTypeVariables #-}
import GHC.IO.Exception (IOException(..))
main :: IO ()
main = do
hClose stdout
catch
( catch
( do
putStrLn "Going to print a number now."
sayDiv2 7
putStrLn "Did you like it?"
)
( \case
ErrZero ->
hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
ErrOdd n ->
hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
)
)
( \(e :: IOException) ->
-- we can check if the error was an illegal operation on the stderr handle
if ioe_handle e /= Just stderr && ioe_type e /= IllegalOperation
then pure () -- we can't write to stderr because it is closed
else hPutStrLn stderr (displayException e)
)
We use the
ScopedTypeVariables
to be able to specify types inside let expressions, lambdas, pattern matching and more.
Or we could use the convenient function
catches
to pass a list of exception
handlers:
main :: IO ()
main = do
hClose stdout
catches
( do
putStrLn "Going to print a number now."
sayDiv2 7
putStrLn "Did you like it?"
)
[ Handler $ \case
ErrZero ->
hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
ErrOdd n ->
hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
, Handler $ \(e :: IOException) ->
-- we can check if the error was an illegal operation on the stderr handle
if ioe_handle e /= Just stderr && ioe_type e /= IllegalOperation
then pure () -- we can't write to stderr because it is closed
else hPutStrLn stderr (displayException e)
]
As an aside,
Handler
uses a concept called existentially quantified types to hide inside it a function that takes an arbitrary type that implementsException
. This is why we can encode a seemingly heterogeneous list of functions that handle exceptions forcatches
to take as input. This pattern is rarely useful, but I've included it here to avoid confusion.
And if we wanted to catch any exception, we'd catch SomeException
:
main :: IO ()
main = do
hClose stdout
catch
( do
putStrLn "Going to print a number now."
sayDiv2 7
putStrLn "Did you like it?"
)
( \(SomeException e) ->
hPutStrLn stderr (show e)
)
This could also go in catches
as the last element in the list if we wanted specialized
handling for other scenarios.
A couple more functions worth knowing are
bracket
and finally
.
These functions can help us handle resource acquisition more safely when errors are present.
Summary
Exceptions are useful and often necessary when we work with IO
and want to make sure
our program is handling errors gracefully. They have an advantage over Either
in that
we can easily compose functions that may throw errors of different types, but also have
a disadvantage of not encoding types as return values, and therefore does not force us
to handle them.
For Haskell, the language designers have made a choice for us by designing IO
to
use exceptions instead of Either
. And this is what I would recommend for
handling your own effectful computations. However, I think that Either
is more
appropriate for uneffectful code, because it forces us to acknowledge and handle errors
(eventually) thus making our programs more robust. And also because we can only
catch exceptions in IO
code.
Lets code already!
This was a long info dump. Let's practice what we've learned. We want to:
- Create the output directory
- Grab all file names in a directory
- Filter them according to their extension
- Process .txt files
- Copy other files without modification
- Parse each text file, build an index of the result, convert the files to HTML, and write everything to the target directory
Note: I did not write this code immediately in the final form it was presented. It was an iterative process of writing code, refactoring, splitting functions, changing type signatures, and more. When solving a coding problem, start small and simple, do the thing that works, and refactor it when it makes sense and makes the code clearer and more modular. In Haskell we pride ourselves in our ability to refactor code and improve it over time, and that principle holds when writing new software as well!
New module
Let's create a new module, HsBlog.Directory
, which will be responsible for handling
directories and multiple files. From this module we will export the convertDirectory
and buildIndex
functions we've defined before.
-- | Process multiple files and convert directories
module HsBlog.Directory
( convertDirectory
, buildIndex
)
where
In this module we are going to use the
directory
and filepath
libraries to manipulate directories, files and filepaths.
We'll use the new abstractions we've learned, Traversable
and Monad
, and the concepts
and types we've learned about: Either
, IO
and exceptions.
For all of that, we need quite a few imports:
import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)
import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)
import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
( takeExtension
, takeBaseName
, (<.>)
, (</>)
, takeFileName
)
import System.Directory
( createDirectory
, removeDirectoryRecursive
, listDirectory
, doesDirectoryExist
, copyFile
)
If you are unsure what a specific function we're using does, look it up at
Hoogle,
read the type signature and the documentation, and play around with it in ghci
.
Converting a directory
We can start by describing the high-level function convertDirectory
which
encapsulates many smaller functions, each responsible for doing a specific thing.
convertDirectory
is quite imperative looking, and looks like a different way to
describe the steps of completing our task:
-- | Copy files from one directory to another, converting '.txt' files to
-- '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
createOutputDirectoryOrExit outputDir
let
outputHtmls = txtsToRenderedHtml filesToProcess
copyFiles outputDir filesToCopy
writeFiles outputDir outputHtmls
putStrLn "Done."
Here we trust that each IO
function handles errors responsibly,
and terminates the project when necessary.
Let's examine the steps in order.
getDirFilesAndContent
-- | The relevant directory content for our application
data DirContents
= DirContents
{ dcFilesToProcess :: [(FilePath, String)]
-- ^ File paths and their content
, dcFilesToCopy :: [FilePath]
-- ^ Other file paths, to be copied directly
}
-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent
is responsible for providing the relevant files for processing --
both the ones we need to convert to markup (and their textual content) and other files we
might want to copy as-is (such as images and style-sheets).
-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
files <- map (inputDir </>) <$> listDirectory inputDir
let
(txtFiles, otherFiles) =
partition ((== ".txt") . takeExtension) files
txtFilesAndContent <-
applyIoOnList readFile txtFiles >>= filterAndReportFailures
pure $ DirContents
{ dcFilesToProcess = txtFilesAndContent
, dcFilesToCopy = otherFiles
}
This function does 4 important things:
- Lists all the files in the directory
- Splits the files into 2 groups according to their file extension
- Reads the contents of the .txt files and report when files fail to be read
- Returns the results. We've defined a data type to make the result content more obvious
Part (3) is a little bit more involved than the rest, let's explore it.
applyIoOnList
-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
for files $ \file -> do
maybeContent <-
catch
(Right <$> action file)
( \(SomeException e) -> do
pure $ Left (displayException e)
)
pure (file, maybeContent)
applyIoOnList
is a higher order function that applies a particular IO
function
(in our case readFile
) on a list of things (in our case FilePath
s).
For each thing, it returns the thing itself along with the result of
applying the IO
function as an Either
, where the Left
side is a String
representation of an error if one occurred.
Notice how much the type of this function tells us about what it might do.
Because the types are polymorphic, there is nothing else to do with
the a
s other than apply them to the function, and nowhere to generate b
from other than the result of the function.
Note: when I first wrote this function, it was specialized to work only on
readFile
, take specifically[FilePath]
and returnIO [(FilePath, Either String String)]
. But after running into other use cases where I could use it (writeFiles
andcopyFiles
) I refactored out theaction
, the input type and the return type.
This function uses exceptions to catch any error that might be thrown, and encodes
both the failure and success cases in the type system using Either
, delaying
the handling of exceptions to the function caller while making sure it won't
be forgotten!
Next, let's look at the function that handles the errors by reporting and then filtering out all the cases that failed.
filterAndReportFailures
-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
foldMap $ \ (file, contentOrErr) ->
case contentOrErr of
Left err -> do
hPutStrLn stderr err
pure []
Right content ->
pure [(file, content)]
This code may seem a bit surprising - how come we can use foldMap
here? Reminder,
the type of foldMap
is:
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
If we specialize this function for our use case, substituting the general type
with the types we are using, we learn that IO [(a, b)]
is a monoid.
And indeed - [a]
is a monoid for any a
with []
(the empty list) as mempty
and ++
as <>
, but also IO a
is a monoid for any a
that is itself
a monoid with pure mempty
as mempty
and liftA2 (<>)
as <>
!
Using these instances, we can map
over the content, handle errors, and return
an empty list to filter out a failed case, or a singleton list to keep the result.
And the fold
in foldMap
will concatenate the resulting list where we return
all of the successful cases!
These functions are responsible for fetching the right information. Next, let's look at the code for creating a new directory.
createOutputDirectoryOrExit
-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
whenIO
(not <$> createOutputDirectory outputDir)
(hPutStrLn stderr "Cancelled." *> exitFailure)
-- | Creates the output directory.
-- Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
dirExists <- doesDirectoryExist dir
create <-
if dirExists
then do
override <- confirm "Output directory exists. Override?"
when override (removeDirectoryRecursive dir)
pure override
else
pure True
when create (createDirectory dir)
pure create
createOutputDirectoryOrExit
itself is not terribly exciting, it does
what it is named -- it tries to create the output directory, and exits the
program in case it didn't succeed.
createOutputDirectory
is the function that actually does the heavy lifting.
It checks if the directory already exists, and checks if the user would like to
override it. If they do, we remove it and create the new directory; if they don't,
we do nothing and report their decision.
txtsToRenderedHtml
let
outputHtmls = txtsToRenderedHtml filesToProcess
In this part of the code we convert files to markup and change the
input file paths to their respective output file paths (.txt
-> .html
).
We then build the index page, and convert everything to HTML.
-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
let
txtOutputFiles = map toOutputMarkupFile txtFiles
index = ("index.html", buildIndex txtOutputFiles)
in
map (fmap Html.render) (index : map convertFile txtOutputFiles)
toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
(takeBaseName file <.> "html", Markup.parse content)
convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)
One possibly surprising thing about this code could be the map (fmap Html.render)
part. We can use fmap
on the tuple because it is a Functor
on the second
argument, just like Either
!
copyFiles
and writeFiles
The only thing left to do is to write the directory content, after the processing is completed, to the newly created directory:
-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
let
copyFromTo file = copyFile file (outputDir </> takeFileName file)
void $ applyIoOnList copyFromTo files >>= filterAndReportFailures
Here we use applyIoOnList
again to do something a bit more complicated,
instead of reading from a file, it copies from the input path to a newly generated
output path. Then we pass the result (which has the type [(FilePath, Either String ())]
)
to filterAndReportFailures
to print the errors and filter out the unsuccessful copies.
Because we are not really interested in the output of filterAndReportFailures
,
we discard it with void
, returning ()
as a result instead.
-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
let
writeFileContent (file, content) =
writeFile (outputDir </> file) content
void $ applyIoOnList writeFileContent files >>= filterAndReportFailures
Once again, this code looks almost exactly like copyFiles
, but the types are different.
Haskell's combination of parametric polymorphism + type class for abstractions is really
powerful, and has helped us reduce quite a bit of code.
This pattern of using applyIoOnList
and then filterAndReportFailures
happens more than once. It might be a good candidate for refactoring. Try it!
What do you think about the resulting code? Is it easier or more difficult to
understand? Is it more modular or less? What are the pros and cons?
Summary
With that, we have completed our HsBlog.Directory
module that is responsible for converting
a directory safely. Note that the code could probably be simplified quite a bit if we
were fine with errors crashing the entire program altogether, but sometimes this is
the price we pay for robustness. It is up to you to choose what you can live with
and what not, but I hope this saga has taught you how to approach error handling
in Haskell in case you need to.
View the full module:
HsBlog.Directory
-- | Process multiple files and convert directories
module HsBlog.Directory
( convertDirectory
, buildIndex
)
where
import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)
import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)
import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
( takeExtension
, takeBaseName
, (<.>)
, (</>)
, takeFileName
)
import System.Directory
( createDirectory
, removeDirectoryRecursive
, listDirectory
, doesDirectoryExist
, copyFile
)
-- | Copy files from one directory to another, converting '.txt' files to
-- '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
createOutputDirectoryOrExit outputDir
let
outputHtmls = txtsToRenderedHtml filesToProcess
copyFiles outputDir filesToCopy
writeFiles outputDir outputHtmls
putStrLn "Done."
------------------------------------
-- * Read directory content
-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
files <- map (inputDir </>) <$> listDirectory inputDir
let
(txtFiles, otherFiles) =
partition ((== ".txt") . takeExtension) files
txtFilesAndContent <-
applyIoOnList readFile txtFiles >>= filterAndReportFailures
pure $ DirContents
{ dcFilesToProcess = txtFilesAndContent
, dcFilesToCopy = otherFiles
}
-- | The relevant directory content for our application
data DirContents
= DirContents
{ dcFilesToProcess :: [(FilePath, String)]
-- ^ File paths and their content
, dcFilesToCopy :: [FilePath]
-- ^ Other file paths, to be copied directly
}
------------------------------------
-- * Build index page
buildIndex :: [(FilePath, Markup.Document)] -> Html.Html
buildIndex files =
let
previews =
map
( \(file, doc) ->
case doc of
Markup.Heading 1 heading : article ->
Html.h_ 3 (Html.link_ file (Html.txt_ heading))
<> foldMap convertStructure (take 2 article)
<> Html.p_ (Html.link_ file (Html.txt_ "..."))
_ ->
Html.h_ 3 (Html.link_ file (Html.txt_ file))
)
files
in
Html.html_
"Blog"
( Html.h1_ (Html.link_ "index.html" (Html.txt_ "Blog"))
<> Html.h_ 2 (Html.txt_ "Posts")
<> mconcat previews
)
------------------------------------
-- * Conversion
-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
let
txtOutputFiles = map toOutputMarkupFile txtFiles
index = ("index.html", buildIndex txtOutputFiles)
in
map (fmap Html.render) (index : map convertFile txtOutputFiles)
toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
(takeBaseName file <.> "html", Markup.parse content)
convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)
------------------------------------
-- * Output to directory
-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
whenIO
(not <$> createOutputDirectory outputDir)
(hPutStrLn stderr "Cancelled." *> exitFailure)
-- | Creates the output directory.
-- Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
dirExists <- doesDirectoryExist dir
create <-
if dirExists
then do
override <- confirm "Output directory exists. Override?"
when override (removeDirectoryRecursive dir)
pure override
else
pure True
when create (createDirectory dir)
pure create
-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
let
copyFromTo file = copyFile file (outputDir </> takeFileName file)
void $ applyIoOnList copyFromTo files >>= filterAndReportFailures
-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
let
writeFileContent (file, content) =
writeFile (outputDir </> file) content
void $ applyIoOnList writeFileContent files >>= filterAndReportFailures
------------------------------------
-- * IO work and handling errors
-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
for files $ \file -> do
maybeContent <-
catch
(Right <$> action file)
( \(SomeException e) -> do
pure $ Left (displayException e)
)
pure (file, maybeContent)
-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
foldMap $ \ (file, contentOrErr) ->
case contentOrErr of
Left err -> do
hPutStrLn stderr err
pure []
Right content ->
pure [(file, content)]
------------------------------------
-- * Utilities
confirm :: String -> IO Bool
confirm question = do
putStrLn (question <> " (y/n)")
answer <- getLine
case answer of
"y" -> pure True
"n" -> pure False
_ -> do
putStrLn "Invalid response. Use y or n."
confirm question
whenIO :: IO Bool -> IO () -> IO ()
whenIO cond action = do
result <- cond
if result
then action
else pure ()
Summary
This was quite a section. Let's recount the things we've learned.
We discussed several ways to handle errors in Haskell:
- Encoding errors as a data type and using the
Either
type to encode "a value or an error". Useful approach for uneffectful code. - Using
ExceptT
when we want to combine the approach in (1) on top on an existing type with monadic capabilities. - Using exceptions for IO code.
We've also learned a few new abstractions and techniques:
- The
Traversable
type class, for data structures that can be traversed from left to right such as linked lists, binary trees andMap
s. Pretty useful when combined with another applicative functor type likeEither
orIO
. - The
Monad
type class extends theApplicative
type class with thejoin :: m (m a) -> m a
function. We learned thatEither
implements this type class interface and so doesIO
. - The
MonadTrans
type class for monad transformers for types that take other monads as inputs and provide a monadic interface (>>=
, do notation, etc.) while combining both their capabilities. We saw how to stack anEither
-like monad transformer,ExceptT
, on top ofIO
.
We are almost done. Only a couple more things left to do with this project. Let's go!
You can view the git commit of the changes we've made and the code up until now.
Passing environment variables
We'd like to add some sort of an environment to keep general information on the blog for various processings, such as the blog name, stylesheet location, and so on.
Environment
We can represent our environment as a record data type and build it from user input. The user input can be from command-line arguments, a configuration file, or something else.
module HsBlog.Env where
data Env
= Env
{ eBlogName :: String
, eStylesheetPath :: FilePath
}
deriving Show
defaultEnv :: Env
defaultEnv = Env "My Blog" "style.css"
After filling this record with the requested information, we can pass it as input to any function that might need it. This is a simple approach that can definitely work for small projects. But sometimes when the project gets bigger and many nested functions need the same information, threading the environment can get tedious.
There is an alternative solution to threading the environment as input to functions,
and that is using the
ReaderT
type from the mtl
(or transformers
) package.
ReaderT
newType ReaderT r m a = ReaderT (r -> m a)
ReaderT
is another monad transformer like ExceptT
, which means
that it also has an instance of Functor
, Applicative
, Monad
and MonadTrans
.
As we can see in the definition, ReaderT
is a newtype over a function that takes
some value of type r
, and returns a value of type m a
. The r
usually
represents the environment we want to share between functions that we want to compose,
and the m a
represents the underlying result that we return.
The m
could be any type that implements Monad
that we are familiar with.
Usually it goes well with IO
or Identity
, depending on if we want to share
an environment between effectful or uneffectful computations.
ReaderT
carries a value of type r
and passes it around to
other functions when we use the Applicative
and Monad
interfaces so that
we don't have to pass the value around manually. And when we want to grab
the r
and use it, all we have to do is ask
.
For our case, this means that instead of passing around Env
, we can instead
convert our functions to use ReaderT
- those that are uneffectful and don't use
IO
, can return ReaderT Env Identity a
instead of a
(or the simplified version, Reader Env a
),
and those that are effectful can return ReaderT Env IO a
instead of IO a
.
Note, as we've said before, Functor
, Applicative
, and Monad
all expect the type
that implements their interfaces to have the kind * -> *
.
This means that it is ReaderT r m
which implements these interfaces,
and when we compose functions with <*>
or >>=
we replace the f
or m
in their type signature with ReaderT r m
.
This means that, as with Either e
when we had composed functions with the same error type,
so it is with ReaderT r m
- we have to compose functions with the same r
type and same
m
type, we can't mix different environment types or different underlying m
types.
We're going to use a specialized version of ReaderT
that uses a specific m
= Identity
called Reader
.
The Control.Monad.Reader
provides an alias: Reader r a = ReaderT r Identity a
.
If the idea behind
ReaderT
is still a bit fuzzy to you and you want to get a better understanding of howReaderT
works, try doing the following exercise:
- Choose an
Applicative
orMonad
interface function, I recommendliftA2
, and specialize its type signature by replacingf
(orm
) with a concreteReaderT
type such asReaderT Int IO
.- Unpack the
ReaderT
newtype, replacingReaderT Int IO t
withInt -> IO t
.- Implement this specialized version of the function you've chosen
Solution for liftA2
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
Solution for (1)
-- Specialize: replace `f` with `ReaderT Env IO` liftA2 :: (a -> b -> c) -> ReaderT Env IO a -> ReaderT Env IO b -> ReaderT Env IO c
Solution for (2)
-- Unpack the newtype, replacing `ReaderT Env IO a` with `Env -> IO a` liftA2 :: (a -> b -> c) -> (Env -> IO a) -> (Env -> IO b) -> (Env -> IO c)
Solution for (3)
specialLiftA2 :: (a -> b -> c) -> (Env -> IO a) -> (Env -> IO b) -> (Env -> IO c) specialLiftA2 combine funcA funcB env = liftA2 combine (funcA env) (funcB env)
Notice how the job of our special
liftA2
forReaderT
is to supply the two functions withenv
, and then use theliftA2
implementation of the underlyingm
type (in our caseIO
) to do the rest of the work. Does it look like we're adding a capability on top of a differentm
? That's the idea behind monad transformers.
How to use Reader
Defining a function
Instead of defining a function like this:
txtsToRenderedHtml :: Env -> [(FilePath, String)] -> [(FilePath, String)]
We define it like this:
txtsToRenderedHtml :: [(FilePath, String)] -> Reader Env [(FilePath, String)]
Now that our code uses Reader
, we have to accommodate that in the way we write our functions.
Before:
txtsToRenderedHtml :: Env -> [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml env txtFiles =
let
txtOutputFiles = map toOutputMarkupFile txtFiles
index = ("index.html", buildIndex env txtOutputFiles)
htmlPages = map (convertFile env) txtOutputFiles
in
map (fmap Html.render) (index : htmlPages)
Note how we needed to thread the env
to the other functions that use it.
After:
txtsToRenderedHtml :: [(FilePath, String)] -> Reader Env [(FilePath, String)]
txtsToRenderedHtml txtFiles = do
let
txtOutputFiles = map toOutputMarkupFile txtFiles
index <- (,) "index.html" <$> buildIndex txtOutputFiles
htmlPages <- traverse convertFile txtOutputFiles
pure $ map (fmap Html.render) (index : htmlPages)
Note how we use do notation now, and instead of threading env
around we compose
the relevant functions, buildIndex
and convertFile
, we use the type classes
interfaces to compose the functions. Note how we needed to fmap
over buildIndex
to add the output file we needed with the tuple, and how we needed to use traverse
instead
of map
to compose the various Reader
values convertFile
will produce.
Extracting Env
When we want to use our Env
, we need to extract it from the Reader
.
We can do it with:
ask :: ReaderT r m r
Which yanks the r
from the Reader
- we can extract with >>=
or <-
in do notation.
See the comparison:
Before:
convertFile :: Env -> (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile env (file, doc) =
(file, convert env (takeBaseName file) doc)
After:
convertFile :: (FilePath, Markup.Document) -> Reader Env (FilePath, Html.Html)
convertFile (file, doc) = do
env <- ask
pure (file, convert env (takeBaseName file) doc)
Note: we didn't change
convert
to useReader
because it is a user facing API for our library. By providing a simpler interface we allow more users to use our library - even those that aren't yet familiar with monad transformers.Providing a simple function argument passing interface is preferred in this case.
Run a Reader
Similar to handling the errors with Either
, at some point we need to supply the environment to
a computation that uses Reader
, and extract the result from the computation.
We can do that with the functions runReader
and runReaderT
:
runReader :: Reader r a -> (r -> a)
runReaderT :: ReaderT r m a -> (r -> m a)
These functions convert a Reader
or ReaderT
to a function that takes r
.
Then we can pass the initial environment to that function:
convertDirectory :: Env -> FilePath -> FilePath -> IO ()
convertDirectory env inputDir outputDir = do
DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
createOutputDirectoryOrExit outputDir
let
outputHtmls = runReader (txtsToRenderedHtml filesToProcess) env
copyFiles outputDir filesToCopy
writeFiles outputDir outputHtmls
putStrLn "Done."
See the let outputHtmls
part.
Extra: Transforming Env
for a particular call
Sometimes we may want to modify the Env
we pass to a particular function call.
For example, we may have a general Env
type that contains a lot of information, and
functions that only need a part of that information.
If the functions we are calling are like convert
and take the environment as an
argument instead of a Reader
, we can just extract the environment
with ask
, apply a function to the extracted environment,
and pass the result to the function, like this:
outer :: Reader BigEnv MyResult
outer = do
env <- ask
pure (inner (extractSmallEnv env))
inner :: SmallEnv -> MyResult
inner = ...
extractSmallEnv :: BigEnv -> SmallEnv
extractSmallEnv = ...
But if inner
uses a Reader SmallEnv
instead of argument passing,
we can use runReader
to convert inner
to a normal function,
and use the same idea as above!
outer :: Reader BigEnv MyResult
outer = do
env <- ask
-- Here the type of `runReader inner` is `SmallEnv -> MyResult`
pure (runReader inner (extractSmallEnv env))
inner :: Reader SmallEnv MyResult
inner = ...
extractSmallEnv :: BigEnv -> SmallEnv
extractSmallEnv = ...
This pattern is generalized and captured by a function called
withReaderT,
and works even for ReaderT
:
withReaderT :: (env2 -> env1) -> ReaderT env1 m a -> ReaderT env2 m a
withReaderT
takes a function that modifies the environment,
and converts a ReaderT env1 m a
computation to a ReaderT env2 m a
computation
using this function.
Let's see it concretely with our example:
outer :: Reader BigEnv MyResult
outer = withReaderT extractSmallEnv inner
Question: what is the type of withReaderT
when specialized in our case?
Answer
withReaderT
:: (BigEnv -> SmallEnv) -- This is the type of `extractSmallEnv`
-> Reader SmallEnv MyResult -- This is the type of `inner`
-> Reader BigEnv MyResult -- This is the type of `outer`
Note the order of the environments! We use a function from a BigEnv
to a SmallEnv
,
to convert a Reader
of SmallEnv
to a Reader
of BigEnv
!
This is because we are mapping over the input of a function rather than the output, and is related to topics like variance and covariance, but isn't terribly important for us at the moment.
Using Env
in our logic code
One thing we haven't talked about yet is using our environment in the convert
function to generate the pages we want. And actually, we don't even have the ability to add
stylesheets to our HTML EDSL at the moment! We need to go back and extend it. Let's do all
that now:
Since stylesheets go in the head
element, perhaps it's a good idea to create an additional
newtype
like Structure
for head
information? Things like title, stylesheet,
and even meta elements can be composed together just like we did for Structure
to build the head
!
-
Do it now: extend our HTML library to include headings and add 3 functions:
title_
for titles,stylesheet_
for stylesheets, andmeta_
for meta elements like twitter cards.Solution
src/HsBlog/Html.hs
-- Html.hs module HsBlog.Html ( Html , Head , title_ , stylesheet_ , meta_ , Structure , html_ , p_ , h_ , ul_ , ol_ , code_ , Content , txt_ , img_ , link_ , b_ , i_ , render ) where import Prelude hiding (head) import HsBlog.Html.Internal
src/HsBlog/Html/Internal.hs
newtype Head = Head String -- * EDSL html_ :: Head -> Structure -> Html html_ (Head head) content = Html ( el "html" ( el "head" head <> el "body" (getStructureString content) ) ) -- * Head title_ :: String -> Head title_ = Head . el "title" . escape stylesheet_ :: FilePath -> Head stylesheet_ path = Head $ "<link rel=\"stylesheet\" type=\"text/css\" href=\"" <> escape path <> "\">" meta_ :: String -> String -> Head meta_ name content = Head $ "<meta name=\"" <> escape name <> "\" content=\"" <> escape content <> "\">" instance Semigroup Head where (<>) (Head h1) (Head h2) = Head (h1 <> h2) instance Monoid Head where mempty = Head ""
-
Fix
convert
andbuildIndex
to use the new API. Note:buildIndex
should returnReader
!Solution
src/HsBlog/Convert.hs
import Prelude hiding (head) import HsBlog.Env (Env(..)) convert :: Env -> String -> Markup.Document -> Html.Html convert env title doc = let head = Html.title_ (eBlogName env <> " - " <> title) <> Html.stylesheet_ (eStylesheetPath env) article = foldMap convertStructure doc websiteTitle = Html.h_ 1 (Html.link_ "index.html" $ Html.txt_ $ eBlogName env) body = websiteTitle <> article in Html.html_ head body
src/HsBlog/Directory.hs
buildIndex :: [(FilePath, Markup.Document)] -> Reader Env Html.Html buildIndex files = do env <- ask let previews = map ( \(file, doc) -> case doc of Markup.Head 1 head : article -> Html.h_ 3 (Html.link_ file (Html.txt_ head)) <> foldMap convertStructure (take 2 article) <> Html.p_ (Html.link_ file (Html.txt_ "...")) _ -> Html.h_ 3 (Html.link_ file (Html.txt_ file)) ) files pure $ Html.html_ ( Html.title_ (eBlogName env) <> Html.stylesheet_ (eStylesheetPath env) ) ( Html.h_ 1 (Html.link_ "index.html" (Html.txt_ "Blog")) <> Html.h_ 2 (Html.txt_ "Posts") <> mconcat previews )
-
Create a command-line parser for
Env
, attach it to theconvert-dir
command, and pass the result it to theconvertDirectory
function.Solution
src/HsBlog.hs
import HsBlog.Env (defaultEnv) convertSingle :: String -> Handle -> Handle -> IO () process :: String -> String -> String process title = Html.render . convert defaultEnv title . Markup.parse
app/OptParse.hs
import HsBlog.Env ------------------------------------------------ -- * Our command-line options model -- | Model data Options = ConvertSingle SingleInput SingleOutput | ConvertDir FilePath FilePath Env deriving Show ------------------------------------------------ -- * Directory conversion parser pConvertDir :: Parser Options pConvertDir = ConvertDir <$> pInputDir <*> pOutputDir <*> pEnv -- | Parser for blog environment pEnv :: Parser Env pEnv = Env <$> pBlogName <*> pStylesheet -- | Blog name parser pBlogName :: Parser String pBlogName = strOption ( long "name" <> short 'N' <> metavar "STRING" <> help "Blog name" <> value (eBlogName defaultEnv) <> showDefault ) -- | Stylesheet parser pStylesheet :: Parser String pStylesheet = strOption ( long "style" <> short 'S' <> metavar "FILE" <> help "Stylesheet filename" <> value (eStylesheetPath defaultEnv) <> showDefault )
app/Main.hs
main :: IO () main = do options <- parse case options of ConvertDir input output env -> HsBlog.convertDirectory env input output ...
Summary
Which version do you like better? Manually passing arguments, or using Reader
?
To me, it is not clear that the second version with Reader
is better than the first
with explicit argument passing in our particular case.
Using Reader
and ReaderT
makes our code a little less friendly toward beginners
that are not yet familiar with these concepts and techniques, and we don't see
(in this case) much benefit.
As programs grow larger, techniques like using Reader
become more attractive to use.
For our relatively small example, using Reader
might not be appropriate.
I've included it in this book because it is an important technique to have in our
arsenal and I wanted to demonstrate it.
It is important to weigh the benefits and costs of using advanced techniques, and it's often better to try and get away with simpler techniques if we can.
You can view the git commit of the changes we've made and the code up until now.
Testing
We want to add some tests to our blog generator. At the very least a few regression tests to make sure that if we extend or change our markup parsing code, HTML generation code, or translation from markup to HTML code, and make a mistake, we'll have a safety net alerting us of issues.
We will use the hspec testing framework to write our tests.
There are other testing frameworks in Haskell, for example
tasty, but I like hspec
's documentation,
so we'll use that.
Initial setup
Cabal file additions
We're going to define a new section in our hs-blog-gen.cabal
file for our new test suite.
This section is called test-suite
and it is fairly similar to the library
and
executable
sections.
The interfaces for how to define a test suite are described in the
Cabal documentation.
We are going to use the exitcode-stdio-1.0
interface. Let's go over the different settings
and options:
test-suite hs-blog-gen-test
import: common-settings
type: exitcode-stdio-1.0
hs-source-dirs: test
main-is: Spec.hs
-- other-modules:
build-depends:
base
, hspec
, hspec-discover
, raw-strings-qq
, hs-blog
ghc-options:
-O -threaded -rtsopts -with-rtsopts=-N
build-tool-depends:
hspec-discover:hspec-discover
hs-source-dirs: test
- The directory of the source file for the test suitemain-is: Spec.hs
- The entry point to the test suite source fileother-modules
- The modules in our test suite. Currently commented out because we haven't added any yet.build-depends
- The packages we are going to use:base
- The standard library for Haskell, as we've used beforehspec
- The test framework we are going to usehspec-discover
- Automatic discovery of hspec testsraw-strings-qq
- Additional syntax for writing raw string literalshs-blog
- Our library
ghc-options
- Extra options and flags for GHC-O
- Compile with optimizations-threaded
- Use the multi-core runtime instead of single-core runtime. The multi-core runtime is generally a bit slower in my experience, but when writing code that actually uses multiple cores (such as a test framework that runs tests in parallel) it can gives a good performance boost.-rtsopts
- lets us manipulate the Haskell runtime system by passing command-line arguments to our application-with-rtsopts=-N
- Set specific default options for the program at link-time. Specifically,-N
Sets the number of cores to use in our program.
build-tool-depends
- Uses a specific executable from a package dependency in aid of building the package. In this case, we are using thehspec-discover
executable from thehspec-discover
package, which goes over the source directory for the tests, finds all of theSpec
files, and creates an entry point for the program that will run all the tests it discovered.
Hspec discovery
In order for hspec-discover
to work, we need to add the following
to the "main" file of the test suite, for us this is test/Spec.hs
:
{-# OPTIONS_GHC -F -pgmF hspec-discover #-}
That's it! hspec-discover
will automatically define a main
for us.
Now we can run the tests using stack test
or cabal v2-test
(your choice).
Because we haven't defined any tests, our output is:
Finished in 0.0000 seconds
0 examples, 0 failures
When we add new hspec tests, hspec-discover
will find and run them automatically
(though we will still need add them to the other-modules
section in the cabal file).
For hspec-discover
to identify modules as test modules, the modules must follow
a convention:
- Their module names must end with
Spec
- They must define a value
spec :: Spec
(which describes the test) and export it outside of the module (by adding it to the export list of the module, for example).
Writing tests
Let's write our first test. We'll create a new module to test
markup parsing. We'll call it MarkupParsingSpec.hs
. We'll need
the following imports as well:
module MarkupParsingSpec where
import Test.Hspec
import HsBlog.Markup
hspec
provides us with a monadic interface for describing, composing and
nesting test specifications (Spec
s).
Using the describe
function we can
describe a group of tests, using the it
function we can add a new test,
and using a function like shouldBe
we can compare two values and make
sure they are equal by using their Eq
instance.
If they are, the test will pass, and if not, it will fail with a descriptive error.
Let's try it and write a test that obviously fails!
spec :: Spec
spec = do
describe "Markup parsing tests" $ do
it "empty" $
shouldBe
(parse "")
[Heading 1 "bug"]
After adding the module to the other-modules
list in the cabal file:
other-modules:
MarkupParsingSpec
And running the tests, we get this output:
MarkupParsing
Markup parsing tests
empty FAILED [1]
Failures:
test/MarkupParsingSpec.hs:10:7:
1) MarkupParsing, Markup parsing tests, empty
expected: [Heading 1 "bug"]
but got: []
To rerun use: --match "/MarkupParsing/Markup parsing tests/empty/"
Randomized with seed 763489823
Finished in 0.0004 seconds
1 example, 1 failure
The output describes which tests are running in a hierarchy tree (module, group and test), whether the tests pass or fail, and if they fail, the output and the expected output.
We can fix our test by matching the expected output:
shouldBe
(parse "")
[]
Now, running the tests will produce:
MarkupParsing
Markup parsing tests
empty
Finished in 0.0001 seconds
1 example, 0 failures
We can add a few more tests:
it "paragraph" $
shouldBe
(parse "hello world")
[Paragraph "hello world"]
it "heading 1" $
shouldBe
(parse "* Heading 1")
[Heading 1 "Heading 1"]
it "code" $
shouldBe
(parse "> main = putStrLn \"hello world!\"")
[CodeBlock ["main = putStrLn \"hello world!\""]]
And run the tests again:
MarkupParsing
Markup parsing tests
Test empty
paragraph
heading 1
code
Finished in 0.0003 seconds
4 examples, 0 failures
This is the gist of writing unit tests with hspec
. It's important to note
that we can nest Spec
s that are declared with describe
to create trees,
and of course refactor and move things to different functions and modules
to make our test suite better organized.
For example, we can write our tests like this:
spec :: Spec
spec = do
describe "Markup parsing tests" $ do
simple
simple :: Spec
simple = do
describe "simple" $ do
it "empty" $
shouldBe
(parse "")
[]
it "paragraph" $
shouldBe
(parse "hello world")
[Paragraph "hello world"]
it "heading 1" $
shouldBe
(parse "* Heading 1")
[Heading 1 "Heading 1"]
it "code" $
shouldBe
(parse "> main = putStrLn \"hello world!\"")
[CodeBlock ["main = putStrLn \"hello world!\""]]
Also, there are other "expectations" like shouldBe
that we can use when writing tests.
They are described in the hspec tutorial
and can be found in the
haddock documentation as well.
Raw strings
If we want to write multi-line strings, or avoid escaping strings like we did in the "code"
test, we can use a library called
raw-strings-qq
which uses a language extension called
QuasiQuotes
.
QuasiQuotes
is a meta-programming extension that provides a mechanism for extending the
syntax of Haskell.
A quasi-quote has the form [quoter| string |]
, where the quoter is the name
of the function providing the syntax we wish to use, and the string is our input.
In our case, we use the quoter r
, which is defined in
raw-strings-qq,
and write any string we want, with multi-lines and unescaped characters!
We could use this to write the tests
we previously wrote:
{-# language QuasiQuotes #-}
...
import Text.RawString.QQ
...
example3 :: String
example3 = [r|
Remember that multiple lines with no separation
are grouped together to a single paragraph
but list items remain separate.
# Item 1 of a list
# Item 2 of the same list
|]
And add multi-line tests:
spec :: Spec
spec = do
describe "Markup parsing tests" $ do
simple
multiline
multiline :: Spec
multiline = do
describe "Multi-line tests" $ do
it "example3" $
shouldBe
(parse example3)
example3Result
example3 :: String
example3 = [r|
Remember that multiple lines with no separation
are grouped together to a single paragraph
but list items remain separate.
# Item 1 of a list
# Item 2 of the same list
|]
example3Result :: Document
example3Result =
[ Paragraph "Remember that multiple lines with no separation are grouped together to a single paragraph but list items remain separate."
, OrderedList
[ "Item 1 of a list"
, "Item 2 of the same list"
]
]
Running the tests:
MarkupParsing
Markup parsing tests
simple
Test empty
paragraph
heading 1
code
Multi-line tests
example3
Finished in 0.0004 seconds
5 examples, 0 failures
Exercise: Add a test for the fourth example described in the previous exercises.
Solution
multiline :: Spec
multiline = do
describe "Multi-line tests" $ do
it "example3" $
shouldBe
(parse example3)
example3Result
it "example4" $
shouldBe
(parse example4)
example4Result
example4 :: String
example4 = [r|
* Compiling programs with ghc
Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.
Create a new Haskell source file named hello.hs, and write
the following code in it:
> main = putStrLn "Hello, Haskell!"
Now, we can compile the program by invoking ghc with the file name:
> ➜ ghc hello.hs
> [1 of 1] Compiling Main ( hello.hs, hello.o )
> Linking hello ...
GHC created the following files:
- hello.hi - Haskell interface file
- hello.o - Object file, the output of the compiler before linking
- hello (or hello.exe on Microsoft Windows) - A native runnable executable.
GHC will produce an executable when the source file satisfies both conditions:
# Defines the main function in the source file
# Defines the module name to be Main, or does not have a module declaration
Otherwise, it will only produce the .o and .hi files.
|]
example4Result :: Document
example4Result =
[ Heading 1 "Compiling programs with ghc"
, Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries."
, Paragraph "Create a new Haskell source file named hello.hs, and write the following code in it:"
, CodeBlock
[ "main = putStrLn \"Hello, Haskell!\""
]
, Paragraph "Now, we can compile the program by invoking ghc with the file name:"
, CodeBlock
[ "➜ ghc hello.hs"
, "[1 of 1] Compiling Main ( hello.hs, hello.o )"
, "Linking hello ..."
]
, Paragraph "GHC created the following files:"
, UnorderedList
[ "hello.hi - Haskell interface file"
, "hello.o - Object file, the output of the compiler before linking"
, "hello (or hello.exe on Microsoft Windows) - A native runnable executable."
]
, Paragraph "GHC will produce an executable when the source file satisfies both conditions:"
, OrderedList
[ "Defines the main function in the source file"
, "Defines the module name to be Main, or does not have a module declaration"
]
, Paragraph "Otherwise, it will only produce the .o and .hi files."
]
Summary
This chapter has been just the tip of the iceberg of the Haskell testing landscape. We haven't talked about property testing or golden testing, testing expected failures, testing IO code, inspection testing, benchmarking, and more. There's just too much to cover!
My hope is that this chapter provided you with the basics of how to start writing tests for your own projects. Please consult the tutorial for your chosen testing framework, and read more about this very important subject on your own.
You can view the git commit of the changes we've made and the code up until now.
Generating documentation
There are many ways to help others to get started with our projects and libraries. For example, we can write tutorials, provide runnable examples, describe the internals of the system, and create an API reference.
In this chapter we will focus on generating API reference pages (the kind that can be seen on Hackage) from annotated Haskell source code using Haddock.
Running Haddock
We can generate API reference pages (a.k.a. haddocks in the Haskell world) for our project using our favorite package manager:
Cabal
We can run cabal v2-haddock
to generate haddocks:
➜ cabal v2-haddock
Resolving dependencies...
Build profile: -w ghc-9.0.1 -O1
In order, the following will be built (use -v for more details):
- hs-blog-0.1.0.0 (lib) (first run)
Configuring library for hs-blog-0.1.0.0..
Preprocessing library for hs-blog-0.1.0.0..
Running Haddock on library for hs-blog-0.1.0.0..
Haddock coverage:
0% ( 0 / 3) in 'HsBlog.Env'
Missing documentation for:
Module header
Env (src/HsBlog/Env.hs:3)
defaultEnv (src/HsBlog/Env.hs:10)
21% ( 7 / 33) in 'HsBlog.Html.Internal'
Missing documentation for:
Module header
Html (src/HsBlog/Html/Internal.hs:8)
...
Documentation created:
/tmp/learn-haskell-blog-generator/dist-newstyle/build/x86_64-linux/ghc-9.0.1/hs-blog-0.1.0.0/doc/html/hs-blog/index.html
Cabal and Haddock will build our project and generate HTML pages for us at:
./dist-newstyle/build/<platform>/<compiler>/<package>-<version>/doc/html/<package>/
We can then open the index.html
file from that directory in a web browser and view our package documentation.
Stack
We can run stack haddock
to generate haddocks:
➜ stack haddock
...
hs-blog> build (lib + exe)
Preprocessing library for hs-blog-0.1.0.0..
Building library for hs-blog-0.1.0.0..
[1 of 7] Compiling HsBlog.Env
[2 of 7] Compiling HsBlog.Html.Internal
...
hs-blog> haddock
Preprocessing library for hs-blog-0.1.0.0..
Running Haddock on library for hs-blog-0.1.0.0..
Haddock coverage:
0% ( 0 / 3) in 'HsBlog.Env'
Missing documentation for:
Module header
Env (src/HsBlog/Env.hs:3)
defaultEnv (src/HsBlog/Env.hs:10)
21% ( 7 / 33) in 'HsBlog.Html.Internal'
Missing documentation for:
Module header
Html (src/HsBlog/Html/Internal.hs:8)
...
Documentation created:
.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/doc/html/hs-blog/index.html,
.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/doc/html/hs-blog/hs-blog.txt
Preprocessing executable 'hs-blog-gen' for hs-blog-0.1.0.0..
...
Stack and Haddock will build our project and generate HTML pages for us at:
./.stack-work/dist/<platform>/Cabal-<version>/doc/html/<package>/
We can then open the index.html
file from that directory in a web browser and view our package documentation.
Haddock coverage
Haddock will also output a coverage report when run, and will mention user-exposed constructs which are missing documentation. These constructs could be module headers, types, data constructors, type classes, functions, values, etc.
For example:
Haddock coverage:
...
0% ( 0 / 3) in 'HsBlog.Convert'
Missing documentation for:
Module header
convert (src/HsBlog/Convert.hs:8)
convertStructure (src/HsBlog/Convert.hs:23)
67% ( 2 / 3) in 'HsBlog.Directory'
Missing documentation for:
buildIndex (src/HsBlog/Directory.hs:80)
...
We can see that we did not document the HsBlog.Convert
at all, and we are missing
documentation for the module header, the convert
function and the convertStructure
function.
On the other hand, it seems that we do currently have some documentation written for the HsBlog.Directory
module! We'll see why, but first - try to generate haddocks, see the module hierarchy, browse around
the different modules, follow the links of the types, imagine what this API reference could look like,
and let's see how we can improve it.
Haddock markup
Haddock builds the API reference pages by building our project, examining the exported modules and their exported definitions, and grabbing source code comments written in special markup format.
Let's take a quick look at this markup format. We will go over a few important bits, but if you'd like to learn more, a complete guide for Haddock markup can be found in the Haddock documentation.
Documenting definitions
All haddock annotations appear as part of regular Haskell comments.
They can be used with both single line form (--
) and multi-line form ({-
and -}
).
The placement of a comment block and the haddock marker determine to which Haskell
definition the haddock string is attached.
We can annotate a Haskell definition by writing a comment block prefixed with |
before
the definition, or by writing a comment block prefixed with ^
after the definition.
For example:
-- | Construct an HTML page from a `Head`
-- and a `Structure`.
html_
:: Head -- ^ Represents the @\<head\>@ section in an HTML file
-> Structure -- ^ Represents the @\<body\>@ section in an HTML file
-> Html
html_ = ...
...
Here's another example:
{- | Represents a single markup structure. Such as:
- A paragraph
- An unordered list
- A code block
-}
data Structure
= Heading Natural String
-- ^ A section heading with a level
| Paragraph String
-- ^ A paragraph
| UnorderedList [String]
-- ^ An unordered list of strings
| OrderedList [String]
-- ^ An ordered list of strings
| CodeBlock [String]
-- ^ A code block
And another:
{- | Markup to HTML conversion module.
This module handles converting documents written in our custom
Markup language into HTML pages.
-}
module HsBlog.Convert where
As you can see, |
and ^
can be used to document functions, function arguments,
types, data constructors, modules, and more. They are probably the most important
Haddock annotations to remember (and even then, |
alone will suffice).
Tip: Annotate the modules, types, and the top-level definitions which are exported from your project with some high-level description of what they are used for (at the very least).
Your users and collaborators will thank you!
Section headings
We can separate our module into sections by adding headings.
Headings are comments which are prefixed with a number of *
(just like in our markup language).
For example:
-- * HTML EDSL
html_ :: Head -> Structure -> Html
html_ = ...
-- ** Structure
p_ :: Content -> Structure
p_ = ..
h_ :: Content -> Structure
h_ = ..
...
-- ** Content
txt_ :: String -> Content
txt_ = ...
link_ :: FilePath -> Content -> Content
link_ = ...
It is also possible to add headings to the export list instead:
module HsBlog.Html
( -- * HTML EDSL
Html
, html_
-- ** Combinators used to construct the @\<head\>@ section
, Head
, title_
, stylesheet_
, meta_
-- ** Combinators used to construct the @\<body\>@ section
, Structure
, p_
, h_
, ul_
, ol_
, code_
-- ** Combinators used to construct content inside structures
, Content
, txt_
, img_
, link_
, b_
, i_
-- ** Render HTML to String
, render
)
where
Separating parts of the module into sections helps keeping the important things together and Haddock will create a table-of-contents at the top of a module page for us as well.
Sometimes it's also easier to figure out whether a module should be split into multiple modules or not after splitting it into sections using headings.
Exercise: Try to re-arrange the modules in our project to your liking and add headings to sections.
Formatting
As we saw earlier, we can also add formatting in the content of our comments. For example, we can:
-
Hyperlink identifiers by surrounding them with
`
.For example:
`Heading`
-
Write
monospaced text
by surrounding it with@
.For example:
@Paragraph "Hello"@
-
Add emphasis to text by surrounding it with
/
.For example:
/this is emphasised/
-
Add bold to text by surrounding it with
__
.For example:
__this is bold__
More
In this chapter we've covered the basics of the Haddock markup language. If you'd like to know more, the Haddock markup guide contains information on how to create even more interesting documentation structures, such as code blocks, grid tables, images and examples.
Summary
We've briefly covered one aspect of documenting Haskell programs: using Haddock to generate informative API reference pages created from source code comments which are annotated with Haddock markup.
While API references are incredibly valuable, remember that there are other forms of documentation that can help your users get started quickly, such as examples and tutorials.
Exercise: Add haddock annotation to the top-level definitions in our project and test your understanding of the program and the various parts - sometimes the best way to learn something is to try explaining it!
Recap
In this book we've implemented a very simple static blog generator while learning Haskell as we go.
- We've learned about basic Haskell building blocks, such as definitions, functions, types, modules, recursion, pattern matching, type classes, IO and exceptions.
- We've learned about EDSLs and used the combinator pattern to implement a composable html generation library.
- We've learned how to leverage types, modules and smart constructors to make invalid states unrepresentable.
- We've learned how to represent complex data using ADTs.
- We've learned how to use pattern matching to transform ADTs, and how to use recursion to solve problems.
- We've used the functional core, imperative shell approach to build a program that handles IO and applies our domain logic to user inputs.
- We've learned about abstractions such as monoids, functors and monads, and how they can help us reuse code and convey information about shared interfaces.
- We've learned how to create fancy command-line interfaces, write tests, and generate documentation.
While Haskell is a very big and complex language, and there's always more to be learned, I think we've reached an important milestone where you can start building your own Haskell projects and be productive with Haskell!
This is a good time to celebrate and pat yourself on the back for getting this far! Great job, you!
If you'd like to learn even more about Haskell and continue your Haskell journey beyond this book, check out the appendix sections Where to go next and the FAQ.
Thank you!
Thank you for reading this book. I hope you enjoyed it and found Haskell interesting.
I would very much like to hear your feedback. If you'd like, you could leave your feedback at this book's GitHub issue board, or you could reach me directly on Twitter or via email. You can find my contact information on my website.
If you liked this book, do let me know - your kind words mean a lot.
Finally, if you really liked this book and would like to support future passion projects like it, you can support me directly via Ko-fi.
Thank you and good luck with your next Haskell project!
Where to go next
Haskell is an incredibly rich and deep programming language. New cutting-edge techniques, concepts and features are still being discovered and sometimes integrated into GHC. This sometimes makes it seemingly impossible to catch up to.
This phenomena is sometimes dubbed The Haskell pyramid. My hope is that by reading this book and following the exercises, you readers have reached the bar of productivity, and you can now go and start working on your own projects with Haskell. I highly encourage you to do so. In my opinion, writing useful Haskell projects is the best method to solidify what you currently know, and identify what you still need to learn.
Extending this project
If you'd like to extend this project, here are a few ideas for you:
- Serve over HTTP - You can use a web library such as wai or twain to serve this blog over HTTP instead of generating it statically
- Rewrite it with libraries - you could rewrite it and use a real-world HTML package and markdown parser
- Add features
- You could add a metadata block at the top of each article which would include the title, publish date and tags of a blog post, and use them when generating HTML, index page and even tags pages
- You could add HTML pages templating using mustache or similar, and use that to generate a nice and customizable structure to the page
- You could add support for links and images in our markup language parser
- You could add support for configuration files which would include things like the blog title, description, or other meta information for things like twitter cards
Or anything else you can think of, consider this project your playground and do whatever you like with it!
Other resources
At some point you are likely to run into new concepts, techniques, or even just a feeling of "I feel like this could be done better". I'd like to point you in the right direction so you can find additional information and learn new Haskell things when you need to or want to.
I've compiled a list of resources for learning Haskell called Haskell study plan, which includes links to very useful articles, community hubs and news aggregators, project suggestions, and even cool open-source Haskell projects. You will also find alternative explanations to thing we've covered and even links to other Haskell tutorials, guides and books in case you need a different view on things.
Also, the GHC User Guide is a fantastic resource with loads of articles and information about the language and GHC tooling around it. It is often the best place to learn about the Haskell language.
However, don't feel pressured to learn everything Haskell has to offer right away. Mastering Haskell is a journey that can take a lot of time. Most of us are definitely not there yet, but we can still be very productive with Haskell, build real-world projects, and even discover new techniques and concepts ourselves.
Remember that in a lazy language we evaluate things only when we need them. Maybe we can do that too with Haskell concepts!
Frequently asked questions
Got a question? You can ask in the book's issue tracker!
General questions
How to install editor tools
As far as I know, the most recommended setup today for Haskell development is using VSCode or VSCodium together with the marketplace Haskell extension.
The Haskell extension uses haskell-language-server which can be install via GHCup or even via the Haskell extension itself.
There are other options of course if this setup isn't your jam.
How to learn new things
The Haskell community keeps marching forward, developing new libraries, tools and techniques as well as creating new material for older concepts. The Haskell planetarium aggregates feeds from several communities into one page, as well as a Haskell Weekly newsletter. You might also find the quite a bit of Haskell presence on Twitter!
Debugging
How to debug Haskell code
Most imperative languages provide a step debugger. While the
GHCi debugger,
exists it is not particularly easy to use, especially because of Haskell's lazy evaluation where things
might not evaluated at the order we might intuitively expect. Because of that,
Haskellers tend to use
trace debugging and
equational reasoning. With trace debugging, we try to verify our assumptions about the code -
we use the various trace
functions as a "hack" to print variables, functions inputs, functions output
or even just say "got here", from anywhere at the code.
After finding something that does not match our assumptions, such as unexpected input or output
of a function, we try to think what piece of code could be responsible for the discrepancy, or even use
trace debugging again to pinpoint the exact location, and try to use "equational reasoning" to
evaluate the offending code that betrayed our expectations. If it's easy to do, we try running
the function in ghci
with different inputs to check our assumptions as well.
Because Haskell focuses on immutability, composibility and using types to eliminate many classes of possible errors, "local reasoning" becomes possible, and trace debugging becomes a viable strategy for debugging Haskell programs.
How to understand type errors
GHC type errors are often not the most friendly errors messages, but they mean well! They are just trying to help us find inconsistencies in our code - often with regards to type usage, they help us avoid making errors.
When you run into error messages, start by reading the messages themselves carefully until you get used to them, and then the offending code hinted by the error message. As you gain experience, it is likely that the most important part of an error will be the location of the offending code, and by reading the code we can find the error without the actual error message.
Adding type signatures and annotations to test your understanding of the types also helps greatly. We can even ask GHC for the expected type in a certain place by using typed holes.
My program is slow. Why?
There could be various reasons. From inefficient algorithms or
unsuited data structures for the task
in terms of time complexity of the common operations, to less efficient memory representations
(this is another reminder to use Text
over String
in most cases),
and laziness issues (again, the evaluation strategy!).
The performance section in my Haskell study plan links to various resources on Haskell evaluation, profiling and case studies.
Design
How to structure programs
Start with the imperative shell functional core approach, define EDSLs with the combinator pattern for logic if needed, use monadic capabilities such as State locally if needed, maybe add an environment configuration with ReaderT, see how it goes.
If that approach fails you, look at why it fails and examine other solutions according to your needs.
How to model data
Modeling data using ADTs are usually the way to go. Often programmers coming from object oriented background tend to look at type classes as a way to define methods similar to inheritance, but this often isn't the right approach and ADTs with different constructors for different alternatives go a long way. Remember that even OOP people often preach for composition over inheritance.
Use functions to define behavior on data rather than trying to couple the two together.