BF Compiler Part 1 - Parsing

Read Time: 12 minutes

So I thought to myself, what does the world need? Obviously another brainfuck (BF) compiler. In this series I will use F# and FParsec to compile BF source code into MSIL to run in Microsoft’s CLR. Honestly, this isn’t a particularly ground-breaking task, but it serves as a fun opportunity to showcase a popular parsing library. Beyond that, it shows how easy it is for F# to leverage various parts of the .NET ecosystem.

Since there are a couple subtopics, this process is best addressed as a series. This first post will address FParsec and parsing the BF source code. The goal is intentionally restricted, so I will have placeholders with no functionality for the Ops as they are parsed. In later posts these will be expanded upon. The second post addresses IL generation and program generation. The third post will put the pieces together into something interesting. In the end, I’ll have a BF compiler.

Series:
Part 1 (Parsing)
Part 2 (IL Generation)
Part 3 (Compiler)
Part 4 (Optimization)

Before I get into the code, here is a quick synopsis of the BF functionality that needs to be supported.

Constructs:
Memory (data[30000]) - Array of ints
Data Pointer (data_pointer) - Current pointer position of data array
Instruction Pointer (instruction_pointer) - Position of current program operation

Operators:
>   Increment data_pointer
<   Decrement data_pointer
+   Increment value at data[data_pointer]
-   Decrement value at data[data_pointer]
[   Loop Start - Start of loop where predicate is: data[data_pointer] == 0
]   Loop End - End of loop
,   Read char from stdin into value at data[data_pointer]
.   Print value at data[data_pointer] to stdout (as char)

Implementation Notes:
For those familar with BF, there are variants that auto-grow memory or wrap overflows/underflows of the pointers. To keep things simple, I will not implement those for this series.

Example program that prints: 42
+++++[>++++++++++<-]>++.--.

With that brief introduction, I can begin to address how I will parse a BF program. The language syntax is so simple, I could easily iterate and use F#’s match. Instead, I’ll use FParsec, a powerful parsing library modeled after the Haskell Parsec library. What I have to parse here won’t do its power justice, but it works as an easily digestible introduction. Now to the code.

Using Paket, here is a sample paket.dependencies file.

1
2
3
source https://nuget.org/api/v2

nuget FParsec

Time to load modules. Note the need for FParsecCS.dll and FParsec.dll, order of inclusion matters.

1
2
3
4
5
6
7
8
System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__)
#I "../packages/FParsec/lib/net40-client/"
#r "../packages/FParsec/lib/net40-client/FParsecCS.dll"
#r "../packages/FParsec/lib/net40-client/FParsec.dll"

open System
open System.IO
open FParsec

I leverage F#’s Discriminated Unions to represent the operators. In BF there are basically two types of operators. Those that do an immediate action (like increment, decrement, read/print char), and the while loop (a list of instructions to loop over). F# allows this description to be modeled easily.

1
2
3
4
type actionOps = Gt | Lt | Plus | Minus | Read | Write
type allOps =
| Op of actionOps
| Loop of allOps list

Next are the my parser specific types. UserState can be used to keep state as the parser progresses, and Parser<'t> is a returned value as well as current state.

1
2
type UserState = unit
type Parser<'t> = Parser<'t, UserState>

Before I go too much deeper, how does a parser work? Below is a quick example. Parsers can get pretty advanced, and my introduction here won’t cover nearly all of the details. I highly recommend checking out the FParsec Tutorial to learn more (although it won’t be necessary for this post). pGreater looks for the char ‘>’, when found it executes the attached closure. For illustrative purposes I added a print statement, but it ultimately returns the appropriate Op value.

1
2
3
4
let pGreater:Parser<_> = 
pchar '>' |>> fun x ->
printfn "found: %A" x
Op Gt

Here is a sample execution using run. It finds the ‘>’, prints it, and returns a successful parsing object.

1
2
3
> run pGreater ">>--"
found: '>'
val it : ParserResult<allOps,unit> = Success: Op Gt

Taking this a bit further, it only matched one of the ‘>’. To match multiple instances, I apply the many combinator. Note how this time it matches 2 ‘>’, and returns a list of its matches.

1
2
3
4
> run (many pGreater) ">>--"
found: '>'
found: '>'
val it : ParserResult<allOps list,unit> = Success: [Op Gt; Op Gt]

What happens if there isn’t a match? Notice how this isn’t currently a wildcard search, it starts parsing at the beginning of the string and fails if it’s a non-match.

1
2
3
4
5
6
7
> run pGreater "<>--"
val it : ParserResult<allOps,unit> =
Failure:
Error in Ln: 1 Col: 1
<>--
^
Expecting: '>'

Now I introduce a Parser for the ‘<’. I also want to to match multiple instances using many of either ‘>’ or ‘<’; this is done with the <|> operator. You can see the results of running this slightly more advanced search.

1
2
3
4
5
6
7
8
9
10
11
12
let pLesser:Parser<_> = 
pchar '<' |>> fun x ->
printfn "found: %A" x
Op Lt

> run (many (pLesser <|> pGreater)) "><<<--"
found: '>'
found: '<'
found: '<'
found: '<'
val it : ParserResult<allOps list,unit> =
Success: [Op Gt; Op Lt; Op Lt; Op Lt]

Going a bit further down the rabbithole. Temporarily I make a Letter op, just so the sample parse looks more logical. I make a letter parser that accepts an ‘a’ or ‘b’, this part isn’t new. Then I do a search for multiple a’s and b’s surrounded by angle brackets. The combinators >>. and .>> link a sequence of parsers and indicate which side of the parse to capture (hint, capture on the . side of the operator). In the below output you can see how each character is matched, but only the letters are returned in the parsing result, the brackets are discarded.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// Letter is temporary for demonstration
type actionOps = Gt | Lt | Plus | Minus | Read | Write | Letter

let pLetter:Parser<_> =
pchar 'a' <|> pchar 'b' |>> fun x ->
printfn "found: %A" x
Op Letter

> run (pLesser >>. (many pLetter) .>> pGreater) "<aabb>"
found: '<'
found: 'a'
found: 'a'
found: 'b'
found: 'b'
found: '>'
val it : ParserResult<allOps list,unit> =
Success: [Op Letter; Op Letter; Op Letter; Op Letter]

I could go on for ages, but hopefully this is a basic enough start to how parsers work to get us through the post. Now to the BF parsers. Some of these will look familar. There is a character parser for each valid BF character/operator. For my purposes, its enough to convert the char to a specific type. This type information will be used later for generating the desired IL.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
let pGreater:Parser<_> = 
pchar '>' |>> fun _ -> Op Gt

let pLesser:Parser<_> =
pchar '<' |>> fun _ -> Op Lt

let pPlus:Parser<_> =
pchar '+' |>> fun _ -> Op Plus

let pMinus:Parser<_> =
pchar '-' |>> fun _ -> Op Minus

let pPeriod:Parser<_> =
pchar '.' |>> fun _ -> Op Write

let pComma:Parser<_> =
pchar ',' |>> fun _ -> Op Read

let pLBracket:Parser<_> =
pchar '[' |>> ignore

let pRBracket:Parser<_> =
pchar ']' |>> ignore

let pCommands = pGreater <|> pLesser <|> pPlus <|> pMinus <|> pPeriod <|> pComma

The action operators are pretty straight forward. Parsing loops is a bit more complicated. I need to not only parse blocks of operators, but allow embedded loops as well. This reference to something prior to being created causes a complication. What I effectively need is the ability to call loop recursively. Normal F# style would use let rec, but that isn’t an option here. So I leverage FParsec’s createParserForwardedToRef(). This allows me to use a parser pLoop as I normally would. I then have to inject the pLoop logic into the pLoopImpl reference. Implementing the parsing of a loop is basically find matching ‘[‘ and ‘]’, and capture any commands or embedded loops within them, recursively parsing the internal loops as necessary. Then return the result as a Loop.

1
2
3
4
let (pLoop:Parser<allOps>), (pLoopImpl:Parser<allOps> ref) = createParserForwardedToRef()

do pLoopImpl := pLBracket >>. (many (pCommands <|> pLoop)) .>> pRBracket |>> fun x ->
Loop x

There is one more parser to implement, the overarching program parser. As you can see, a program is just many instances of operators or loops.

1
let pProgram = (many (pCommands <|> pLoop))

Now that the parsing component is complete, I need a consumer of the resulting AST generated. Since BF is a list of operators, I iterate through the AST and handle each op as I approach it. Note, this is one of the placeholder functions I mentioned earlier. Here my “handle op” is just an sprintf the op (prefixing operators with ‘O:’ and wrapping loops in ‘[[…]]’). When the real compiler is built, this function will be replaced with something that emits IL.

1
2
3
4
5
6
7
8
let rec ProcessAst (ast:allOps) =
match ast with
| Op op -> sprintf "O:%A" op
| Loop loop ->
loop
|> List.map ProcessAst
|> List.fold (fun a b -> a + " " + b) ""
|> sprintf "[[%s]]"

The parseCode function pulls it all together. Given program source code, the function parses it and then processes the resulting AST. I also use a success/failure match in case the parsing fails. Again, this function will be expanded once I start emiting IL.

1
2
3
4
5
6
7
8
let parseCode code = 
let ast = run pProgram code
match ast with
| Success(x, _, _) ->
printfn "%A" x |> ignore
ProcessAst (Loop x)
| Failure(e, _, _) ->
sprintf "Error: %A" e

All of the pieces are in place, so it is time to look at processing program code. First, a look into the intermediate parsing step. run pProgram code takes code and returns the AST. Below it shows the operator parsing and the loops embedded in the list of resulting operators. Second, the psuedo-compile is done with parseCode. Again the operator parsing is shown as well, and the result is a a string of ops.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
let code = ">[++>+[-]+<]>."

// Example intermediate step
> run pProgram code

val it : ParserResult<allOps list,unit> =
Success: [Op Gt; Loop [Op Plus; Op Plus; Op Gt; Op Plus; Loop [Op Minus]; Op Plus; Op Lt];
Op Gt; Op Write]

// Example processed program
> parseCode code

[Op Gt; Loop [Op Plus; Op Plus; Op Gt; Op Plus; Loop [Op Minus]; Op Plus; Op Lt];
Op Gt; Op Write]
val it : string =
"[[ O:Gt [[ O:Plus O:Plus O:Gt O:Plus [[ O:Minus]] O:Plus O:Lt]] O:Gt O:Write]]"

I can now parse BF source code using Fparsec and have it stored in convenient structure for further processing. With phase one of this series complete, this is a good place to pause. Next will be looking at what it takes to generate IL in F#.