BF Compiler Part 3 - Compiler

Read Time: 18 minutes

This is the third installment in creating a brainfuck (BF) compiler using F# and FParsec. Previous posts discussed the parsing and the IL generation. Now it it time to pull it all together into something useful.

Series:
Part 1 (Parsing)
Part 2 (IL Generation)
Part 3 (Compiler)
Part 4 (Optimization)

Using Paket, here is a sample paket.dependencies file.

1
2
3
source https://nuget.org/api/v2

nuget FParsec

Here is the standard boilerplate for loading assemblies and generally getting ready to go.

1
2
3
4
5
6
7
8
9
10
11
System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__)
#r "../packages/FParsec/lib/net40-client/FParsecCS.dll"
#r "../packages/FParsec/lib/net40-client/FParsec.dll"

open System
open System.IO
open System.Runtime
open System.Reflection
open System.Reflection.Emit
open System.Text.RegularExpressions
open FParsec

I’ll start with the parsing code. I include it here for completeness sake, but there isn’t much new here. From a logic perspective, I added Comment support into the parser (basically any char that isn’t a valid character). I also wrapped the parsing code in a module for organizational purposes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
// Language operators (used in parsing and emitting)
type ActionOps = Gt | Lt | Plus | Minus | Read | Write
type AllOps =
| Op of ActionOps
| Loop of AllOps list
| Program of AllOps list
| Comment

///////////////
// Parsing code
module Parser =
type UserState = unit
type Parser<'t> = Parser<'t, UserState>

let pGreater:Parser<_> =
pchar '>' |>> fun _ -> Op Gt

let pLesser:Parser<_> =
pchar '<' |>> fun _ -> Op Lt

let pPlus:Parser<_> =
pchar '+' |>> fun _ -> Op Plus

let pMinus:Parser<_> =
pchar '-' |>> fun _ -> Op Minus

let pPeriod:Parser<_> =
pchar '.' |>> fun _ -> Op Write

let pComma:Parser<_> =
pchar ',' |>> fun _ -> Op Read

let pLBracket:Parser<_> =
pchar '[' |>> ignore

let pRBracket:Parser<_> =
pchar ']' |>> ignore

let pCommands = pGreater <|> pLesser <|> pPlus <|> pMinus <|> pPeriod <|> pComma

let (pLoop:Parser<AllOps>), (pLoopImpl:Parser<AllOps> ref) = createParserForwardedToRef()

do pLoopImpl := pLBracket >>. (many (pCommands <|> pLoop)) .>> pRBracket |>> fun x ->
Loop x

let commentChar c =
c <> '>' &&
c <> '<' &&
c <> '+' &&
c <> '-' &&
c <> '[' &&
c <> ']' &&
c <> '.' &&
c <> ','

let pComment:Parser<_> =
many1Satisfy commentChar |>> fun _ -> Comment

// Build ast from source code
let buildAst code =
run (many (pCommands <|> pLoop <|> pComment)) code

This is the start of the new IL emitting code. Sticking with the standard definition, the memory array will be of size 30,000. LocalBuilder is an IL local variable object. To protect me from myself, I leverage F# types to ensure that I’m using the correct variable in the correct place. For those not familar with F#, this will become more obvious where the variables are used. Simply, the type safety this brings is immensely useful and offers me a piece of mind. As a sidenote, BF defines the use of a program counter. I don’t actually need a program counter to keep track of where I am, so it’s excluded from my implementation. It’s here, but commented out primarily for reference-sake.

1
2
3
4
5
6
7
8
9
module Emitter = 

// Size of memory (data[])
let memorySize = 30000

// IL Variable types
type MemArray = MEM of LocalBuilder
type DpPointer = DP of LocalBuilder
//type PcPointer = PC of LocalBuilder

There will some variable creation along the way. To be consistent I create some functions to consistently implement that process. ilGenerator is basically the function the code is attached to. Creating a scalar is straight forward. Creating an array is a bit more involved. I need to create the variable, then call Newarr with the element type and array size to initialize the array. Once complete, I return the created variable

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Create IL local variable
let createLocal (ilGenerator:ILGenerator) (t:Type) =
ilGenerator.DeclareLocal(t)

// Create IL local array variable
let createLocalArray (ilGenerator:ILGenerator) (t:Type) (size:int) =
let regex = new Regex(@"\[\]$")
let tElement = Type.GetType(regex.Replace(t.FullName, ""))

let var = createLocal ilGenerator t
ilGenerator.Emit(OpCodes.Ldc_I4, size)
ilGenerator.Emit(OpCodes.Newarr, tElement)
ilGenerator.Emit(OpCodes.Stloc, var)
var

Both the add and sub perform similar actions, add a value to a local variable. As a quick refresher, there will be lots of pushing onto the stack, performing actions, and popping results off a stack. As a result, these functions follow a pattern that will show up repeatably in this post. LdLoc pushes a local variable value onto the stack. Ldc_I4 pushes a 32bit int onto the stack; in this case the amount to be added/subtracted to/from the local variable value. Add and Sub pop the top 2 values off the top of the stack, perform their respective actions, and push the result onto the stack. StLoc pops a value off the stack and stores it in the local variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
// Add int to IL local variable
let addLocalInt (ilGenerator:ILGenerator) (var:LocalBuilder) (i:int) =
ilGenerator.Emit(OpCodes.Ldloc, var)
ilGenerator.Emit(OpCodes.Ldc_I4, i)
ilGenerator.Emit(OpCodes.Add)
ilGenerator.Emit(OpCodes.Stloc, var)

// subtract int from IL local variable
let subLocalInt (ilGenerator:ILGenerator) (var:LocalBuilder) (i:int) =
ilGenerator.Emit(OpCodes.Ldloc, var)
ilGenerator.Emit(OpCodes.Ldc_I4, i)
ilGenerator.Emit(OpCodes.Sub)
ilGenerator.Emit(OpCodes.Stloc, var)

The functions emitGt and emitLt increment and decrement the data pointer by leveraging the previously defined add/sub functions. This is also an important place to mention type safety (See, I said I’d get back to that). I can ensure that I don’t ever accidently send anything other than the data pointer into these functions. That can simply be done with a (dataPointer:DpPointer), but I want to do a bit more; I want to use the LocalBuilder value within the type. If you’re not familar with the syntax, (DP(dataPointer):DpPointer) effectively ensures that dataPointer is of type DpPointer, and extracts the LocalBuilder instance into the dataPointer variable. This is a nice construct to not only ensure I send in a LocalBuilder, but that I send in the correct LocalBuilder. The additional level of security is one of the strengths type safety brings to the table.

1
2
3
4
5
6
7
// Increment dataPointer
let emitGt (ilGenerator:ILGenerator) (DP(dataPointer):DpPointer) =
addLocalInt ilGenerator dataPointer 1

// Decrement dataPointer
let emitLt (ilGenerator:ILGenerator) (DP(dataPointer):DpPointer) =
subLocalInt ilGenerator dataPointer 1

emitPlus and emitMinus increment and decrement the value at data[dataPointer], respectively. This is similar to before, the primary difference is how to get/set an array value. Also note that again I use F# types to ensure I am only ever working with the data[] and dataPointer variables. To get a value from an array onto the stack I push the array variable reference on the stack Ldloc <array variable>, then the index Ldloc <index>, then a command to push the value at array[index] onto the stack Ldelem_I4. To set a value of an array the process is similar. Push the array reference, index, and value on the stack, then call a command to pop the top value of the stack into array[index], Stelem_I4. Beyond getting values in and out of arrays, I just have to do the appropriate add/subtract to the value before doing the set.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Increment value in data[dataPointer]
let emitPlus (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =
// Load array reference for setting
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

// Load value of array
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)
ilGenerator.Emit(OpCodes.Ldelem_I4)

// Add 1 to value
ilGenerator.Emit(OpCodes.Ldc_I4_1)
ilGenerator.Emit(OpCodes.Add)

// Save back into array
ilGenerator.Emit(OpCodes.Stelem_I4)

// Decrement value in data[dataPointer]
let emitMinus (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =
// Load array reference for setting
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)

// Load value of array
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)
ilGenerator.Emit(OpCodes.Ldelem_I4)

// Subtract 1 to value
ilGenerator.Emit(OpCodes.Ldc_I4_1)
ilGenerator.Emit(OpCodes.Sub)

// Save back into array
ilGenerator.Emit(OpCodes.Stelem_I4)

Now it is time to tackle the IO portion of BF. emitComma reads a character from stdin and saves it into data[dataPointer]. To do this, I need to read the keypress from the console and convert it to a char. In F# I would do let (keypressChar:char) = Console.ReadKey().KeyChar. Using straight IL, I call the ReadKey and get_KeyChar functions. After that, it is a matter of saving the value into the data array as an int. emitPeriod displays the value at data[dataPointer] to stdout as a char. There isn’t a lot new here, push the value from the array onto the stack, then call Console.Write. Of possible interest here is the parameter type char. If I make the parameter type be typeof<int>, it would instead print the actual int value in the data array. Although not what I want as final output, its a useful note for debugging.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Read char from stdin into data[dataPointer]
let emitComma (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =
// Read keypress and extract char value
let consoleReadKey = createLocal ilGenerator typeof<System.ConsoleKeyInfo>
ilGenerator.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("ReadKey", [| |]), [| typeof<ConsoleKeyInfo> |])
ilGenerator.Emit(OpCodes.Stloc, consoleReadKey)

let keypressChar = createLocal ilGenerator typeof<char>
ilGenerator.Emit(OpCodes.Ldloca_S, consoleReadKey)
ilGenerator.EmitCall(OpCodes.Call, typeof<ConsoleKeyInfo>.GetMethod("get_KeyChar", [| |]), [| typeof<char> |])
ilGenerator.Emit(OpCodes.Stloc, keypressChar)

// Save into data[dataPointer]
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)
ilGenerator.Emit(OpCodes.Ldloc, keypressChar)
ilGenerator.Emit(OpCodes.Stelem_I4)

// Print char at data[dataPointer] to stdout
let emitPeriod (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) =
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)
ilGenerator.Emit(OpCodes.Ldelem_I4)
ilGenerator.EmitCall(OpCodes.Call,typeof<Console>.GetMethod("Write", [| typeof<char> |]), null);

Now that all the ActionOps are implemented, I make a generic emitOp function. There isn’t anything particularly exciting here.

1
2
3
4
5
6
7
8
9
// Generate IL for an operator
let emitOp (ilGenerator:ILGenerator) (op:ActionOps) (data:MemArray) (dataPointer:DpPointer) =
match op with
| Gt -> emitGt ilGenerator dataPointer
| Lt -> emitLt ilGenerator dataPointer
| Plus -> emitPlus ilGenerator data dataPointer
| Minus -> emitMinus ilGenerator data dataPointer
| Read -> emitComma ilGenerator data dataPointer
| Write -> emitPeriod ilGenerator data dataPointer

Time to implement loop blocks, program, and ast processor. These are mutually recursive functions. For those new to F#, this is accomplished by let rec foo = <foo code> and bar <bar code> syntax. It is a necessary construct when foo needs to call bar, and vice versa.

The loop is a list of operators, with an associated predicate to determine if the list should continue to execute. To do this the loop leverages labels for jumping between sections of code (the predicate section and the end of loop). This is done in a two step process. One, call DefineLabel() to create the label, this allows it to be referenced. Two, MarkLabel says where in the code the label belongs. As you can imagine, I put the predicate label at the begining of the predicate, and the loop-end label after the loop’s block of code.

Tackling the predicate logic; the looping works as follows, “Continue loop as long as data[dataPointer] != 0”. To test this, I push the value from the array onto the stack, then a 0. Ceq does the compare, pushing the result onto the stack and Brtrue branches out of the loop if the value equals 0.

Once the branching logic is complete, all that’s left is processing the list of ops. This is a good segue to emitOpList. Both a loop and program are a list of operators. This is simply a matter of iterating over the list and feeding each one into processAst.

Even though processAst is the controlling function for the emitting of the AST -> IL, it looks pretty tame to everything else so far. But that’s because the hard stuff has already been done. processAst provides a match that handles the 3 types of operators; and their respective functions take care of the rest of the work.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
  // Generate IL for a loop
let rec emitLoop (ilGenerator:ILGenerator) (MEM(data):MemArray) (DP(dataPointer):DpPointer) (block:AllOps list) =
let ilLoopPredicate = ilGenerator.DefineLabel()
let ilLoopStart = ilGenerator.DefineLabel()
let ilLoopEnd = ilGenerator.DefineLabel()

////////////
// predicate

// push data[dataPointer]
ilGenerator.MarkLabel(ilLoopPredicate)
ilGenerator.Emit(OpCodes.Ldloc, data)
ilGenerator.Emit(OpCodes.Ldloc, dataPointer)
ilGenerator.Emit(OpCodes.Ldelem_I4)
// push 0
ilGenerator.Emit(OpCodes.Ldc_I4, 0)

// if true (data[dataPointer]==0), exit loop
ilGenerator.Emit(OpCodes.Ceq)
ilGenerator.Emit(OpCodes.Brtrue, ilLoopEnd)

/////////////
// loop block
ilGenerator.MarkLabel(ilLoopStart)
emitOpList ilGenerator (MEM data) (DP dataPointer) block

//////////////
// End of loop
ilGenerator.Emit(OpCodes.Br, ilLoopPredicate)
ilGenerator.MarkLabel(ilLoopEnd)

// Generate IL for list of Ops
and emitOpList (ilGenerator:ILGenerator) (data:MemArray) (dataPointer:DpPointer) (ast:AllOps list) =
ast |> List.iter (processAst ilGenerator data dataPointer)

// Generate IL for ast
and processAst (ilGenerator:ILGenerator) (data:MemArray) (dataPointer:DpPointer) (ast:AllOps) =
match ast with
| Op op -> emitOp ilGenerator op data dataPointer
| Loop block -> emitLoop ilGenerator data dataPointer block
| Program block -> emitOpList ilGenerator data dataPointer block
| Comment -> () // do nothing

Taking a breather from all that new stuff, I head back into familiar territory. This is the code from my second post in the series; emitting an IL program. The only change of significance is that I now insert the BF logic. This consists of creating the necessary variables and generating IL based on the AST. First, the variables: data, dataPointer, and programCounter. As mentioned above I don’t need programCounter, so it’s here for reference only. I use my previously defined “create variable” functions for this. I also wrap each variable in it’s respective type (MEM, DP, PC). This ensures that as I use them, I don’t send the wrong variable to the wrong function. Second, the IL generation with processAst.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
// Remove non-alphanumeric, to avoid object naming issues
let cleanName name =
let regEx = new Regex("[^A-Z0-9]", RegexOptions.IgnoreCase)
regEx.Replace(name, "")

// Create and save exe, emits IL inside application "shell"
let createProgram (appName:string) (ast:AllOps list) =

let appName' = cleanName appName
let exeName = sprintf "%s.exe" appName

let appDomain = AppDomain.CurrentDomain

let assemblyName = new AssemblyName()
assemblyName.Name <- appName'

let assembly = appDomain.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Save)

let programModule = assembly.DefineDynamicModule(appName', exeName)

let appClass = sprintf "%s.Program" appName'
let programType = programModule.DefineType(appClass, TypeAttributes.Public)

let mainMethod = programType.DefineMethod("Main", MethodAttributes.Static)

// Define starting method for assembly
assembly.SetEntryPoint(mainMethod)

let mainIl = mainMethod.GetILGenerator()

// Contents of function: Main

// IL Variable initialization
let data = MEM(createLocalArray mainIl typeof<int[]> memorySize)
let dataPointer = DP(createLocal mainIl typeof<int>)
//let programCounter = PC(createLocal mainIl typeof<int>)

processAst mainIl data dataPointer (Program ast)

mainIl.Emit(OpCodes.Ret)

// Creates the Foo.Program class
programType.CreateType() |> ignore

// Save exe
assembly.Save(exeName)

Tying the parser and emitter is done in the compile function. FParsec provides a Success/Failure when parsing a string. In this case, if success I emit a compiled program, otherwise I display the parsing error. I also filter out the comments prior to sending it to emitter.

1
2
3
4
5
6
module Compiler = 
let compile appName code =
let ast = Parser.buildAst code
match ast with
| Success(x, _, _) -> Emitter.createProgram appName (x |> List.filter (fun x' -> x' <> Comment))
| Failure(e, _, _) -> printfn "Error: %A" e

Here is a compilation of a Hello World program.

1
2
3
4
let programName = "HelloWorld"
let code = "++++++++++++[>++++++>++++++++>+++<<<-]>.>+++++.+++++++..+++.>----.<<+++++++++++++++.>.+++.------.--------.>+."

Compiler.compile programName code

Quite exciting, it worked!

Hello World

Taking it a bit further, I can compile bf files from the commandline. To support this I remove the programName and code variables, and replace them with the below block. This will compile all *.bf files provided as arguments.

1
2
3
4
5
Environment.GetCommandLineArgs()   
|> Array.filter (fun (x:string) -> x.EndsWith(".bf"))
|> Array.iter (fun x ->
printfn "compiling %s ..." x
Compiler.compile x (File.ReadAllText x))

The primary goal is complete. I can now compile BF to run in the CLR. There are a couple directions I could follow at this point. Next time, going deeper down the rabbit hole: optimization.