BF Compiler Part 2 - MSIL

Read Time: 7 minutes

Continuing with the BF compiler, it’s time to look at how to create code targeting the CLR. As before, I will be using F# to generate the target MSIL.

Series:
Part 1 (Parsing)
Part 2 (IL Generation)
Part 3 (Compiler)
Part 4 (Optimization)

There are a few aspects to IL generation. First, the how: System.Reflection and System.Reflection.Emit are the namespaces that contain the functionality of interest. Second, the what: a basic application shell is a good place to start. Once that’s in place, I’ll discuss the third part, the what inside the what: using F# to emit IL into the sample application for expanded functionality.

The ultimate goal is to compile BF into MSIL. Before doing the fun stuff, it’s a good idea to step back and see what is involved in creating a basic application. Without any of the fancy things, what is the bare minimum I need to get something that runs? My short-term target is a basic application that executes. Once this is in place I can start looking at custom IL code. The base application that I generate has the general structure:

  • Application Domain

    • Assembly

      • Module

        • Class (Program)

          • Method (Main)

            • Code for main

The application generation code is short, so I’m just going to put it all together below. For now I bootstrap AppDomain creation by just using the current domain. Then I create the assembly and module. When I define the main class Program I want it to be in the Foo namespace. This is done by using the fully qualified name in the DefineType call. For the sample, I will have just one method, Main. If I wanted to create multiple methods in the Program class, I could make additional MethodBuilder instances attached to programType. Now that Main is defined, I make it the entry point for the assembly. GetILGenerator is a glimpse of things to come. This is how the IL creation happens. Since the generator is attached to the MethodBuilder for Main, the code is emitted into that method’s body. For the example it is a simple WriteLine and return. Once the Program class definition is complete, CreateType finalizes it. All that is left is to save the assembly to a file. Well, that was easy.

System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__)
open System
open System.Reflection
open System.Reflection.Emit

let programName = "Foo"
let exeName = sprintf "%s.exe" programName

let appDomain = AppDomain.CurrentDomain

let assemblyName = new AssemblyName()
assemblyName.Name <- "Foo"

let assembly = appDomain.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Save)

let programModule = assembly.DefineDynamicModule("Foo", exeName)

let programType = programModule.DefineType("Foo.Program", TypeAttributes.Public)

let mainMethod = programType.DefineMethod("Main", MethodAttributes.Static)

// Define starting method for assembly
assembly.SetEntryPoint(mainMethod)

let mainIl = mainMethod.GetILGenerator()

// Contents of function: Main
mainIl.EmitWriteLine("A small program")
mainIl.Emit(OpCodes.Ret)

// Creates the Foo.Program class
programType.CreateType() |> ignore

// Save exe
assembly.Save(exeName)
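The note about attaching additional MethodBuilder instances to the type can be sketched as follows. This is a minimal illustration, not part of the original program: the assembly name MultiMethodSample, the helper defineConstMethod, and the methods Five and ThirtySeven are all hypothetical. It assumes an in-memory assembly (AssemblyBuilderAccess.Run) rather than one saved to disk, so the result can be checked immediately through reflection.

```fsharp
open System
open System.Reflection
open System.Reflection.Emit

// Hypothetical in-memory assembly; nothing is written to disk here
let sampleAssembly =
    AssemblyBuilder.DefineDynamicAssembly(AssemblyName("MultiMethodSample"), AssemblyBuilderAccess.Run)
let sampleModule = sampleAssembly.DefineDynamicModule("MultiMethodSample")
let sampleType = sampleModule.DefineType("Foo.Program", TypeAttributes.Public)

// Hypothetical helper: each call attaches one more MethodBuilder to sampleType.
// The generated method takes no arguments and returns a constant int.
let defineConstMethod (name: string) (value: int) =
    let methodBuilder =
        sampleType.DefineMethod(
            name,
            MethodAttributes.Public ||| MethodAttributes.Static,
            typeof<int>,
            Type.EmptyTypes)
    let il = methodBuilder.GetILGenerator()
    il.Emit(OpCodes.Ldc_I4, value) // push the constant
    il.Emit(OpCodes.Ret)           // return the value on top of the stack

defineConstMethod "Five" 5
defineConstMethod "ThirtySeven" 37

// Finalize the type, then call both methods through reflection
let createdType = sampleType.CreateType()
let five = createdType.GetMethod("Five").Invoke(null, Array.empty<obj>) :?> int
let thirtySeven = createdType.GetMethod("ThirtySeven").Invoke(null, Array.empty<obj>) :?> int
printfn "%d %d" five thirtySeven
```

Each method gets its own ILGenerator, so the bodies are built independently even though they live on the same type.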

And here is the program running. It’s not much to look at, but it is good for a starting framework.

Running the executable

Now that I’ve seen it run, it’s time to take a look into what was generated. For this I use JetBrains’ dotPeek decompiler. Below is a screen shot of the decompilation. The application is about as minimal as it gets, but the code is as expected. This is also pretty close to a bare C# console application. It has a Program class in the Foo namespace. There is a Main function with the WriteLine. Looks like things have worked as planned.

Decompiled Foo.exe

Time to take things up a small notch. I now want to add 5 + 37 and display the result. Before I get started I’m going to include a reference to the IL OpCodes. I also want to mention, this isn’t meant to be a deep dive into the CLR internals; it’s just a basic starter. But you should be fine if you know that you push things onto a stack to use them, and they get popped off as they are used. You also have access to local variables for more “persistent” storage. Remember those old assembly language classes from school? It’s kind of like that.
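To make the stack-versus-locals idea concrete, here is a small sketch that is separate from the Foo program. It assumes System.Reflection.Emit.DynamicMethod is available, which lets the IL run in memory instead of being saved to an exe. The names localSample, localIl, and slot are made up for the illustration.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical in-memory method: no arguments, returns an int
let localSample = DynamicMethod("LocalSample", typeof<int>, Type.EmptyTypes)
let localIl = localSample.GetILGenerator()

// A local variable keeps its value until it is overwritten
let slot = localIl.DeclareLocal(typeof<int>)

localIl.Emit(OpCodes.Ldc_I4, 7)   // stack: 7
localIl.Emit(OpCodes.Stloc, slot) // stack: (empty), slot = 7
localIl.Emit(OpCodes.Ldloc, slot) // stack: 7
localIl.Emit(OpCodes.Ldloc, slot) // stack: 7, 7 (loading does not clear the local)
localIl.Emit(OpCodes.Mul)         // stack: 49
localIl.Emit(OpCodes.Ret)         // return the top of the stack

let runLocalSample = localSample.CreateDelegate(typeof<Func<int>>) :?> Func<int>
let squared = runLocalSample.Invoke()
printfn "%d" squared // prints 49
```

The local can be loaded as many times as needed, while each value pushed onto the stack is consumed exactly once.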

All of the following code will be inserted between the existing lines mainIl.EmitWriteLine("A small program") and mainIl.Emit(OpCodes.Ret) of the above code. The goal is to insert additional functionality to the Main method call.

First, push the numbers 5 and 37 onto the stack. Second, emit Add, which pops the top two values off the stack (5 and 37) and pushes their sum (42).

mainIl.Emit(OpCodes.Ldc_I4, 5)
mainIl.Emit(OpCodes.Ldc_I4, 37)
mainIl.Emit(OpCodes.Add)
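If you want to check these three instructions without building an exe, one option is to emit them into a DynamicMethod and return the result. This is only a sketch under that assumption; addSample, addIl, and runAddSample are illustrative names and not part of the Foo program.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical in-memory method that returns whatever is left on the stack
let addSample = DynamicMethod("AddSample", typeof<int>, Type.EmptyTypes)
let addIl = addSample.GetILGenerator()

addIl.Emit(OpCodes.Ldc_I4, 5)  // stack: 5
addIl.Emit(OpCodes.Ldc_I4, 37) // stack: 5, 37
addIl.Emit(OpCodes.Add)        // stack: 42
addIl.Emit(OpCodes.Ret)        // return 42

let runAddSample = addSample.CreateDelegate(typeof<Func<int>>) :?> Func<int>
let sum = runAddSample.Invoke()
printfn "%d" sum // prints 42
```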

Important to remember: when something uses a value off the stack, it’s popped and gone forever. I want to do two things with my resulting 42, so I’ll emit a Dup instruction, which pushes a copy of the top value. Now I have two 42s at the top of my stack.

mainIl.Emit(OpCodes.Dup)
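A quick way to convince yourself that Dup really leaves two copies is to consume both of them. The sketch below again assumes a DynamicMethod for in-memory execution, with illustrative names dupSample, dupIl, and runDupSample. It pushes 21, duplicates it, and adds the two copies together.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical in-memory method used only to observe Dup
let dupSample = DynamicMethod("DupSample", typeof<int>, Type.EmptyTypes)
let dupIl = dupSample.GetILGenerator()

dupIl.Emit(OpCodes.Ldc_I4, 21) // stack: 21
dupIl.Emit(OpCodes.Dup)        // stack: 21, 21
dupIl.Emit(OpCodes.Add)        // pops both copies, stack: 42
dupIl.Emit(OpCodes.Ret)

let runDupSample = dupSample.CreateDelegate(typeof<Func<int>>) :?> Func<int>
let doubled = runDupSample.Invoke()
printfn "%d" doubled // prints 42
```

Without the Dup, Add would find only one value on the stack and the method would be invalid IL.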

Now for the result, I’m going to pop the first 42 off the stack and print its integer representation. Then I’ll pop the second 42 off the stack and print its ASCII char representation (decimal 42 = *). When printing, the distinction is made by the parameter type of the method being called. To parse the emit in more detail, what is happening? EmitCall(OpCodes.Call: going to make a function call. typeof<Console>.GetMethod("Write", [| typeof<int> |]): the function to call is Console.Write, specifically the overload with 1 input parameter (an int in this case). null: the optional parameter types, which only apply to varargs methods; Console.Write has none, so this is null. This matches with my understanding of the call when used in C#. Since the call takes 1 parameter, it will pop 1 value off the stack to meet its needs. This is the general pattern for function calling. You’ll see more of this in future posts.

// Print the numeric value on the top of the stack (42)
mainIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<int> |]), null)

// Print the char representation of the value on the top of the stack ('*')
mainIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<char> |]), null)
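The same two calls can be tried in isolation. The following sketch assumes a DynamicMethod instead of the saved Foo.exe, and temporarily redirects Console.Out to a StringWriter so the output can be inspected. The names printSample, printIl, and captured are illustrative only.

```fsharp
open System
open System.IO
open System.Reflection.Emit

// Hypothetical in-memory method with no return value
let printSample = DynamicMethod("PrintSample", typeof<Void>, Type.EmptyTypes)
let printIl = printSample.GetILGenerator()

printIl.Emit(OpCodes.Ldc_I4, 42) // stack: 42
printIl.Emit(OpCodes.Dup)        // stack: 42, 42
// Console.Write(int) pops one value and prints "42"
printIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<int> |]), null)
// Console.Write(char) pops the other value and prints '*'
printIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<char> |]), null)
printIl.Emit(OpCodes.Ret)        // stack is empty, as a void method requires

let runPrintSample = printSample.CreateDelegate(typeof<Action>) :?> Action

// Capture console output for the duration of the call
let originalOut = Console.Out
let outputBuffer = new StringWriter()
Console.SetOut(outputBuffer)
runPrintSample.Invoke()
Console.SetOut(originalOut)

let captured = outputBuffer.ToString()
printfn "%s" captured // prints 42*
```

The value on the stack is the same in both calls; only the overload selected by GetMethod decides whether it is printed as a number or as a character.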

Take two, now with more awesomeness.

Running the executable

Taking a look at this decompiled version, the results are a bit more interesting. The middle panel shows the C# representation, 5 + 37 into a variable, then two Console.Writes, one with a char cast. The right panel shows the IL, which looks remarkably like what I emitted. Again, it looks like things are working as planned.

Decompiled Foo.exe

This concludes part 2 of the series. I can now parse BF source code, generate IL, and create an exe. Next time I start putting these pieces together into something more interesting.