
7.2 Basics

IMCC's main purpose is assembling PASM or PIR source files. It can run them immediately or generate a Parrot bytecode file for running later.

Internally, IMCC works a little differently with PASM and PIR source code, so each has different restrictions. The default is to run in a "mixed" mode that allows PASM code to mix with the higher-level syntax unique to PIR.

A file with a .pasm extension is treated as pure PASM code, as is any file run with the -a command-line option. These files can use macros,[1] but none of PIR's syntax. This mode is mainly used for running pure PASM tests that were originally written for assemble.pl.

[1] The only macro that works within PIR code is .include.

The documentation that comes with IMCC in languages/imcc/docs/ and the test suite in languages/imcc/t are good starting points for digging deeper into its syntax and functionality.

7.2.1 Statements

Statement syntax in PIR is much more flexible than in PASM. All PASM opcodes are valid PIR code, so the basic syntax is still an opcode followed by its arguments:

print "He's pining for the fjords.\n"

The statement delimiter is a newline (\n), just as in PASM, so each statement has to be on its own line. Any statement can start with a label:

LABEL: print I1

But unlike PASM, PIR has some higher-level constructs, including symbol operators:

I1 = 5

named variables:

count = 5

and complex statements built from multiple keywords and symbol operators:

if I1 <= 5 goto LABEL

We'll get into these in more detail as we go.
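The constructs above can be combined in a short sketch. This is only an illustration built from the pieces shown in this section: the names count and LOOP are made up for the example, and the .local declaration it relies on is covered in Section 7.2.3.3.

```pir
.local int count          # hypothetical named variable
count = 1                 # symbol operator: plain assignment
LOOP: print count         # a label in front of an ordinary opcode
print "\n"
count = count + 1         # arithmetic combined with assignment
if count <= 3 goto LOOP   # complex statement: jump back while count <= 3
end
```

The intent is a loop that prints the values 1 through 3, one per line, and then halts.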

7.2.2 Comments

Comments are marked by a hash sign (#) and run to the end of the line. Comment lines are counted but otherwise ignored, just like empty lines.

I1 = 5 # assign '5'

7.2.3 Variables and Constants

Constants in PIR are the same as constants in PASM. Integers and floating-point numbers are numeric literals:

print 42       # integer constant

print 0x2A     # hexadecimal integer

print 0b1101   # binary integer

print 3.14159  # floating point constant

print 1.e6     # scientific notation

Strings are enclosed in quotes:

print "fjord"

These can use the standard escape sequences, like \t (tab), \n (newline), \r (return), \f (form feed), \\ (literal backslash), \" (literal double quote), etc. The one difference from PASM strings is that in PIR strings the null character must be escaped as \x00:

print "Binary\x00nul embedded"

7.2.3.1 PASM registers

PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. Parrot register names always start with a single character that shows whether it is an integer (I), numeric (N), string (S), or PMC (P) register, and end with the number of the register (between 0 and 31):

set S0, "Hello, Polly.\n"

print S0

end

This example is plain PASM syntax, but you can also use PASM registers in PIR code.

When you work directly with PASM registers, you can only have 32 registers of any one type at a time.[2] If you have more than that, you have to start shuffling stored values on and off the user stack. You also have to manually track when it's safe to reuse a register. This kind of low-level access to the Parrot registers is handy when you need it, but it's pretty unwieldy for large sections of code.

[2] Only 31 for PMC registers, because P31 is reserved for spilling.
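As a small sketch of using PASM registers from PIR code, the following mixes plain opcode syntax with the = operator described in Section 7.2.4. The register numbers are arbitrary choices made for the illustration.

```pir
set I0, 42            # integer register, PASM opcode syntax
N0 = 3.14159          # numeric register, PIR assignment syntax
S0 = "fjord\n"        # string register
print I0
print "\n"
print N0
print "\n"
print S0
end
```

Because the registers are named explicitly here, IMCC has nothing to allocate for them; keeping track of which ones are free is left to you.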

7.2.3.2 Temporary registers

IMCC provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers—with a single character for the type of register and a number—but they start with a $ character:

set $S42, "Hello, Polly.\n"

print $S42

end

The most obvious difference between PASM registers and temporary register variables is that you have an unlimited number of temporaries. IMCC handles register allocation for you. It keeps track of how long a value in a Parrot register is needed and when that register can be reused.

The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to the Parrot register S0. As long as that temporary is needed, it is stored in S0. When it's no longer needed, S0 is re-allocated to some other value:

$S42 = "Hello, "

print $S42

$S43 = "Polly.\n"

print $S43

end

This example uses two temporary string registers. Since they don't overlap, both will be allocated to the S0 register. If you change the order a little so both temporaries are needed at the same time, they're allocated to different registers:

$S42 = "Hello, "  # allocated to S1

$S43 = "Polly.\n" # allocated to S0

print $S42

print $S43

end

In this case, $S42 is allocated to S1 and $S43 is allocated to S0.

IMCC allocates temporary registers[3] to Parrot registers in ascending order of their score. The score is based on a number of factors related to variable usage. Variables used in a loop have a higher score than variables outside a loop. Variables that span a long range have a lower score than ones that are used only briefly.

[3] As well as named variables, which we'll talk about next.

If you want to peek behind the curtain and see how IMCC is allocating registers, you can run it with the -d switch to turn on debugging output.

$ imcc -d1000 hello.imc

If hello.imc is the first of the two-temporary examples above (the one where both temporaries end up in S0), it produces this output:

code_size(ops) 11  oldsize 0

0 set_s_sc 0 1  set S0, "Hello, "

3 print_s 0     print S0

5 set_s_sc 0 0  set S0, "Polly.\n"

8 print_s 0     print S0

10 end  end

Hello, Polly.

That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code translates:

$ imcc -o hello.pasm hello.imc

You'll find more details on these options and many others in Section 7.5 later in this chapter.

7.2.3.3 Named variables

Named variables can be used anywhere a register or temporary register is used. They're declared with the .local statement or the equivalent .sym statement, which require a variable type and a name:

.local string hello

set hello, "Hello, Polly.\n"

print hello

end

This example defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.

The valid types are string, int, float, and any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. IMCC allocates named variables to Parrot registers the same way it allocates temporary register variables.

The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. Identifiers don't have any limit on length yet, but it's a safe bet they will before the production release.
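A short sketch of the three built-in types together may help. The variable names are invented for the example, and the third declaration uses .sym only to show that it is interchangeable with .local.

```pir
.local int count        # allocated to an integer register
.local float ratio      # allocated to a numeric register
.sym string name        # .sym is equivalent to .local
count = 3
ratio = 1.5
name = "Polly"
print name
print " wants "
print count
print " crackers.\n"
print ratio
print "\n"
end
```

Each name follows the identifier rules above: letters, digits, and underscores, with no leading digit.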

7.2.3.4 Parrot classes

Any integer, floating-point number, or string can be replaced by an equivalent Parrot class:

P0 = new PerlString        # same as new P0, .PerlString

P0 = "Hello, Polly.\n"

print P0

end

Here, a PerlString object is created with the new CLASSNAME syntax[4] and stored in the PMC register P0.

[4] Unlike PASM, IMCC doesn't use a dot in front of the class name.

It gets assigned the string value "Hello, Polly.\n" and then printed. The syntax is exactly the same with temporary register variables:

$P4711 = new PerlString

$P4711 = "Hello, Polly.\n"

print $P4711

end

With named variables the Parrot class has to be specified both as the type for the .local statement and as the class name for the new:

.local PerlString hello

hello = new PerlString

hello = "Hello, Polly.\n"

print hello

end

Another important instruction for working with Parrot classes is clone. A simple assignment of a Parrot class only creates an alias:

.local PerlString hello

hello = new PerlString

hello = "Hello, "

$P0 = hello               # PASM: set P0, P1

$P0 = "Polly.\n"

hello = hello . $P0

print hello

end

This prints:

Polly.

Polly.

In this example, $P0 and hello are really the same string. When you assign to one, you've assigned to both. To get a true copy, you have to use $P0 = clone hello instead of $P0 = hello, as follows:

.local PerlString hello

hello = new PerlString

hello = "Hello, "

$P0 = clone hello        # PASM: clone P0, P1

$P0 = "Polly.\n"

hello = hello . $P0

print hello

end

This prints:

Hello, Polly.

7.2.3.5 Named constants

Named constants are declared with a .const statement. It's very similar to .local, and requires a type and a name. The value must be assigned in the declaration statement:

.const string hello = "Hello, Polly.\n"

print hello

end

This example declares a named string constant hello and prints the value. Named constants can be used in all the same places as literal constants, but have to be declared beforehand:

.const int the_answer = 42        # integer constant

.const string mouse = "Mouse"     # string constant

.const float pi = 3.14159         # floating point constant
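The following sketch uses named constants in places where a literal would otherwise appear. The variable doubled is hypothetical, and the example assumes a named constant behaves exactly like its literal value in an assignment or an arithmetic expression, as the text describes.

```pir
.const int the_answer = 42
.const string mouse = "Mouse"
.local int doubled                # hypothetical named variable
doubled = the_answer              # same as doubled = 42
doubled = doubled + the_answer    # constant used as an operand
print mouse
print "\n"
print doubled
print "\n"
end
```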

7.2.3.6 Register spilling

As we mentioned earlier, IMCC allocates Parrot registers for all temporary register variables and named variables. When IMCC runs out of registers to allocate, some of the variables have to be stored elsewhere. This is known as "spilling." IMCC spills the variables with the lowest score. It stores the spilled variable in a PerlArray object while it isn't used, then restores it to a register the next time it's needed:

set $I1, 1

set $I2, 2

...

set $I33, 33

...

print $I1

print $I2

...

print $I33

If you create 33 integer variables like this—all containing values that are used later—IMCC allocates the available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, IMCC generates code to store the values:

new P31, .PerlArray

...

set I0, 1           # I0 allocated to $I1

set P31[0], I0      # spill $I1

set I0, 2           # I0 reallocated to $I2

set P31[1], I0      # spill $I2

It creates a PerlArray object and stores it in register P31.[5]

[5] P31 is reserved for register spilling in PIR code, so generally it shouldn't be accessed directly.

The set instruction is the last time $I1 is used for a while, so immediately after that, IMCC stores its value in the spill array and frees up I0 to be reallocated.

Just before $I1 and $I2 are accessed to be printed, IMCC generates code to fetch the values from the spill array:

...

set I0, P31[0]       # fetch $I1

print I0

7.2.4 Symbol Operators

You probably noticed the = assignment operator in some of the earlier examples:

$S2000 = "Hello, Polly.\n"

print $S2000

end

Standing alone, it's the same as the PASM set opcode. In fact, if you run imcc in bytecode debugging mode (as in Section 7.2.3.2), you'll see it really is just a set opcode underneath.

PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:

.local int sum

sum = $I42 + 5

print sum

print "\n"

end

The statement sum = $I42 + 5 translates to add I0, I1, 5.

A complete list of operators is available in Section 7.6. We'll discuss the comparison operators in Section 7.3.
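As a hedged sketch of a few more operators, the example below assumes that * and - combine with assignment the same way + does, and that the . operator shown earlier for PerlString objects also concatenates plain strings. The names total and greeting are made up for the illustration.

```pir
.local int total
.local string greeting
$I0 = 6
total = $I0 * 7               # arithmetic: multiplication
total = total - 2             # arithmetic: subtraction
$S0 = "Hello, "
greeting = $S0 . "Polly.\n"   # string concatenation
print total
print "\n"
print greeting
end
```

Running imcc with the debugging switch from Section 7.2.3.2 on code like this is a good way to check which PASM opcode each operator really becomes.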

7.2.5 Labels

A label names a line of code so other instructions can refer to it. Label names have to be valid PIR identifiers, just like named variables, so they're made of letters, digits, and underscores. Simple labels are often all caps to make them stand out more clearly. A label definition is simply the name of the label followed by a colon. It can be on its own line:

LABEL:

    print "Norwegian Blue\n"

or before a statement on the same line:

LABEL: print "Norwegian Blue\n"

IMCC has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined.[6]

[6] We'll discuss compilation units in the next section.

The name has to be unique there, but it can be reused in a different compilation unit.

branch L1   # local label

bsr    _L2  # global label

Labels are most often used in branching instructions and in calculating addresses for jumps.
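A minimal sketch of a local label used as a branch target follows. The label name SKIP is arbitrary, and the example relies only on the branch opcode shown above.

```pir
    branch SKIP                      # jump straight to the local label
    print "This line is skipped.\n"
SKIP:
    print "Norwegian Blue\n"
    end
```

Only the statement after the label is meant to run, since the branch jumps over the first print.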

7.2.6 Compilation Units

Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. They start with the .sub directive and end with the .end directive:

.sub _main

    print "Hello, Polly.\n"

    end

.end

This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see Section 7.2.3.2), you'll see that the name translates to an ordinary label:

_main:

        print "Hello, Polly.\n"

        end

The compilation units in a file and the code outside of compilation units are parsed and processed all at once. IMCC emits each compilation unit to bytecode or PASM code as a unit when it reaches the .end directive.

The first compilation unit in a file is special. The convention is to call it _main, but the name isn't critical. Since it's emitted first, it's always executed first. This means that when it closes with an end, nothing else in the file will ever execute unless it's called from within _main.
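To illustrate a second unit being called from the first, here is a sketch that assumes the bsr opcode from Section 7.2.5 together with its PASM counterpart ret. The name _greet is hypothetical.

```pir
.sub _main
    bsr _greet              # call the second unit through its global label
    end                     # halt; nothing after this runs on its own
.end

.sub _greet
    print "Hello, Polly.\n"
    ret                     # return to the instruction after the bsr
.end
```

Without the bsr, _greet would still be emitted, but the end in _main would stop the interpreter before reaching it.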

Any statements outside a compilation unit are emitted after all the compilation units. Generally this means such code is skipped:

print "Polly want a cracker?\n"



.sub _main

    print "Hello, Polly.\n"

    end

.end

This code prints out "Hello, Polly." but not "Polly want a cracker?" because end halts the interpreter, so it never reaches the statement outside the compilation unit.

Directives to IMCC (which start with a ".") aren't delayed like other statements. So, if you declare a named variable or named constant outside a compilation unit, it will be available to any statements that follow it:

.local string hello

hello = "Polly want a cracker?\n"

print hello



.sub _main

    hello = "Hello, Polly.\n"

    print hello

    end

.end

In the first line of this example, the .local directive defines a file-global variable named hello. The _main routine uses the same variable, and would give you a parse error if it hadn't been defined. The statements that assign and print "Polly want a cracker?" are emitted after _main, so they never run.

Pure PASM compilation units can use the .emit and .eom directives instead of .sub and .end:

.emit

    print "Hello, Polly.\n"

    end

.eom

The .emit directive doesn't take a name.

Section 7.4 goes into much more detail about compilation units and their uses.

7.2.7 Scope and Namespaces

The .namespace directive creates a scoped namespace for variables. Variables from outside the namespace are visible in the inner scope unless that scope has a local variable with the same name:

.sub _scoped_hello

    .local PerlString hello

    hello = new PerlString

    hello = "Welcome, Python!\n"

    .namespace inner

    .local PerlString hello

    hello = new PerlString

    hello = "Hello, Perl 6.\n"

    print hello

    .endnamespace inner

    print hello

    end

.end

This example prints:

Hello, Perl 6.

Welcome, Python!

The first .local directive defines a named variable hello in the default outer namespace. The second .local defines a named variable in the inner namespace. Internally, it actually mangles the name of the variable as inner::hello. The first print is nested in the inner namespace, so it prints inner::hello, "Hello, Perl 6." The second print statement retrieves the hello variable of the outer namespace, so it prints "Welcome, Python!"
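The other half of the rule, an outer variable staying visible inside a namespace that doesn't redeclare it, can be sketched the same way. The names _outer_visible and greeting are invented for the illustration.

```pir
.sub _outer_visible
    .local string greeting
    greeting = "Welcome, Python!\n"
    .namespace inner
    print greeting          # no inner greeting, so the outer one is used
    .endnamespace inner
    end
.end
```

Here the inner scope declares nothing of its own, so the lookup is expected to fall through to the outer greeting.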

Constants are collected for the whole program so they can be efficiently folded. Identical string or number constants in different compilation units get a single entry in the constant table.
