6.2 Basics

PASM has a simple syntax. Each statement stands on its own line. Statements begin with a Parrot instruction code (commonly referred to as an "opcode"). The arguments follow, separated by commas:

[label] opcode dest, source, source ...

If the opcode returns a result, it is stored in the first argument. Sometimes the first register is both a source value and the destination of the operation. The arguments are either registers or constants, though only source arguments can be constants:

LABEL:
    print "The answer is: "
    print 42
    print "\n"
    end                # halt the interpreter

Comments are marked with the hash sign (#) and continue to the end of the line. Any line can start with a label definition like LABEL:, but label definitions can also stand on their own line.

6.2.1 Constants

Integer constants are signed integers.^[4]

^[4] The size of integers is defined when Parrot is configured. It's typically 32 bits on 32-bit machines (a range of -2^31 to +2^31-1) and twice that size on 64-bit processors.

Integer constants can have a positive (+) or negative (-) sign in front. Binary integers are preceded by 0b or 0B, and hexadecimal integers are preceded by 0x or 0X:

print 42          # integer constant
print -0b101      # binary integer constant with sign
print 0Xdeadbeef  # hex integer constant
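The following short C sketch applies the prefix and sign rules. `parse_pasm_int` is a hypothetical helper and is not part of Parrot. It strips an optional sign and an optional base prefix, then lets the standard `strtoll` function convert the digits. The result type is `long long`, so 0xdeadbeef (3735928559) fits even though it exceeds the 32-bit signed maximum mentioned in the footnote. This section does not say how Parrot treats such a constant on a 32-bit build.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Parrot: decodes an integer constant
   written as in the examples above ("42", "-0b101", "0Xdeadbeef").
   long long is used so that 0xdeadbeef fits on 32-bit platforms too. */
long long parse_pasm_int(const char *s)
{
    int negative = 0;
    int base = 10;

    assert(s != NULL);

    /* Optional leading sign. */
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }

    /* Optional base prefix: 0b/0B for binary, 0x/0X for hexadecimal. */
    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        base = 2;
        s += 2;
    } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    }

    /* strtoll converts the remaining digits in the chosen base. */
    long long magnitude = strtoll(s, NULL, base);
    return negative ? -magnitude : magnitude;
}
```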

Floating-point constants can also be positive or negative. Scientific notation provides an exponent, marked with e or E (the sign of the exponent is optional):

print 3.14159    # floating point constant
print -1.23e+45  # in scientific notation

String constants are wrapped in single or double quotation marks. To include a quotation mark that matches the delimiter, a literal backslash, or a nonprintable character inside the string, escape it with a backslash. The escape sequences for special characters are the same as those of Perl 5's qq( ) operator.

print "string\n"    # string constant with escaped newline
print 'that\'s it'  # escaped single quote
print "\\"          # a literal backslash

6.2.2 Working with Registers

Parrot is a register-based virtual machine. It has 4 typed register sets with 32 registers in each set. The types are integers, floating-point numbers, strings, and Parrot objects. Register names consist of a capital letter indicating the register set and the number of the register, between 0 and 31. For example:

I0   integer register #0
N11  number or floating point register #11
S2   string register #2
P31  PMC register #31

Integer and number registers hold values, while string and PMC registers contain pointers to allocated memory for a string header or a Parrot object.

The length of strings is limited only by your system's virtual memory and by the size of integers on the particular platform. Parrot can work with strings of different character types and encodings. It automatically converts string operands with mixed characteristics to Unicode.^[5]

^[5] This conversion isn't fully implemented yet.

Parrot Magic Cookies (PMCs) are Parrot's low-level objects. They can represent data of any type. The operations (methods) for each PMC class are defined in a fixed vtable, which is a structure containing function pointers that implement each operation.
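The following deliberately tiny C sketch illustrates the vtable idea. `ToyVtable`, `ToyPMC`, and the functions around them are hypothetical. The slot names follow Parrot's naming style, but the signatures are simplified, and Parrot's real vtable has many more slots. The sketch only shows two things: each object points to a table of function pointers shared by its class, and generic code performs operations by calling through that table.

```c
#include <assert.h>
#include <string.h>

struct toy_pmc;   /* forward declaration so the vtable can refer to it */

/* A vtable: a struct of function pointers, one per operation. */
typedef struct {
    const char *class_name;
    void (*set_integer_native)(struct toy_pmc *self, long value);
    long (*get_integer)(struct toy_pmc *self);
} ToyVtable;

/* Every object carries a pointer to its class's shared vtable. */
typedef struct toy_pmc {
    const ToyVtable *vtable;
    long int_value;
} ToyPMC;

static void toy_int_set(ToyPMC *self, long value) { self->int_value = value; }
static long toy_int_get(ToyPMC *self) { return self->int_value; }

/* One fixed vtable instance for the whole hypothetical ToyInt class. */
static const ToyVtable toy_int_vtable = { "ToyInt", toy_int_set, toy_int_get };

void toy_int_init(ToyPMC *pmc)
{
    pmc->vtable = &toy_int_vtable;
    pmc->int_value = 0;
}

/* Generic code never inspects the class: it calls through the vtable,
   so another class could plug in different behavior. */
long toy_add(ToyPMC *pmc, long delta)
{
    long current = pmc->vtable->get_integer(pmc);
    pmc->vtable->set_integer_native(pmc, current + delta);
    return pmc->vtable->get_integer(pmc);
}
```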

6.2.2.1 Register assignment

The most basic operation on registers is assignment using the set opcode:

set I0, 42        # set integer register #0 to the integer value 42
set N3, 3.14159   # set number register #3 to the value of π
set I1, I0        # set register I1 to what I0 contains
set I2, N3        # cast the floating point number to an integer

PASM uses registers where a high-level language would use variables. The exchange opcode swaps the contents of two registers of the same type:

exchange I1, I0   # set register I1 to what I0 contains
                  # and set register I0 to what I1 contains

As we mentioned before, string and PMC registers are slightly different because they hold a pointer instead of directly holding a value. Assigning one string register to another:

set S0, "Ford"
set S1, S0
set S0, "Zaphod"
print S1                # prints "Ford"
end

doesn't make a copy of the string; it makes a copy of the pointer. Just after set S1, S0, both S0 and S1 point to the same string. But assigning a constant string to a string register allocates a new string. When "Zaphod" is assigned to S0, the pointer changes to point to the location of the new string, leaving the old string untouched. So strings act like simple values on the user level, even though they're implemented as pointers.

Unlike strings, assignment to a PMC doesn't automatically create a new object; it only calls the PMC's vtable method for assignment. So, rewriting the same example using a PMC has a completely different result:

new P0, .PerlString
set P0, "Ford"
set P1, P0
set P0, "Zaphod"
print P1                # prints "Zaphod"
end

The new opcode creates an instance of the .PerlString class. The class's vtable methods define how the PMC in P0 operates. The first set statement calls P0's vtable method set_string_native, which assigns the string "Ford" to the PMC. When P0 is assigned to P1:

set P1, P0

it copies the pointer, so P1 and P0 are both aliases to the same PMC. Then, assigning the string "Zaphod" to P0 changes the underlying PMC, so printing either P1 or P0 prints "Zaphod".^[6]

^[6] Contrast this with assign in Section 6.3.2 later in this chapter.
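The difference between the string example and the PMC example comes down to two operations. One rebinds a pointer. The other changes the object that the pointer refers to. The C sketch below models both cases. `ToyObject` and the demo functions are hypothetical stand-ins, not Parrot's internal structures.

```c
#include <assert.h>
#include <string.h>

/* A toy PMC: a mutable object that registers point to. */
typedef struct {
    char value[32];
} ToyObject;

/* Stands in for the PMC's assignment vtable method: it overwrites the
   object's contents in place instead of creating a new object. */
static void toy_set_string(ToyObject *obj, const char *s)
{
    assert(strlen(s) < sizeof obj->value);
    strcpy(obj->value, s);
}

/* Strings: registers hold pointers, and assigning a constant
   rebinds the register to a different string. */
const char *string_register_demo(void)
{
    const char *s0 = "Ford";
    const char *s1 = s0;   /* set S1, S0: copy the pointer          */
    s0 = "Zaphod";         /* set S0, "Zaphod": S0 points elsewhere */
    (void)s0;
    return s1;             /* still points at "Ford"                */
}

/* PMCs: both registers alias one object, so a change made
   through P0 is visible through P1. */
const char *pmc_register_demo(ToyObject *object)
{
    ToyObject *p0 = object;           /* new P0, ...        */
    toy_set_string(p0, "Ford");       /* set P0, "Ford"     */
    ToyObject *p1 = p0;               /* set P1, P0: alias  */
    toy_set_string(p0, "Zaphod");     /* set P0, "Zaphod"   */
    return p1->value;                 /* "Zaphod"           */
}
```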

6.2.2.2 PMC object types

Internally, PMC types are represented by positive integers, and built-in types by negative integers. PASM provides two opcodes to deal with types. typeof returns the name corresponding to a numeric type or the type of a PMC. find_type takes a type name and returns the integer value that represents that type.

When the source argument is a PMC and the destination is a string register, typeof returns the name of the type:

new P0, .PerlString
typeof S0, P0               # S0 is "PerlString"
print S0
print "\n"
end

In this example, typeof returns the type name "PerlString".

When the source argument is a PMC and the destination is an integer register, typeof returns the integer representation of the type:

new P0, .PerlString
typeof I0, P0               # I0 is 17
print I0
print "\n"
end

This example returns the integer representation of PerlString, which is 17.

When typeof's source argument is an integer, it returns the name of the type represented by that integer:

set I1, -100
typeof S0, I1               # S0 is "INTVAL"
print S0
print "\n"
end

The integer representation of a built-in integer value is -100, so it returns the type name "INTVAL".

The source argument to find_type is always a string containing a type name, and the destination register is always an integer. It returns the integer representation of the type with that name:

find_type I1, "PerlString"  # I1 is 17
print I1
print "\n"
find_type I2, "INTVAL"      # I2 is -100
print I2
print "\n"
end

Here, the name "PerlString" returns 17, and the name "INTVAL" returns -100.

All Parrot classes inherit from the class default, which has the type number 0. The default class provides some default functionality, but mainly throws exceptions when the default variant of a method is called (meaning the subclass didn't define the method). Type number 0 returns the type name "illegal", since no object should ever be created from the default class. As the following example shows, find_type also returns 0 for a type name it doesn't recognize:

find_type I1, "fancy_super_long_double" # I1 is 0
print I1
print "\n"
typeof S0, I1                           # S0 is "illegal"
print S0
print "\n"
end

The type numbers are not fixed values. They change whenever a new class is added to Parrot or when the class hierarchy is altered. A header file containing an enumeration of PMC types (include/parrot/core_pmcs.h) is generated during the configuration of the Parrot source tree. The PMC types take their numbers from their order in this file. Internal data types and their names are specified in include/parrot/datatypes.h.

You can generate a complete and current list of valid PMC types by running this command within the main Parrot source directory:

$ perl classes/pmc2c.pl --tree classes/*.pmc

which produces output like:

Array
Boolean
    PerlInt
        perlscalar
            scalar
Compiler
    NCI
...

The output traces the class hierarchy for each class: Boolean inherits from PerlInt, which is derived from the abstract perlscalar and scalar classes (abstract classes are listed in lowercase). The actual classnames and their hierarchy may have changed by the time you read this.

6.2.2.3 Type morphing

The classes PerlUndef, PerlInt, PerlNum, and PerlString implement Perl's polymorphic scalar behavior. Assigning a string to a number PMC morphs it into a string PMC. Assigning an integer value morphs it to a PerlInt, and assigning undef morphs it to PerlUndef:

new P0, .PerlString
set P0, "Ford\n"
print P0           # prints "Ford\n"
set P0, 42
print P0           # prints 42
print "\n"
typeof S0, P0
print S0           # prints "PerlInt"
print "\n"
end

P0 is created as a PerlString, but when an integer value 42 is assigned to it, it changes to type PerlInt.
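One way to picture morphing is a tagged union whose tag changes on every assignment. The C sketch below is only an illustrative model. `ToyScalar` and the `toy_*` functions are hypothetical, and the sketch does not describe how Parrot implements morphing internally.

```c
#include <assert.h>
#include <string.h>

/* Which kind of value the scalar currently holds. */
typedef enum { TOY_UNDEF, TOY_INT, TOY_NUM, TOY_STRING } ToyType;

typedef struct {
    ToyType type;
    union {
        long i;
        double n;
        char s[32];
    } u;
} ToyScalar;

static const char *const toy_type_names[] = {
    "PerlUndef", "PerlInt", "PerlNum", "PerlString"
};

/* Each setter stores the value and switches the type tag ("morphs"). */
void toy_set_int(ToyScalar *x, long v)   { x->type = TOY_INT; x->u.i = v; }
void toy_set_num(ToyScalar *x, double v) { x->type = TOY_NUM; x->u.n = v; }
void toy_set_undef(ToyScalar *x)         { x->type = TOY_UNDEF; }

void toy_set_string(ToyScalar *x, const char *s)
{
    assert(strlen(s) < sizeof x->u.s);
    x->type = TOY_STRING;
    strcpy(x->u.s, s);
}

/* Rough analogue of "typeof S0, P0". */
const char *toy_typeof(const ToyScalar *x)
{
    return toy_type_names[x->type];
}
```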

6.2.3 Math Operations

PASM has a full set of math instructions. These work with integers, floating-point numbers, and PMCs that implement the vtable methods of a numeric object. Most of the major math opcodes have two- and three-argument forms:

add I0, I1              # I0 += I1
add I10, I11, I2        # I10 = I11 + I2

The three-argument form of add adds the last two numbers and stores the result in the first register. The two-argument form adds the first register to the second and stores the result back in the first register.

The source arguments can be Parrot registers or constants, but they must be compatible with the type of the destination register. Generally, "compatible" means that the source and destination have to be the same type, but there are a few exceptions:

sub I0, I1, 2          # I0 = I1 - 2
sub N0, N1, 1.5        # N0 = N1 - 1.5

If the destination register is an integer register, like I0, the other arguments must be integer registers or integer constants. A floating-point destination, like N0, usually requires floating-point arguments, but many math opcodes also allow the final argument to be an integer. A PMC destination can have an integer or floating-point argument as the last one:

mul P0, P1             # P0 *= P1
mul P0, I1
mul P0, N1
mul P0, P1, P2         # P0 = P1 * P2
mul P0, P1, I2
mul P0, P1, N2

Operations on a PMC are implemented by the vtable method of the destination (in the two-argument form) or the left source argument (in the three-argument form). The result of an operation is entirely determined by the PMC. A class implementing imaginary number operations might return an imaginary number, for example.

We won't list every math opcode here, but we'll list some of the most common ones. You can get a complete list in Section 6.9 later in this chapter.

6.2.3.1 Unary math opcodes

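The unary opcodes have a single source argument and a single destination argument. Some of the most common unary math opcodes are inc (increment), dec (decrement), abs (absolute value), neg (negate), and fact (factorial). As the inc line in the example shows, inc takes just one register, which serves as both source and destination: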

abs N0, -5.0  # the absolute value of -5.0 is 5.0
fact I1, 5    # the factorial of 5 is 120
inc I1        # 120 incremented by 1 is 121

6.2.3.2 Binary math opcodes

Binary opcodes have two source arguments and a destination argument. As we mentioned before, most binary math opcodes have a two-argument form in which the first argument is both a source and the destination. Parrot provides add (addition), sub (subtraction), mul (multiplication), div (division), and pow (exponentiation) opcodes, as well as two different modulus operations: mod is Parrot's own implementation of modulus, and cmod uses C's % operator. It also provides gcd (greatest common divisor) and lcm (least common multiple).

div I0, 12, 5   # I0 = 12 / 5, which is 2 in integer division
mod I0, 12, 5   # I0 = 12 % 5, which is 2
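The two modulus operations give different results only when an operand is negative. The C sketch below rests on an assumption about mod: that it uses Knuth's floored definition, a - b * floor(a / b), so the result takes the sign of the divisor. cmod behaves like C's %, which in C99 and later truncates toward zero, so its result takes the sign of the dividend. Check the opcode documentation for your Parrot version before relying on the assumption about mod. The function names are made up for this illustration.

```c
#include <assert.h>

/* cmod-style: C's % operator. Integer division truncates toward zero
   (C99 and later), so a nonzero remainder has the sign of a. */
long c_style_mod(long a, long b)
{
    assert(b != 0);
    return a % b;
}

/* Floored modulus: a - b * floor(a / b). A nonzero result has the
   sign of the divisor b. Assumed here to match Parrot's mod. */
long floored_mod(long a, long b)
{
    assert(b != 0);
    long r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;   /* shift the truncated remainder into b's sign */
    return r;
}
```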

6.2.3.3 Floating-point operations

Although most of the math operations work with both floating-point numbers and integers, a few require floating-point destination registers. Among these are ln (natural log), log2 (log base 2), log10 (log base 10), and exp (e^x), as well as a full set of trigonometric opcodes such as sin (sine), cos (cosine), tan (tangent), sec (secant), cosh (hyperbolic cosine), tanh (hyperbolic tangent), sech (hyperbolic secant), asin (arc sine), acos (arc cosine), atan (arc tangent), asec (arc secant), exsec (exsecant), hav (haversine), and vers (versine). All angle arguments for the trigonometric functions are in radians:

sin N1, N0
exp N1, 2

The majority of the floating-point operations have a single source argument and a single destination argument. Even though the destination must be a floating-point register, the source can be either an integer or floating-point number.

The atan opcode also has a three-argument variant that implements C's atan2( ):

atan N0, 1, 1

6.2.4 Working with Strings

The string operations work with string registers and with PMCs that implement a string class.

At the moment, operations on string registers generate new strings in the destination register. There are plans for an optimized set of string functions that modify an existing string in place. These might be implemented by the time you read this.

String operations on PMC registers require all their string arguments to be PMCs.

6.2.4.1 Concatenating strings

Use the concat opcode to concatenate strings. With string register or string constant arguments, concat has both a two-argument and a three-argument form. The first argument is a source and a destination in the two-argument form:

set S0, "ab"
concat S0, "cd"     # S0 has "cd" appended
print S0            # prints "abcd"
print "\n"

concat S1, S0, "xy" # S1 is the string S0 with "xy" appended
print S1            # prints "abcdxy"
print "\n"
end

The first concat concatenates the string "cd" onto the string "ab" in S0. It generates a new string "abcd" and changes S0 to point to the new string. The second concat concatenates "xy" onto the string "abcd" in S0 and stores the new string in S1.

For PMC registers, concat has only a three-argument form with separate registers for source and destination:

new P0, .PerlString
new P1, .PerlString
new P2, .PerlString
set P0, "ab"
set P1, "cd"
concat P2, P0, P1
print P2            # prints "abcd"
print "\n"
end

Here, concat concatenates the strings in P0 and P1 and stores the result in P2.

6.2.4.2 Repeating strings

The repeat opcode repeats a string a certain number of times:

set S0, "x"
repeat S1, S0, 5  # S1 = S0 x 5
print S1          # prints "xxxxx"
print "\n"
end

In this example, repeat generates a new string with "x" repeated five times and stores a pointer to it in S1.

6.2.4.3 Length of a string

The length opcode returns the length of a string in characters. This won't be the same as the length in bytes for multibyte encoded strings:

set S0, "abcd"
length I0, S0                # the length is 4
print I0
print "\n"
end
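The C function below shows why character length and byte length diverge. It counts the characters in a UTF-8 string by skipping continuation bytes. It assumes valid UTF-8 and is only an illustration. Parrot's string subsystem supports several encodings through its own code.

```c
#include <assert.h>
#include <string.h>

/* Counts code points in a valid UTF-8 string: every byte except the
   continuation bytes (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For plain ASCII, such as "abcd", the two lengths agree. For "naïve" encoded in UTF-8, strlen reports 6 bytes, while utf8_length reports 5 characters.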

Currently, length doesn't have an equivalent for PMC strings, but it probably will be implemented in the future.

6.2.4.4 Substrings

The simplest version of the substr opcode takes four arguments: a destination register, a string, an offset position, and a length. It returns a substring of the original string, starting from the offset position (0 is the first character) and spanning the length:

substr S0, "abcde", 1, 2        # S0 is "bc"

This example extracts a string from "abcde" at a one-character offset from the beginning of the string (the second character) and spanning two characters. It generates a new string, "bc", in the destination register S0.

When the offset position is negative, it counts backward from the end of the string. So an offset of -1 starts at the last character of the string.
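The offset arithmetic, including negative offsets, fits in a few lines of C. `substr_copy` is a hypothetical model that works on byte strings. When a requested length runs past the end of the string, this sketch clamps it. That clamping is an assumption of the sketch, not documented Parrot behavior.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of four-argument substr: copies up to `length`
   characters starting at `offset` into `out`. A negative offset counts
   back from the end (-1 is the last character).
   Returns 0 on success, -1 if the offset or buffer size is bad. */
int substr_copy(char *out, size_t out_size, const char *src,
                long offset, size_t length)
{
    long len = (long)strlen(src);
    if (offset < 0)
        offset += len;              /* -1 becomes len - 1 */
    if (offset < 0 || offset > len)
        return -1;
    size_t available = (size_t)(len - offset);
    if (length > available)
        length = available;         /* clamp at the end of the string */
    if (length + 1 > out_size)
        return -1;
    memcpy(out, src + offset, length);
    out[length] = '\0';
    return 0;
}
```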

substr also has a five-argument form, where the fifth argument is a string to replace the substring. This modifies the second argument and returns the removed substring in the destination register.

set S1, "abcde"
substr S0, S1, 1, 2, "XYZ"
print S0                        # prints "bc"
print "\n"
print S1                        # prints "aXYZde"
print "\n"
end

This replaces the substring "bc" in S1 with the string "XYZ", and returns "bc" in S0.

When the offset in a replacing substr is equal to the length of the original string (that is, just past its last character), substr appends the replacement string, just like the concat opcode.
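The replacing form can be modeled in the same way, including the append case. `substr_replace` is hypothetical. Unlike the opcode, it returns a newly allocated result string instead of updating a register, and it treats offsets as byte positions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the replacing substr: returns a new string in
   which src[offset .. offset+length) is replaced by `replacement`, and
   copies the removed piece into `removed`. An offset equal to
   strlen(src) appends. The caller frees the result; NULL on error. */
char *substr_replace(const char *src, size_t offset, size_t length,
                     const char *replacement,
                     char *removed, size_t removed_size)
{
    size_t len = strlen(src);
    if (offset > len)
        return NULL;
    if (length > len - offset)
        length = len - offset;
    if (length + 1 > removed_size)
        return NULL;

    memcpy(removed, src + offset, length);          /* the removed piece */
    removed[length] = '\0';

    size_t rlen = strlen(replacement);
    size_t total = len - length + rlen;
    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, src, offset);                     /* prefix      */
    memcpy(result + offset, replacement, rlen);      /* replacement */
    memcpy(result + offset + rlen, src + offset + length,
           len - offset - length);                   /* suffix      */
    result[total] = '\0';
    return result;
}
```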

When you don't need the replaced string, there's an optimized version of substr that just does a replace without returning the removed substring.

set S1, "abcde"
substr S1, 1, 2, "XYZ"
print S1                        # prints "aXYZde"
print "\n"
end

The PMC versions of substr are not yet implemented.

6.2.4.5 Chopping strings

The chopn opcode removes characters from the end of the string. It takes two arguments: the string to modify and the count of characters to remove. For example:

set S0, "abcde"
chopn S0, 2
print S0         # prints "abc"
print "\n"
end

removes two characters from the end of S0. If the count is negative, that many characters are kept in the string:

set S0, "abcde"
chopn S0, -2
print S0         # prints "ab"
print "\n"
end

This keeps the first two characters in S0 and removes the rest. chopn also has a three-argument version that stores the chopped string in a separate destination register, leaving the original string untouched:

set S0, "abcde"
chopn S1, S0, 1
print S1         # prints "abcd"
print "\n"
end
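The rules for positive and negative counts are easy to state in C. `chopn_in_place` is a hypothetical model that works on byte strings. The text does not say what happens when the count exceeds the string length, so the clamping in this sketch is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-place model of two-argument chopn.
   count >= 0: remove `count` characters from the end.
   count <  0: keep only the first -count characters. */
void chopn_in_place(char *s, long count)
{
    size_t len = strlen(s);
    size_t keep;
    if (count >= 0)
        keep = (size_t)count >= len ? 0 : len - (size_t)count;
    else
        keep = (size_t)(-count) >= len ? len : (size_t)(-count);
    s[keep] = '\0';   /* truncating the string drops the tail */
}
```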

6.2.4.6 Copying strings

The clone opcode makes a deep copy of a string or PMC. Instead of just copying the pointer, as normal assignment would, it recursively copies the string or object underneath.

new P0, .PerlString
set P0, "Ford"
clone P1, P0
set P0, "Zaphod"
print P1        # prints "Ford"
end

This example creates an identical, independent clone of the PMC in P0 and puts a pointer to it in P1. Later changes to P0 have no effect on P1.

With simple strings, the copy created by clone, as well as the results from substr, are copy-on-write (COW). These are rather cheap in terms of memory usage because the new memory location is only created when the copy is assigned a new value. Cloning is rarely needed with ordinary string registers since they always create a new memory location on assignment.
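Copy-on-write is usually built on reference counting. Clones share one buffer, and a clone gets a private copy only when it is about to change. The C sketch below shows that general technique using hypothetical names (`CowString`, `cow_clone`, `cow_set_char`). It is not Parrot's string implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared, reference-counted storage for the characters. */
typedef struct {
    size_t refcount;
    char text[];                /* flexible array member */
} Buffer;

/* A string handle: cheap to copy, points at a possibly shared buffer. */
typedef struct {
    Buffer *buf;
} CowString;

static Buffer *buffer_new(const char *s)
{
    size_t n = strlen(s);
    Buffer *b = malloc(sizeof *b + n + 1);
    assert(b != NULL);
    b->refcount = 1;
    memcpy(b->text, s, n + 1);
    return b;
}

CowString cow_from(const char *s)
{
    CowString c = { buffer_new(s) };
    return c;
}

/* Cloning shares the buffer: no characters are copied yet. */
CowString cow_clone(CowString *src)
{
    src->buf->refcount++;
    return *src;
}

/* Writing detaches first if another handle still shares the buffer. */
void cow_set_char(CowString *c, size_t index, char ch)
{
    assert(index < strlen(c->buf->text));
    if (c->buf->refcount > 1) {
        Buffer *copy = buffer_new(c->buf->text);
        c->buf->refcount--;
        c->buf = copy;
    }
    c->buf->text[index] = ch;
}

void cow_release(CowString *c)
{
    if (--c->buf->refcount == 0)
        free(c->buf);
    c->buf = NULL;
}

const char *cow_text(const CowString *c)
{
    return c->buf->text;
}
```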

6.2.4.7 Converting characters

The chr opcode takes an integer value and returns the corresponding character as a one-character string, while the ord opcode takes a single character string and returns the corresponding integer:

chr S0, 65                # S0 is "A"
ord I0, S0                # I0 is 65

ord has a three-argument variant that takes a character offset to select a single character from a multicharacter string. The offset must be within the length of the string:

ord I0, "ABC", 2        # I0 is 67

A negative offset counts backward from the end of the string, so -1 is the last character.

ord I0, "ABC", -1        # I0 is 67

6.2.4.8 Formatting strings

The sprintf opcode generates a formatted string from a series of values. It takes three arguments: the destination register, a string specifying the format, and an ordered aggregate PMC (like a PerlArray) containing the values to be formatted. The format string and the destination register can be either strings or PMCs:

sprintf S0, S1, P2
sprintf P0, P1, P2

The format string is similar to the one for C's sprintf function, but with some extensions for Parrot data types. Each format field in the string starts with a % and ends with a character specifying the output format. The output format characters are listed in Table 6-1.

Table 6-1. Format characters

Format  Meaning
%c      A character.
%d      A decimal integer.
%i      A decimal integer.
%u      An unsigned integer.
%o      An octal integer.
%x      A hex integer.
%X      A hex integer with a capital X (when # is specified).
%b      A binary integer.
%B      A binary integer with a capital B (when # is specified).
%p      A pointer address in hex.
%f      A floating-point number.
%e      A floating-point number in scientific notation (displayed with a lowercase "e").
%E      The same as %e, but displayed with an uppercase E.
%g      The same as either %e or %f, whichever fits best.
%G      The same as %g, but displayed with an uppercase E.
%s      A string.

Each format field can be specified with several options: flags, width, precision, and size. The format flags are listed in Table 6-2.

Table 6-2. Format flags

Flag     Meaning
0        Pad with zeros.
<space>  Pad with spaces.
+        Prefix numbers with a sign.
-        Align left.
#        Prefix a leading 0 for octal, 0x for hex, or force a decimal point.

The width is a number defining the minimum width of the output from a field. The precision is the maximum width for strings or integers, and the number of decimal places for floating-point fields. If either width or precision is an asterisk (*), it takes its value from the next argument in the PMC.

The size modifier defines the type of the argument the field takes. The flags are listed in Table 6-3.

Table 6-3. Size flags

Character  Meaning
h          short or float
l          long
H          huge value (long long or long double)
v          INTVAL or FLOATVAL
O          opcode_t
P          PMC
S          string

The values in the aggregate PMC must have a type compatible with the specified size.

Here's a short illustration of string formats:

new P2, .PerlArray
new P1, .PerlNum
new P0, .PerlInt
set P0, 42
set P1, 10.0
push P2, P0
push P2, P1
sprintf S0, "int %#Px num %+2.3Pf\n", P2
print S0     # prints "int 0x2a num +10.000\n"
end

The first seven lines create a PerlArray with two elements: a PerlInt and a PerlNum. The format string of the sprintf has two format fields. The first, %#Px, takes a PMC argument from the aggregate (P) and formats it as a hexadecimal integer (x) with a leading 0x (#). The second format field, %+2.3Pf, takes a PMC argument (P) and formats it as a floating-point number (f) with a minimum field width of two characters and three decimal places (2.3), preceded by a sign (+).
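Most of these fields work the same way in C's printf family, which makes C a handy place to check what a format produces. The sketch below formats the same values with `snprintf`. C has no P size flag, so a plain unsigned int and a double stand in for the PMCs. The second function shows a width taken from an argument with `*`. Both function names are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the values from the PASM example with C's snprintf. */
int format_example(char *out, size_t size)
{
    unsigned int answer = 42;
    double ten = 10.0;
    /* %#x: hex with a 0x prefix; %+2.3f: sign, min width 2, 3 decimals */
    return snprintf(out, size, "int %#x num %+2.3f\n", answer, ten);
}

/* A '*' width is taken from the next argument. */
int star_width_example(char *out, size_t size, int width, int value)
{
    return snprintf(out, size, "[%*d]", width, value);
}
```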

The test files t/op/string.t and t/src/sprintf.t have many more examples of format strings.

6.2.4.9 Testing for substrings

The index opcode searches for a substring within a string. If it finds the substring, it returns the position where the substring was found as a character offset from the beginning of the string. If it fails to find the substring, it returns -1:

index I0, "Beeblebrox", "eb"
print I0                       # prints 2
print "\n"
index I0, "Beeblebrox", "Ford"
print I0                       # prints -1
print "\n"
end

index also has a four-argument version, where the fourth argument defines an offset position for starting the search:

index I0, "Beeblebrox", "eb", 3
print I0                         # prints 5
print "\n"
end

This finds the second "eb" in "Beeblebrox" instead of the first, because the search skips the first three characters in the string.
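An index with a start offset behaves like a search that begins later in the string. `index_of` below is a hypothetical C model built on the standard `strstr` function. It works with byte offsets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of index: returns the offset of the first
   occurrence of `needle` at or after `start`, or -1 if there is none. */
long index_of(const char *haystack, const char *needle, size_t start)
{
    if (start > strlen(haystack))
        return -1;
    const char *hit = strstr(haystack + start, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```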

6.2.5 I/O Operations

The I/O subsystem has at least one set of significant revisions ahead, so you can expect this section to change. It's worth an introduction, though, because the basic set of opcodes is likely to stay the same, even if their arguments and underlying functionality change.

6.2.5.1 Open and close a file

The open opcode opens a file for access. It takes three arguments: a destination register, the name of the file, and a modestring. With a PMC destination, it returns a ParrotIO object on success and a PerlUndef object on failure. With an integer destination, it returns an integer file descriptor on success and -1 on failure:

open P0, "people.txt", "<"
open I0, "people.txt"

The modestring specifies whether the file is opened in read-only (<), write-only (>), read-write (+>), or append mode (>>). open takes a modestring argument only when it's creating a ParrotIO object, and not when it's creating a file descriptor.

The close opcode closes a ParrotIO object or a file descriptor:

close P0        # close a PIO
close I0        # close a descriptor

6.2.5.2 Output operations

We already saw the print opcode in several examples above. The one-argument form prints a register or constant to stdout. It also has a two-argument form, in which the first argument is the file descriptor or ParrotIO object where the value is printed. The standard file descriptors are 0 for stdin, 1 for stdout, and 2 for stderr.

print 2, S0             # print to stderr
printerr S0             # the same
print P0, "xxx"         # print to PIO in P0

write is similar to print, but it only works with integer file descriptors:

write 2, S0             # write string to stderr

6.2.5.3 Reading from files

The read opcode reads a specified number of bytes from either stdin or a ParrotIO object:

read S0, I0             # read from stdin up to I0 bytes into S0
read S0, P0, I0         # read from the PIO in P0

readline is a variant of read that works with file descriptors. It reads a whole line at a time, terminated by the newline character:

readline S0, I0         # read a line from descriptor I0

The seek opcode sets the current file position on a ParrotIO object. It takes four arguments: a destination register, a ParrotIO object, an offset, and a flag specifying the origin point:

seek I0, P0, I1, I2

In this example, the position of P0 is set by an offset (I1) from an origin point (I2). 0 means the offset is from the start of the file, 1 means the offset is from the current position, and 2 means the offset is from the end of the file. The return value (in I0) is 0 when the position is successfully set and -1 when it fails. seek also has a five-argument form that seeks with a 64-bit offset, constructed from two 32-bit arguments.
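The origin flags 0, 1, and 2 correspond to the conventional values of C's `SEEK_SET`, `SEEK_CUR`, and `SEEK_END`. The C sketch below wraps `fseek` to mimic seek's 0/-1 result. It also shows one way to build a 64-bit offset from two 32-bit halves. `seek_like` and `combine_offset` are hypothetical. Which argument of the five-argument seek carries the high half is an assumption of this sketch, since the text does not say.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Maps the origin flag (0 start, 1 current, 2 end) to C's constants
   and reports 0 on success or -1 on failure, like seek. */
int seek_like(FILE *fp, long offset, int origin)
{
    static const int whence[] = { SEEK_SET, SEEK_CUR, SEEK_END };
    if (origin < 0 || origin > 2)
        return -1;
    return fseek(fp, offset, whence[origin]) == 0 ? 0 : -1;
}

/* One plausible layout for a 64-bit offset split into two 32-bit
   arguments: high half first (an assumption, see above). */
int64_t combine_offset(uint32_t high, uint32_t low)
{
    return (int64_t)(((uint64_t)high << 32) | low);
}
```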

6.2.6 Logical and Bitwise Operations

The logical opcodes evaluate the truth of their arguments. They're often used to make decisions on control flow. Logical operations are implemented for integers and PMCs.

The and opcode returns the second argument if it's false and the third argument if the second one is true:

and I0, 0, 1  # returns 0
and I0, 1, 2  # returns 2

The or opcode returns the second argument if it's true and the third argument if the second is false:

or I0, 1, 0  # returns 1
or I0, 0, 2  # returns 2

or P0, P1, P2

The and and or opcodes both short-circuit: if they can determine what value to return from the second argument, they never evaluate the third. This matters only for PMCs, which might have side effects on evaluation.

The xor opcode returns the second argument if it is the only true value, returns the third argument if it is the only true value, and returns false if both values are true or both are false:

xor I0, 1, 0  # returns 1
xor I0, 0, 1  # returns 1
xor I0, 1, 1  # returns 0
xor I0, 0, 0  # returns 0
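Because and, or, and xor return one of their arguments instead of a plain true or false, C's conditional operator models them easily. The functions below are a hypothetical model that covers integers only. C evaluates all function arguments before the call, so this model does not reproduce short-circuiting. Writing `a ? b : a` directly at the call site would.

```c
#include <assert.h>

/* 0 is false; any other value is true. */
long pasm_and(long a, long b) { return a ? b : a; }  /* a if false, else b */
long pasm_or(long a, long b)  { return a ? a : b; }  /* a if true, else b  */

long pasm_xor(long a, long b)
{
    if (a && !b) return a;   /* only a is true              */
    if (b && !a) return b;   /* only b is true              */
    return 0;                /* both true or both false     */
}
```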

The not opcode returns a true value when the second argument is false, and a false value if the second argument is true:

not I0, I1
not P0, P1

The bitwise opcodes operate on their values a single bit at a time. band, bor, and bxor return a value that is the logical AND, OR, or XOR of each bit in the source arguments. They each take a destination register and two source registers. They also have two-argument forms where the destination is also a source. bnot is the logical NOT of each bit in a single source argument.

bnot I0, I1
band P0, P1
bor I0, I1, I2
bxor P0, P1, I2

The logical and arithmetic shift operations shift their values by a specified number of bits:

shl  I0, I1, I2        # shift I1 left by count I2 giving I0
shr  I0, I1, I2        # arithmetic shift right
lsr  P0, P1, P2        # logical shift right
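The difference between arithmetic and logical right shifts shows up only for negative values. The C sketch below implements both without relying on implementation-defined behavior. It uses 32-bit integers, which is an assumption, because Parrot's integer size depends on how it was configured. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeros.
   Shifting the unsigned representation makes >> well defined. */
uint32_t logical_shr(int32_t x, unsigned n)
{
    assert(n < 32);
    return (uint32_t)x >> n;
}

/* Arithmetic right shift: vacated high bits copy the sign bit.
   For negative x, ~x is non-negative, so shifting it is well defined,
   and complementing the result restores the sign. */
int32_t arithmetic_shr(int32_t x, unsigned n)
{
    assert(n < 32);
    return x < 0 ? ~(~x >> n) : x >> n;
}
```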