eMIPS, A Dynamically Extensible Processor

1.1 Usage

Different users will want to perform one or more of the following tasks: rebuilding the full system from sources; installing a system on a fresh disk image, either on Giano or on a board; building and loading the scheduler; and adding an accelerator to an application program. The following sections describe these tasks in detail.

1.2 Building

The system is built by cross-compilation on a NetBSD host. When using Windows as the development system, first install Microsoft Virtual PC and create a virtual machine (and disk) for the host NetBSD system; we recommend at least 12 GB of disk space. The installation CD (ISO) images for the host NetBSD system can be found at ftp://iso2.us.NetBSD.org/pub/NetBSD/iso/<version>/i386cd-<version>.iso. We used versions 4.0.1 and 5.99.39 without problems, but at least one other version was unable to build the system, presumably because of problems with the ACPI BIOS of Virtual PC. Where relevant, the rest of this document refers to version 4.0.1 and its procedures.

The sources can be found on the distribution CD and, eventually, at the official distribution sites, such as ftp://iso2.us.NetBSD.org/pub/NetBSD/NetBSD-<version>/source/sets/. The following files should be installed on the host NetBSD system:

- gnusrc.tgz

- sharesrc.tgz

- src.tgz

- syssrc.tgz

- xsrc.tgz

We created the directories /usr/src, /usr/xsrc, and /usr/obj to hold the sources and binaries, and made them owned by a regular user. One way to unpack the files, from the directory that contains them, is “for file in *.tgz; do tar -xzf $file -C /; done”.

The build procedure consists of three steps: building the base system, building the X Window System, and building the distribution CD image. The following commands, executed in /usr/src, perform these three steps in order:

- ./build.sh -u -U -V MKSOFTFLOAT=yes -m emips release

- ./build.sh -x -u -U -V MKSOFTFLOAT=yes -m emips release

- ./build.sh -u -U -V MKSOFTFLOAT=yes -m emips iso-image-source

On a current high-end portable PC these steps take approximately 14 hours, 9 hours, and a few minutes, respectively. The final result is the bootable CD image /usr/obj/releasedir/iso/emipscd.iso, approximately 330 MB in size. The directory /usr/obj/releasedir can be used for installation over the network. After some manipulation, the directory /usr/obj/destdir.emips can be used as an NFS-mounted root file system (for testing only).

1.3 Installation on Giano

The first step is to install the Giano simulator itself. The installation MSI file for Giano can be found at http://research.microsoft.com/en-us/downloads; look for the current Giano release page there. The second step is to create a directory to hold all the simulated system data, for instance c:\Giano\tests\emips. At a minimum, this directory should contain the following files:

- boot.emips

- emips3.img (*)

- emipscd.iso

- Ml40x_2ace.plx

- Ml40x_bram.bin

- putty.exe (*)

All files except the starred ones can be found in the Giano installation directory. The file boot.emips is the NetBSD bootloader; it is built as part of building the NetBSD system for eMIPS (Section 1.2), and a copy should also be on the root of the installation CD emipscd.iso. The file Ml40x_bram.bin is the primary eMIPS bootloader; it is included in, and can be rebuilt from, the official eMIPS hardware distribution. The file Ml40x_2ace.plx is the Giano platform configuration file that defines the Xilinx boards; it is valid for both the ML40x and the XUP boards. This configuration adds a second SystemACE controller, which serves as a CD drive. The file putty.exe is the free terminal emulator PuTTY, which can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty; we use Release 0.60.

The file emips3.img is the primary disk image for the simulated system. The user must create it, empty, with the desired size. One way is to concatenate a few large files, making sure the first one is not a valid NetBSD disk image (to avoid confusion later on). For instance, the command “copy /b boot.emips+emipscd.iso+emipscd.iso emips3.img” creates an image of approximately 700 MB. We recommend at the very least 800 MB of disk space for the system, and preferably 2 GB or more.
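On a host with a C compiler, an alternative to concatenating files is to write the empty image directly. The following is a minimal sketch, not part of the eMIPS or Giano distributions; the function name create_blank_image() is hypothetical. It writes a file of the requested size filled with zeros, so its first sector carries no disk label and cannot be mistaken for a valid NetBSD disk image.

```c
#include <stdio.h>

/* Hypothetical helper: create (or truncate) 'path' as a file of exactly
 * 'size' bytes, all zeros. A zero-filled first sector has no disklabel,
 * so the image cannot be confused with a valid NetBSD disk image.
 * Returns 0 on success, -1 on error. */
int create_blank_image(const char *path, unsigned long long size)
{
    static const unsigned char zeros[64 * 1024]; /* static => all zeros */
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    /* Write in 64 KB chunks so even multi-GB images need little memory. */
    while (size > 0) {
        size_t chunk = size < sizeof zeros ? (size_t)size : sizeof zeros;
        if (fwrite(zeros, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        size -= chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, create_blank_image("emips3.img", 2ULL << 30) would create a 2 GB image, in line with the recommended size.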

Assuming you have a cmd window open in the directory c:\Giano\tests\emips (or a Visual Studio Command Prompt window), start PuTTY by typing:

- putty.exe

and then prepare it to connect to Giano. In the configuration panel, select a Serial connection and set the Serial line name to \\.\pipe\usart0. It is convenient to Save this configuration for quick reuse later.

Next start the simulator with the following command line:

- giano -Platform Ml40x_2ace.plx

The simulator will initialize and pop up a warning about “Access to a non-existent memory”. This is expected: the software is probing the I/O space to auto-configure the system. This is a convenient point at which to tell PuTTY to connect to the simulator: select Open in the PuTTY configuration panel. Then go back to the warning window and select Retry. The PuTTY window should show the bootloader message “Hit any char to boot...”. Press any key, and you should get the following prompt:

NetBSD/emips <version> …

Default: 0/ace(0,0)/netbsd

boot: 0/ace(1,0)/netbsd

At the boot: prompt, type 0/ace(1,0)/netbsd as shown above, to boot from the installation CD. On later boots you can let the system boot from the default choice, which is also selected automatically after a timeout if you do not type anything.

This process brings up the standard NetBSD sysinst application. Refer to the official documentation at http://www.NetBSD.org/docs/guide/en/index.html for the various options and for the post-install procedures. The installation CD also contains documentation on the installation process, starting with the file emips/INSTALL.html.

1.4 Installation on Boards

Installation on a Xilinx ML40x or XUP board consists of three steps: first create a bootable compact flash card on a PC, then download the bootloader to the board, and finally install the bootloader into flash for operational use.

A quick way to perform the first step on NetBSD is to use dd(1) to copy the installation CD image onto the compact flash card. Under Windows you can do the same with the utility copydd.exe from the Giano distribution. An alternative, often useful, is to tell Giano to use the compact flash drive as its primary disk drive. Assuming the compact flash drive shows up as “Disk 1” under Disk Management, remove all drive letters assigned to the drive and run the following command as Administrator:

- giano -Platform Ml40x_2ace.plx SystemAce::HostFile \\.\PhysicalDrive1

Then follow the instructions of Section 1.3 to create your bootable compact flash card.

The first time NetBSD is installed on a board, you will need to download the NetBSD bootloader into RAM using the eMIPS serial line boot option; use the download.exe utility from the eMIPS distribution for this. Assuming the serial line connection to the board is on COM1, set the DIP switches for a serial line download, program the FPGA, and then type the following:

- download com1: boot.emips

Once the download is complete, start PuTTY and have it make a Serial connection to serial line COM1: at a speed of 38400. Because you will have missed the “Hit any char to boot...” prompt during the switchover, type any character to get to the boot prompt shown above. Use the default boot device.

Once the system is up and running, login as root and install the bootloader into flash:

- dd if=boot.emips of=/dev/rflash0c bs=4k conv=sync

Note that if you ran sysinst directly on the board, it will offer to perform this step for you, so you do not need to repeat it.
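The dd command above uses conv=sync, which makes dd(1) pad any short input block with zero bytes up to the block size, so boot.emips reaches the flash device as a whole number of 4 KB blocks (presumably what the raw flash device expects). The sketch below, with the hypothetical name copy_sync(), only illustrates that padding behavior in portable C; it is not a replacement for dd.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of dd bs=<bs> conv=sync: copy 'in' to 'out'
 * in blocks of 'bs' bytes, zero-padding every short block to 'bs' bytes.
 * Returns the number of bytes written, or -1 on error. */
long copy_sync(const char *in, const char *out, size_t bs)
{
    unsigned char buf[4096];
    FILE *src, *dst;
    long written = 0;
    size_t n;

    if (bs == 0 || bs > sizeof buf)
        return -1;
    src = fopen(in, "rb");
    if (src == NULL)
        return -1;
    dst = fopen(out, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }
    while ((n = fread(buf, 1, bs, src)) > 0) {
        if (n < bs)
            memset(buf + n, 0, bs - n);   /* conv=sync: pad with NULs */
        if (fwrite(buf, 1, bs, dst) != bs) {
            written = -1;
            break;
        }
        written += (long)bs;
    }
    if (ferror(src))
        written = -1;
    fclose(src);
    if (fclose(dst) != 0)
        written = -1;
    return written;
}
```

For a 5000-byte input and a block size of 4096, copy_sync() writes two full blocks (8192 bytes), the second holding 904 bytes of data followed by zero padding.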

Finally, set the DIP switches to boot from flash (switch number zero set to one) and reboot. The system is now operational and should boot directly from disk, without repeating the download. The bootloader can also boot remotely via DHCP/BOOTP; once in flash, it can be used to prime new cards directly from the network, or to boot diskless.

The BEE3 machine has no non-volatile memory, so on this system the bootloader must be downloaded again on each power-up. Since it has no disks either, the only option is to run NetBSD diskless over NFS. The corresponding procedures are well known and beyond the scope of this document.

1.5 Scheduler LKM

The scheduler is built as part of the full system build procedure (see Section 1.2), which produces two LKMs, syscall_accel.o and syscall_accel_data.o, and places them in the corresponding object directories. The first is the scheduler proper; the second is a debug/maintenance interface to the scheduler. To rebuild the scheduler in the 4.x tree, go to the source directory /usr/src/sys/lkm/syscall/accel and cross-recompile. In more recent versions of NetBSD, the location has moved to /usr/src/sys/modules.

During installation, these two files are placed in /usr/lkm. The scheduler is loaded like any other LKM, using modload(8); refer to its man page for details. The following commands load both modules:

- modload /usr/lkm/syscall_accel.o

- modload /usr/lkm/syscall_accel_data.o

Use modstat(8) to verify that the LKMs loaded properly, and see lkm.conf(5) for how to load these modules automatically at boot. Note that syscall_accel.o is normally compiled with the option LOCK_THE_ICAP enabled. This creates a potential locking conflict with dev_mkdb(8) that can hang the system during boot. The conflict can be avoided with the AFTERMOUNT condition in the /etc/lkm.conf entry, as follows:

syscall_accel.o - - - - AFTERMOUNT

A similar locking problem arises during shutdown, because the system does not unload LKMs by default. To fix this, add the following line to the file /etc/rc.d/lkm3:

# KEYWORD: shutdown

Otherwise, you will have to halt the system manually:

shutdown now

modunload accel

halt

Two simple test programs are also built: test_accel and test_accel_data. The first manually loads an accelerator, for testing purposes; the second displays the list of all accelerators. These programs are built alongside the corresponding LKMs but are not normally installed on a user system. Administrators should instead use them as examples for building more advanced facilities.

1.6 Applications

This section describes how to manually add hardware acceleration to a program, using the simple test program of Appendix B as reference. The procedure is purely illustrative; other tools are normally used to automatically generate an accelerated program from an existing optimized binary program. The Giano simulator can profile and identify the blocks to accelerate; the bbtools can patch a binary and insert the extended opcodes; the M2V compiler [27] can generate the hardware accelerator from the MIPS binary code. Here we do everything manually, but for brevity we omit the creation of the hardware accelerator itself (see the manual from the eMIPS hardware distribution).

The C test program is quite simple: it reports the elapsed time taken to invoke the external function loop() nloops times, passing the argument count to it on each invocation. The external function itself, written in assembler, is also quite simple. Its first instruction is the extended opcode that invokes the accelerator. The basic block that follows the extended opcode loops, decrementing the integer argument in register a0 until it becomes zero or negative.
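The actual program is listed in Appendix B. As a rough, hypothetical approximation of its structure, the sketch below times nloops calls of a function with argument count. To keep it self-contained and runnable on any host, the assembler loop() is replaced by a C stand-in, loop_sw(), that performs the same decrement-until-zero-or-negative loop but contains no extended opcode. The names time_calls() and loop_sw() are illustrative, and timing with the ISO C clock() (processor time, which approximates elapsed time for this CPU-bound loop) is our assumption.

```c
#include <time.h>

/* Hypothetical C stand-in for the assembler loop(): decrement the
 * argument until it becomes zero or negative. 'volatile' keeps the
 * compiler from deleting the loop at -O2. */
void loop_sw(int count)
{
    volatile int a0 = count;
    do {
        a0 = a0 - 1;
    } while (a0 > 0);
}

/* Call fn(count) nloops times and return the processor time used,
 * in seconds. */
double time_calls(void (*fn)(int), int count, int nloops)
{
    clock_t t0 = clock();
    for (int i = 0; i < nloops; i++)
        fn(count);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

A driver might print time_calls(loop_sw, count, nloops) for a few values of count; on eMIPS, the real loop() from _tloop.S would be passed instead.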

The idea behind this example is to create the smallest possible program that demonstrates a measurable difference between using and not using the accelerator. The simple hardware accelerator shown in Appendix B just transfers control to the address in register ra. The accelerator takes just one cycle to execute, whereas the software version executes roughly 2+(count*3) instructions per invocation, which takes considerably longer since eMIPS does not currently have an instruction cache.
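As a back-of-the-envelope check, the software cost of a whole run can be estimated as nloops invocations of roughly 2+3·count instructions each. The helpers below are hypothetical and simply encode that estimate; attributing the three per-iteration instructions to a decrement, a branch, and its delay slot is our assumption about the assembler code, not taken from Appendix B.

```c
/* Estimated software instructions for one call of loop(count):
 * 2 fixed instructions plus 3 per iteration (assumed: decrement,
 * branch, branch delay slot). Intended for count >= 1. */
unsigned long sw_instructions(unsigned long count)
{
    return 2 + 3 * count;
}

/* Estimated total for the whole run: nloops calls of loop(count). */
unsigned long long total_sw_instructions(unsigned long count,
                                         unsigned long nloops)
{
    return (unsigned long long)nloops * sw_instructions(count);
}
```

For example, with count = 1000 each invocation costs about 3002 instructions in software, against a single cycle for the accelerator.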

The following command creates the optimized software binary, assuming that the C code is in the file tloop.c and the assembler file is in _tloop.S:

- cc -O2 -o tloop tloop.c _tloop.S

The program should be run first in this un-accelerated form, to verify that it works as expected. Since the tloop program image does not include or reference any accelerator, it will always run in software only.

The accelerator code is shown in Appendix B. This code is added to the standard extension boilerplate code to create a partial reconfiguration (PR) project with the Xilinx ISE tools. Assume now that the hardware accelerator file has been generated and the Xilinx tools have produced the corresponding FPGA partial configuration file tloop.bit. In our setup the file is 109,604 bytes long. To add the accelerator to our program image we use the ace2se utility:

- ace2se -ph1 tloop_a tloop tloop.bit 0 2000

The argument -ph1 sets the hardware properties to 0x1. Setting bit zero of this field indicates that the accelerator plans to use opcode 24, the first of the available extended opcodes. The flags argument is 0, since the accelerator does not require any special treatment. The savings argument of 2000 cycles per invocation is arbitrary, since the speedup this accelerator actually provides varies with the argument passed to loop().
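The relation between the properties value and the extended opcodes can be sketched in C. This is illustrative only: the text states that bit zero selects opcode 24, the first extended opcode; the generalization that bit i selects opcode 24 + i, the limit of eight opcodes, and the hexadecimal formatting are our assumptions, chosen to reproduce the sedump line “x1 op24” for -ph1. The name describe_properties() is hypothetical.

```c
#include <stdio.h>

/* ASSUMPTION: bit i of the hardware-properties word enables extended
 * opcode 24 + i. Only bit 0 -> opcode 24 is documented; the range of
 * eight opcodes (24..31) is hypothetical. */
#define FIRST_EXT_OPCODE 24
#define NUM_EXT_OPCODES  8

/* Write a summary such as "x1 op24" into buf, in the style of the
 * sedump line "Hardware Image Properties: x1 op24".
 * Returns the length of the full summary, as snprintf does. */
int describe_properties(unsigned props, char *buf, size_t len)
{
    int n = snprintf(buf, len, "x%x", props);

    for (int i = 0; i < NUM_EXT_OPCODES && n >= 0 && (size_t)n < len; i++)
        if (props & (1u << i))
            n += snprintf(buf + n, len - (size_t)n, " op%d",
                          FIRST_EXT_OPCODE + i);
    return n;
}
```

Under these assumptions, describe_properties(0x1, ...) yields "x1 op24", matching the sedump output for tloop_a, while a value of 0x3 would yield "x3 op24 op25".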

To verify that the new file tloop_a is indeed an accelerated application in the SE file format we can invoke the sedump utility:

- sedump tloop_a

An interesting line in the output is:

Hardware Image Properties: x1 op24

This verifies that our accelerator, if loaded, will be enabled for extended opcode 24.

Running the accelerated image will demonstrate an appreciable speedup over the un-accelerated version.