The 512-byte VM

What’s the smallest virtual machine you could create? Or, why would you want one? You can grab the source or the binaries on GitHub.

This post was originally published on the oVirt blog.

In the days of ubiquitous bandwidth and fast computers we often don’t care if a VM or container image is a gig, or two, in size. However, this rapidly changes when you are confronted with edge cases: people living in less well-connected countries, or, in our case, needing to run hundreds or even thousands of virtual machines for automated test cases over a VPN.

Recently, we started working on the oVirt Terraform Provider, which has unfortunately seen better days. Apart from our own much-needed changes, there is, at the time of writing, a list of pull requests that have been open for a long time.

Maintaining a project is easy if one works on it alone, or just a few people in close coordination. One can make decisions, execute them, and get a reasonable quality despite the lack of tests. However, when more people get involved, or a project lives on for a long time, automated tests are essential to keep the quality up and avoid breaking things.

Without tests, development inevitably slows to a crawl as time passes and new features get bolted on. This is especially true for a community project, where the motivation to review PRs dwindles. (Not to mention that the bug reports piling up cost time and effort to fix.)

This is why we needed a solution to run automated tests quickly. The previous test suite was hard-coded for specific identifiers, such as cluster IDs, template IDs, etc. This made it impossible for contributors to run these tests, and incredibly hard to build a continuous integration system to verify patches.

One of the problems we encountered while developing the new test suite was the question of the oVirt template: we had no readily available template we could rely on to be present in any random oVirt cluster. (We even had reports where someone removed the blank template!)

So, we needed a virtual machine image. What to do, what to do? Build a miniature version of CentOS? Package Alpine Linux in a VM image?

No. These are all valid solutions, but they would require the test suite to download external dependencies and then upload a not-so-small file to the oVirt cluster used for testing.

Instead, an old memory came up from the school days: a little utility written by the system administrators which would inject itself into the boot sector, written entirely in Assembler… Assembler! That was it! One could just write a very simple “Hello World” program! This could fit in the boot sector and would be the smallest virtual machine image possible: 512 bytes in size. Small enough to just commit the binary into the Git repo.

Would it work? We had no idea. So, we had to go learn Assembler and dust off a fair bit of long-forgotten knowledge about how computers boot. In the old days we would have had to use AltaVista or go through 600-page books to learn what we needed, but thanks to Google and GitHub we found several examples of people doing just that.

At this point, we could have just taken an existing example under a permissive license and called it a day. But what would we have learned then? Or, more importantly, if we just ran someone else’s code without understanding it, how would we know that it actually does what we need it to do, 100% of the time? You can’t really hide anything super malicious in 512 bytes, but the VM might fail to boot at random times or shut down, which would lead to flaky tests: the last thing we wanted for a project with constrained resources.

So, down we go into the rabbit hole. Let’s learn some Assembler! The first thing we (re-)learned was the fact that there is no one Assembler language. Different variants have their own syntax. After a bit of digging we settled on the Netwide Assembler.

Let’s start our little program:

ORG 0x7C00

The first line is a directive that tells NASM which memory address the program will be loaded at. 0x7C00 is the address where the BIOS loads the boot sector. Next, we need to tell NASM to assume our program is running in 16-bit mode. (The CPU is in this mode when the BIOS hands over control; the switch to 32- or 64-bit mode happens later in the boot process.)

BITS 16

Let’s assemble our program into a raw binary and run it with QEMU:

nasm -f bin -o ourprogram.raw ourprogram.asm
qemu-system-x86_64 \
    -nographic \
    -serial mon:stdio \
    -drive file=ourprogram.raw,format=raw \
    -monitor telnet::2000,server,nowait

The output isn’t terribly surprising, since we haven’t written any instructions yet:

Booting from Hard Disk...
Boot failed: could not read the boot disk

The BIOS didn’t detect a valid boot sector because the last two bytes of the boot sector must hold the value 0xAA55 (stored little-endian, as the bytes 0x55 and 0xAA). Let’s fix that by padding the image with zero bytes up to byte 510 and then adding the magic bytes for the BIOS:

TIMES 510 - ($ - $$) DB 0 ; Fill up 510 bytes
DW 0xAA55 ; Write magic bytes for boot loader
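The arithmetic behind these two lines can be sketched in Python. This is only an illustration of the layout, not part of the original build; the helper names `pad_boot_sector` and `has_boot_signature` are hypothetical.

```python
SECTOR_SIZE = 512

# DW 0xAA55 stores the 16-bit word little-endian, so the file ends in 55 AA.
SIGNATURE = (0xAA55).to_bytes(2, "little")


def pad_boot_sector(code: bytes) -> bytes:
    """Mimic TIMES 510 - ($ - $$) DB 0 followed by DW 0xAA55 (hypothetical helper)."""
    room = SECTOR_SIZE - len(SIGNATURE)  # 510 bytes available for code and data
    if len(code) > room:
        raise ValueError("code does not fit into a single boot sector")
    return code + bytes(room - len(code)) + SIGNATURE


def has_boot_signature(image: bytes) -> bool:
    """Check what the BIOS checks: bytes 510 and 511 must be 0x55, 0xAA."""
    return len(image) >= SECTOR_SIZE and image[510:512] == SIGNATURE


image = pad_boot_sector(b"")
print(len(image), image[-2:].hex())  # 512 55aa
```

The same check can be pointed at a file produced by NASM by reading it with `open(path, "rb").read()` first.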

With the padding and magic bytes in place, the output changes: the boot sector is now loaded, but it doesn’t do anything yet:

Booting from Hard Disk...

Fantastic! The last bit that we need is writing our “Hello World” text to the screen. Thankfully, we don’t need to write graphics drivers or anything of the sort. We can either use the INT 0x10 BIOS function or write directly to the VGA text buffer at memory address 0xB8000.

Before we begin, let’s talk about CPU architecture in very broad strokes. (CPU architects please don’t read this.) RAM is fast, but not quite fast enough. In order for the CPU to work with data we need to load it into the so-called registers. These are comparatively tiny pieces of very fast memory built directly onto the CPU chip. The CPU then performs its operations on the data held in the registers.

Most of the heavy lifting in our program will be done by the BIOS, so the only thing we need to do is load a byte into the AL register (the lower 8 bits of the accumulator register), set the AH register to 0x0E to select the “print character” function, and then call interrupt 0x10 to trigger the BIOS to print the character.

Let’s do that. As a first step, let’s create a label that we can reference with the text we want to print, before the end of our program:

ORG 0x7C00 ; Starting address of the boot loader
BITS 16 ; Start program in 16 bit mode

text:
    DB "Hello oVirt!", 0 ; Embed data into binary, zero-terminated.

TIMES 510 - ($ - $$) DB 0 ; Fill up 510 bytes
DW 0xAA55 ; Write magic bytes for boot loader

Fantastic, now the text will be added into our binary. (Note that running the program now would make the CPU interpret our text as CPU instructions, which is not what we want.)

As a next step, let’s add an instruction to load the memory address of this text label into the SI register. (SI is the source index register, which string instructions such as LODSB read from.) This is done with the MOV SI, text instruction:

ORG 0x7C00
BITS 16

; Move the address of the text label into the SI register
MOV SI, text

text:
    DB "Hello oVirt!", 0

TIMES 510 - ($ - $$) DB 0
DW 0xAA55
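This is also where ORG starts to matter: the address NASM substitutes for the text label is the origin plus the label’s offset inside the image. A minimal Python sketch of that calculation follows; the function names are hypothetical, and the byte encoding shown is the standard x86 form of MOV SI with a 16-bit immediate (opcode 0xBE followed by the little-endian address).

```python
ORIGIN = 0x7C00  # the value passed to ORG


def label_address(offset_in_image: int, origin: int = ORIGIN) -> int:
    """Address of a label = load address of the image + offset of the label."""
    return origin + offset_in_image


def encode_mov_si_imm16(address: int) -> bytes:
    """Machine code for MOV SI, imm16: opcode 0xBE, then the address little-endian."""
    return bytes([0xBE]) + address.to_bytes(2, "little")


# In the listing above, MOV SI, text occupies 3 bytes, so text sits at offset 3.
text_address = label_address(3)
print(hex(text_address))                        # 0x7c03
print(encode_mov_si_imm16(text_address).hex())  # be037c
```

Without ORG 0x7C00, the assembler would assume an origin of 0 and SI would point at offset 3 of memory rather than at our text.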

Next up, we need to tell the BIOS that we want to print a character. This is done by putting the byte 0x0E into the AH register (the upper 8 bits of the accumulator) with the MOV AH, 0x0E instruction:

ORG 0x7C00
BITS 16

MOV SI, text
; Tell the BIOS that we want to print a character
MOV AH, 0x0E

text:
    DB "Hello oVirt!", 0

TIMES 510 - ($ - $$) DB 0
DW 0xAA55
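To make the AH/AL split concrete: both are halves of the same 16-bit accumulator, AX. The Python sketch below models this with plain integers; `join_ax` and `split_ax` are hypothetical helpers for illustration only.

```python
from typing import Tuple


def join_ax(ah: int, al: int) -> int:
    """Combine the upper byte (AH) and the lower byte (AL) into a 16-bit AX value."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


def split_ax(ax: int) -> Tuple[int, int]:
    """Return (AH, AL) for a 16-bit AX value."""
    ax &= 0xFFFF
    return ax >> 8, ax & 0xFF


# AH = 0x0E selects the BIOS teletype output function; AL holds the character "H".
ax = join_ax(0x0E, ord("H"))
print(hex(ax))  # 0xe48
```

Writing to AL later (which LODSB does) changes only the lower byte, so AH keeps the value 0x0E for every character we print.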

Finally, we need a loop that loads the next byte from the address in the SI register into AL, increments SI by one, and then calls INT 0x10. If the value of AL is 0, the program should stop. This is done as follows:

.printChar:

    ; Load byte from the address in SI into AL and advance SI by one
    LODSB

    ; Check if AL is 0.
    ; OR AL, AL or TEST AL, AL are common alternatives that set the zero flag the same way.
    CMP AL, 0

    ; If yes, jump to .stop
    JE .stop

    ; Trigger BIOS print method
    INT 0x10

    ; Repeat for next byte
    JMP .printChar

.stop:
    ; Stop the CPU
    HLT
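The control flow of this loop can be traced with a small Python model. It is a sketch only: memory is a bytes object, SI is an integer index, and the BIOS call is replaced by collecting characters in a list. The function name `bios_print` is hypothetical.

```python
def bios_print(memory: bytes, si: int) -> str:
    """Model of the .printChar loop: read bytes at SI until a zero byte is found."""
    printed = []
    while True:
        al = memory[si]          # LODSB: load the byte at SI into AL ...
        si += 1                  # ... and advance SI by one
        if al == 0:              # CMP AL, 0 / JE .stop
            break                # .stop: HLT
        printed.append(chr(al))  # INT 0x10 with AH = 0x0E prints the character in AL
    return "".join(printed)


print(bios_print(b"Hello oVirt!\x00", 0))  # Hello oVirt!
```

The zero byte at the end of the string is what ends the loop, which is why the DB line appends a 0 after the text.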

Put together, this is our entire program:

; Tell the assembler the starting address
ORG 0x7C00
; Tell the assembler we are running in 16 bit mode
BITS 16

; Move the address of the text label below to the SI register
MOV SI, text

; Tell the BIOS that we want to print a character
MOV AH, 0x0E

.printChar:

    ; Load byte from the address in SI into AL and advance SI by one
    LODSB

    ; Check if AL is 0.
    ; OR AL, AL or TEST AL, AL are common alternatives that set the zero flag the same way.
    CMP AL, 0

    ; If yes, jump to .stop
    JE .stop

    ; Trigger BIOS print method
    INT 0x10

    ; Repeat for next byte
    JMP .printChar

.stop:
    ; Stop the CPU
    HLT

text:
    ; Embed this text into the binary, terminated with a 0
    DB "Hello oVirt!", 0

; Fill up the binary to 510 bytes with zeroes
TIMES 510 - ($ - $$) DB 0
; Write the boot sector magic.
DW 0xAA55

If we now run this program with QEMU we’ll see the following:

Booting from Hard Disk...
Hello oVirt!

Fantastic! We have our 512-byte VM image. Of course, this project has evolved significantly and has received contributions on GitHub, including a CI/CD system, tests, and a proper readme, but these are the very baby steps we took to create our test image. Enjoy!
