QEMU “-bios” vs. “-kernel” vs. “-device loader,file=…”

QEMU’s command line options for loading code into the guest are numerous, and often have different semantics between architectures or even between machine types for the same architecture. This is unfortunate, but it is the result of backwards compatibility with older QEMU versions and a gradual accumulation of “it would be nice to Do The Right Thing for this image file type” special cases.

Broad summary:

-kernel is the “load a Linux kernel” option. It will load and boot the kernel in whatever way seems best for the architecture being used. For instance, for the x86 PC machine it will just provide the file to the guest BIOS and rely on the guest BIOS to do the actual loading of the file into RAM. On Arm, loading a Linux kernel means that we follow the rules the kernel lays down for how to boot it (https://www.kernel.org/doc/Documentation/arm64/booting.txt for 64-bit or https://www.kernel.org/doc/Documentation/arm/Booting for 32-bit), and we achieve that with a little bit of stub bootloader code (this is what you are seeing in low memory). The kernel boot rules also require that we provide it with a device tree blob (DTB) in RAM, and this is the data at 0x40000000, which is the base of RAM on the ‘virt’ board. We also, in accordance with Linux kernel boot expectations, handle secondary CPUs either by keeping them in the PSCI powered-off state or by running a little bit of secondary-CPU bootloader code which sits in a WFI loop until the primary wakes them up. (Which of these we do depends on the board model being used, because we do what the real board does, and that varies a lot, especially for 32-bit boards.)
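To make the “device tree blob in RAM” part a little more concrete, here is a minimal Python sketch that decodes the start of a flattened device tree header, the kind of check you might run on the host against a DTB file (for example one dumped from QEMU; the `dumpdtb` machine property is one way of getting such a file that I am aware of). The function name and the sample values are hypothetical and only for illustration; the magic number and the field order come from the devicetree format, not from QEMU.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # first word of every DTB, stored big-endian


def parse_fdt_header(blob: bytes) -> dict:
    """Decode the first four big-endian words of a flattened device tree."""
    if len(blob) < 16:
        raise ValueError("too short to be a DTB")
    magic, totalsize, off_dt_struct, off_dt_strings = struct.unpack(">4I", blob[:16])
    if magic != FDT_MAGIC:
        raise ValueError(f"bad FDT magic {magic:#x}")
    return {
        "totalsize": totalsize,            # size of the whole blob in bytes
        "off_dt_struct": off_dt_struct,    # offset of the structure block
        "off_dt_strings": off_dt_strings,  # offset of the strings block
    }


# Synthetic header, made up for this example (not taken from a real board).
sample = struct.pack(">4I", FDT_MAGIC, 0x1000, 0x38, 0x800)
header = parse_fdt_header(sample)
print(header)
```

Guest code would do the equivalent check on the memory at the DTB address before walking the tree to find its devices.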

As an oddball exception, on Arm, if you pass an ELF file to -kernel we’ll assume that it is not a Linux kernel, and will boot it by just starting at the ELF entry point. We still provide the DTB at the base of RAM, but only if it wouldn’t overlap with the loaded ELF file. (Aside: for ‘virt’ in particular you want the DTB anyway, because we don’t guarantee to keep devices at the same physical addresses between QEMU versions; the DTB is how we tell guest code where it should look for things. You can rely on flash at 0x0 and RAM starting at 0x4000_0000, but really should pull all other device addresses from the DTB. In practice we have made efforts to avoid rearranging the board memory map, but reading the DTB is the right thing for guest code to do.)
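Since the behaviour here hinges on whether the file is an ELF image, it can be handy to check what you are about to hand to -kernel. The following is a rough host-side sketch (not QEMU’s own detection code) that looks for the ELF magic and reads the entry point from the header. The function name and the synthetic sample are hypothetical; the header offsets are those of the standard ELF32/ELF64 layouts.

```python
import struct


def elf_entry_point(image: bytes):
    """Return the ELF entry address, or None if the image is not an ELF file."""
    if len(image) < 32 or image[:4] != b"\x7fELF":
        return None
    is_64bit = image[4] == 2                # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    fmt = endian + ("Q" if is_64bit else "I")
    # e_entry sits at offset 24, right after e_type, e_machine and e_version.
    (entry,) = struct.unpack_from(fmt, image, 24)
    return entry


# Synthetic 64-bit little-endian header; the entry address is made up.
e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample = e_ident + struct.pack("<HHIQ", 2, 183, 1, 0x40080000)  # 183 = EM_AARCH64

entry = elf_entry_point(sample)
print(hex(entry) if entry is not None else "not an ELF file")
```

A `None` result means the file is a raw image of some kind, which on Arm would take the “treat it as a Linux kernel” path described above.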

-device loader is the “generic loader”, which behaves the same on any architecture. It just loads an image such as an ELF file into guest memory, and by default doesn’t do anything to change the CPU reset behaviour. This is a good choice if you have a completely bare-metal image which includes the exception vector table and you want it to start in the same way the hardware would out of reset.
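As an illustration, a generic-loader invocation can be assembled like this. This is only a sketch: the machine type, CPU model and file name are placeholder assumptions, and the optional `cpu-num` property is included because, as far as I understand QEMU’s generic loader documentation, it additionally asks QEMU to set that CPU’s program counter to the image’s entry point, which is a departure from the plain “leave reset behaviour alone” default.

```python
import shlex


def generic_loader_argv(image, cpu_num=None):
    """Build a QEMU argument list that loads `image` via the generic loader."""
    loader = f"loader,file={image}"
    if cpu_num is not None:
        # Assumption: cpu-num makes QEMU set this CPU's PC to the entry point.
        loader += f",cpu-num={cpu_num}"
    return [
        "qemu-system-aarch64",  # placeholder target; pick your architecture
        "-M", "virt",           # placeholder machine type
        "-cpu", "cortex-a57",   # placeholder CPU model
        "-nographic",
        "-device", loader,
    ]


# "baremetal.elf" is a hypothetical file name.
print(shlex.join(generic_loader_argv("baremetal.elf")))
print(shlex.join(generic_loader_argv("baremetal.elf", cpu_num=0)))
```

Without `cpu-num`, the image needs to place its vector table and startup code wherever the machine’s CPU really begins executing out of reset.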

-bios is the “load a BIOS image, in whatever way seems good for this machine model” option. Again, this is a “do what I mean” kind of option whose specifics vary from machine model to machine model and from architecture to architecture; some machines don’t support it at all. Some machines (e.g. the x86 PC) will always load a BIOS, using a default binary if the user didn’t specify one. Some will load a BIOS if the user asks, but not otherwise (the Arm ‘virt’ board is like this). Generally a BIOS image is expected to be a “bare metal raw binary” image which will get loaded into some flash or ROM memory which corresponds to wherever the hardware starts execution when it comes out of reset. Note that BIOS images should not be ELF files; they’re expected to just be the raw data to put into the ROMs.

On at least some machines, including ‘virt’, you can instead provide the contents of the flash/ROM devices using a command line like “-drive if=pflash,…”. This is an example of a common pattern in QEMU where you can either use a short “do what I mean” option that is convenient but has a lot of magic under the hood, or a longer “orthogonal” option which lets you specify lots of sub-options and get exactly the behaviour you want.
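The two ways of supplying firmware can be sketched as follows. This is a hedged illustration, not a recommendation of specific settings: the helper name and file name are hypothetical, `format=raw` is my assumption for the elided sub-options of the pflash form, and the pflash device may have its own requirements on image size (as far as I know, padding to the size of the flash bank), so check the documentation for your QEMU version.

```python
def firmware_args(image_path, header, use_pflash=False):
    """Return QEMU arguments for a raw firmware image.

    `header` is the first few bytes of the file, used to reject ELF images.
    """
    if header[:4] == b"\x7fELF":
        raise ValueError("firmware should be a raw binary, not an ELF file")
    if use_pflash:
        # Longer "orthogonal" form; format=raw is an assumed sub-option.
        return ["-drive", f"if=pflash,format=raw,file={image_path}"]
    # Short "do what I mean" form.
    return ["-bios", image_path]


# "fw.bin" and its leading bytes are made up for this example.
raw_header = b"\x00\x00\x00\x14"
short_form = firmware_args("fw.bin", raw_header)
long_form = firmware_args("fw.bin", raw_header, use_pflash=True)
print(short_form)
print(long_form)
```

Either list would be appended to the rest of the QEMU command line for a machine that supports that option.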

A lot of this is undocumented, because “I want to run a bare metal program of my own devising” is a very niche use case and because we don’t have a good place in our documentation to make it easy to document the specifics of different board models.
