Saturday, April 12, 2014

Inline assembler stupidity

I keep getting caught by this, because this is a perfect example of the compiler doing something contrary to what you're writing.
  asm volatile (
                "ldr %0, [%1]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%1]\n\t"
                : "=r" (tmp)
                : "r" (p)
                :
                );

Guess what this gets compiled to?
  30: f9400000  ldr x0, [x0]
  34: 91000400  add x0, x0, #0x1
  38: f9000000  str x0, [x0]

...equivalent to, of course,
  asm volatile (
                "ldr %0, [%0]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%0]\n\t"
                : "+r" (p)
                :
                :
                );
The sort of aggressive and non-obvious optimization is crazy because if I really wanted the generated code, I'd have written the inline asm the second way with a read and write modifier. Maybe for architectures with specialized and very few registers this is a reasonable approach, but for RISCish instruction sets with large instruction files this is nuts. There should be a warning option for this nonsense.

This "correct way" is to use an earlyclobber modifier.
  asm volatile (
                "ldr %0, [%1]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%1]\n\t"
                : "=&r" (tmp)
                : "r" (p)
                :
                );
IMO anything that needs a separate paragraph in third-party documents as "a caveat" needs to be fixed.

Speaking of which... Given that C really is a high-level assembly, why not finally standardize on inline asm?

Wednesday, April 2, 2014

Exotic QEMU bugs and fixes

I found that the linux-user portion of QEMU has a few bugs around signals. Really, around handling "self-modifying" code and having the code generator step on unmapped memory.

The test is pretty simple.  Have a page of memory containing one instruction which will cause SIGILL to be delivered, followed by a 'ret'. On a SIGILL, unmap the page. On a SIGSEGV, map the page back in. I've two of these tests - one with actual mmap/munmap, and another with mprotect. The tests verify corner conditions in the binary translation logic, with back-to-back signals and an attempt to execute unmapped code.
https://github.com/andreiw/andreiw-wip/blob/master/qemu/tests/sigtest.c
https://github.com/andreiw/andreiw-wip/blob/master/qemu/tests/sigtest_mprotect.c

"self-modifying" code sounds grand, but it's just the signal return path. While newer Linux kernels use VDSO symbols for the restorer (that's the part that does the sigreturn syscall), QEMU still creates an on-the-stack trampoline. When QEMU creates a translation block for the trampoline, it marks the page internally as read-only so that it can detect when the TB should be invalidated. It is this later logic which was short-circuiting and exiting earlier than needed.
That's fixed in https://github.com/andreiw/andreiw-wip/blob/master/qemu/0001-qemu-fix-page_check_range.patch

The second problem is that QEMU doesn't deal very well with being forced to run code that's unmapped. The TCG generator walks over the unmapped memory, gets a SIGSEGV, which attempts delivery of the signal to the translated program (which again, means getting and/or creating more TBs). The problem, though, is that we attempt to reacquire the tcg_ctx.tb_ctx.tb_lock, which we never dropped due to the signal. i.e. after a SIGSEGV here:
#0  disas_a64_insn (s=0x7fffffffdc40, env=<optimized out>) at /target-arm/translate-a64.c:8972
#1  gen_intermediate_code_internal_a64 (cpu=cpu@entry=0x62532200, tb=tb@entry=0x7ffff440b120, search_pc=search_pc@entry=false) at /target-arm/translate-a64.c:9097
#2  0x00000000600d76e5 in gen_intermediate_code_internal (search_pc=false, tb=0x7ffff440b120, cpu=0x62532200) at /target-arm/translate.c:10629
#3  gen_intermediate_code (env=env@entry=0x6253a468, tb=tb@entry=0x7ffff440b120) at /target-arm/translate.c:10904
#4  0x00000000600e4851 in cpu_arm_gen_code (env=env@entry=0x6253a468, tb=tb@entry=0x7ffff440b120, gen_code_size_ptr=gen_code_size_ptr@entry=0x7fffffffdd64) at /translate-all.c:159
#5  0x00000000600e5152 in tb_gen_code (cpu=cpu@entry=0x62532200, pc=pc@entry=4820992, cs_base=cs_base@entry=0, flags=<optimized out>, cflags=cflags@entry=0) at /translate-all.c:973
#6  0x0000000060040e7a in tb_find_slow (flags=<optimized out>, pc=4820992, env=0x6253a468, cs_base=<optimized out>) at /cpu-exec.c:162
#7  tb_find_fast (env=0x6253a468) at /cpu-exec.c:193
#8  cpu_arm_exec (env=env@entry=0x6253a468) at /cpu-exec.c:611
#9  0x000000006005ad2c in cpu_loop (env=env@entry=0x6253a468) at /linux-user/main.c:1015
#10 0x0000000060004dd1 in main (argc=1, argv=<optimized out>, envp=<optimized out>) at /linux-user/main.c:4392

We longjmp back to the CPU loop and deadlock here:
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x000000006012991d in _L_lock_858 ()
#2  0x000000006012978a in __pthread_mutex_lock (mutex=0x604ffa98 <tcg_ctx+350904>) at pthread_mutex_lock.c:61
#3  0x0000000060040bfd in cpu_arm_exec (env=env@entry=0x6253a228) at /cpu-exec.c:610
#4  0x000000006005ad2c in cpu_loop (env=env@entry=0x6253a228) at /linux-user/main.c:1015
#5  0x0000000060004dd1 in main (argc=1, argv=<optimized out>, envp=<optimized out>) at /linux-user/main.c:4392
The solution is to allow tb_gen_code to back out if it knows it can't read the memory. A new exception type is added, EXCP_TB_EFAULT, which then needs to be handled just like an address fault inside cpu_loop.

https://github.com/andreiw/andreiw-wip/blob/master/qemu/0002-qemu-handle-tb_gen_code-getting-called-for-unmapped-.patch
https://github.com/andreiw/andreiw-wip/blob/master/qemu/0003-x86-implement-EXCP_TB_EFAULT.patch

This makes the above tests pass on AArch64 and x86 (32-bit only, since there is no signal handling support for the x86_64-linux-user target at the moment).

Update: Fixes look like they're going in. The TCG deadlock is getting fixed in a simpler way. It is a better and more self-contained fix. http://www.mail-archive.com/qemu-devel@nongnu.org/msg225421.html

Tuesday, March 18, 2014

OpenBIOS and partition-zero booting, redux

As of r1280 OpenBIOS now correctly implements booting via partition-zero boot block.

http://git.qemu.org/?p=openbios.git;a=commit;h=1ac3fb92c109f5545d373a0576b87750c53cce19

That's the power of Open Source.

Wednesday, March 5, 2014

Preboot scripts, redux.

The old implementation suffered from a couple problems:
  • Chain booting and separate boot blocks made installation fussy
  • I discovered that some OF versions don't quite claim more than a certain amount of memory for loading, making dual boot blocks waste a lot of precious RAM (I had two useless 4K stacks, for example). The symptom is a DSI during the actual load by the firmware - the same problem as seen trying to boot *BSD boot floppies on the 3400c. There might be a work around manually claiming the memory, but... heh.
I then realized I could have well added the preboot script at the end of the iquik.b block, and have the later be smart enough to eval the script prior to looking for the boot-file.

Usage is pretty simple (the -p option):
$ cat > hello.of 
$ cr cr cr ." Hello OF!!!" cr cr cr
$ iquik -p hello.of

Of course, this works only if booting iQUIK via partition zero (i.e. :0 or %BOOT). Presumably if you're running the ELF directly (hi CHRP!), you'll use a normal CHRP boot script.

Making OpenBIOS support Apple partition-zero booting

To speed up my development cycle on iQUIK and make it less painful I really needed to get OpenBIOS to boot via bootcode correctly. The current sources (r1272) hard code the load address for the legacy QUIK bootloader, which makes it useless for me or for anyone else (like the NetBSD or OpenBSD bootloaders).

I didn't try very hard, but the end result works, and hopefully the OpenBIOS guys will just take my fix.

SVN workflow seems so...senescent compared to git. Sigh.
Index: forth/debugging/client.fs
===================================================================
--- forth/debugging/client.fs (revision 1272)
+++ forth/debugging/client.fs (working copy)
@@ -28,7 +28,13 @@
 0 state-valid !
 
 variable want-bootcode
+variable bootcode-base
+variable bootcode-size
+variable bootcode-entry
 0 want-bootcode !
+0 bootcode-base !
+0 bootcode-size !
+0 bootcode-entry !
 
 variable file-size
 
Index: libopenbios/bootcode_load.c
===================================================================
--- libopenbios/bootcode_load.c (revision 1272)
+++ libopenbios/bootcode_load.c (working copy)
@@ -12,13 +12,11 @@
 #define printf printk
 #define debug printk
 
-#define OLDWORLD_BOOTCODE_BASEADDR (0x3f4000)
-
 int 
 bootcode_load(ihandle_t dev)
 {
     int retval = -1, count = 0, fd;
-    unsigned long bootcode, loadbase, offset;
+    unsigned long bootcode, loadbase, offset, loadsize, entry;
 
     /* Mark the saved-program-state as invalid */
     feval("0 state-valid !");
@@ -33,34 +31,59 @@
     loadbase = POP();
     
 #ifdef CONFIG_PPC
-    /* ...except that QUIK (the only known user of %BOOT to date) is built
-       with a hard-coded address of 0x3f4000. Let's just use this for the
-       moment on both New World and Old World Macs, allowing QUIK to also
-       work under a New World Mac. If we find another user of %BOOT we can
-       rethink this later. PReP machines should be left unaffected. */
+    /*
+     * Apple OF does not honor load-base and instead uses pmBootLoad
+     * value from the boot partition descriptor.
+     *
+     * Tested with:
+     *   a debian image with QUIK installed
+     *   a debian image with iQUIK installed (https://github.com/andreiw/quik)
+     *   an IQUIK boot floppy
+     *   a NetBSD boot floppy (boots stage 2)
+     */
     if (is_apple()) {
-        loadbase = OLDWORLD_BOOTCODE_BASEADDR;
+      feval("bootcode-base @");
+      loadbase = POP();
+      feval("bootcode-size @");
+      loadsize = POP();
+      feval("bootcode-entry @");
+      entry = POP();
+
+      printk("bootcode base 0x%lx, size 0x%lx, entry 0x%lx\n",
+             loadbase, loadsize, entry);
+    } else {
+      entry = loadbase;
+
+      /* Load as much as we can. */
+      loadsize = 0;
     }
 #endif
     
     bootcode = loadbase;
     offset = 0;
     
-    while(1) {
+    if (loadsize) {
+      if (seek_io(fd, offset) != -1)
+        count = read_io(fd, (void *) bootcode, loadsize);
+    } else {
+      while(1) {
         if (seek_io(fd, offset) == -1)
-            break;
+          break;
         count = read_io(fd, (void *)bootcode, 512);
         offset += count;
         bootcode += count;
+      }
     }
 
     /* If we didn't read anything then exit */
     if (!count) {
         goto out;
     }
+
+    printk("entry = 0x%lx\n", entry);
     
     /* Initialise saved-program-state */
-    PUSH(loadbase);
+    PUSH(entry);
     feval("saved-program-state >sps.entry !");
     PUSH(offset);
     feval("saved-program-state >sps.file-size !");
Index: libopenbios/load.c
===================================================================
--- libopenbios/load.c (revision 1272)
+++ libopenbios/load.c (working copy)
@@ -1,6 +1,6 @@
 /*
  *   Creation Date: <2010/06/25 20:00:00 mcayland>
- *   Time-stamp: <2010/06/25 20:00:00 mcayland>
+ *   Time-stamp: <2014-03-05 03:18:49 andreiw>
  *
  * <load.c>
  *
Index: packages/mac-parts.c
===================================================================
--- packages/mac-parts.c (revision 1272)
+++ packages/mac-parts.c (working copy)
@@ -1,6 +1,6 @@
 /*
  *   Creation Date: <2003/12/04 17:07:05 samuel>
- *   Time-stamp: <2004/01/07 19:36:09 samuel>
+ *   Time-stamp: <2014-03-05 03:42:07 andreiw>
  *
  * <mac-parts.c>
  *
@@ -237,6 +237,19 @@
      size = (long long)__be32_to_cpu(par.pmPartBlkCnt) * bs; 
      
      if (want_bootcode) {
+  ucell loadaddr = 0;
+  ucell loadsize = 0;
+  ucell loadentry = 0;
+
+  loadaddr = __be32_to_cpu(par.pmBootLoad);
+  loadsize = __be32_to_cpu(par.pmBootSize);
+  loadentry = __be32_to_cpu(par.pmBootEntry);
+  PUSH(loadaddr);
+  feval("bootcode-base !");
+  PUSH(loadsize);
+  feval("bootcode-size !");
+  PUSH(loadentry);
+  feval("bootcode-entry !");
   offs += (long long)__be32_to_cpu(par.pmLgBootStart) * bs;
   size = (long long)__be32_to_cpu(par.pmBootSize);
      }
@@ -249,6 +262,10 @@
      di->size_hi = size >> BITS;
      di->size_lo = size & (ucell) -1;
 
+     if (want_bootcode) {
+       goto out;
+     }
+
      /* We have a valid partition - so probe for a filesystem at the current offset */
      DPRINTF("mac-parts: about to probe for fs\n");
      DPUSH( offs );
@@ -277,7 +294,7 @@
   
       /* If we have been asked to open a particular file, interpose the filesystem package with 
       the passed filename as an argument */
-      if (!want_bootcode && strlen(argstr)) {
+      if ( strlen(argstr)) {
        push_str( argstr );
        PUSH_ph( ph );
        fword("interpose");
@@ -286,6 +303,10 @@
       goto out;
      } else {
       DPRINTF("mac-parts: no filesystem found on partition %d; bypassing misc-files interpose\n", parnum);
+
+      /* Fail out instead of having macparts_load get called uselessly, allowing trying the next
+         boot device */
+      ret = 0;
      }
  }
This does make booting iQUIK on OpenBIOS now very easy.
qemu-system-ppc -hda ~/src/quik/distrib/floppy-cfg.img -prom-env "boot-file=hd:3"

Getting PowerPC OpenBIOS to run on QEMU

  • You've apt-get installed qemu, but qemu-system-ppc boots to a blank (white or black) screen?
  • You've pulled the OpenBIOS SVN, built qemu-openbios.elf, but it boots to a blank screen?
On serial output, you might see "<< set_property: NULL phandle" messages, and the CPU is stuck in a perpetual ISI.

Have no fear. Apparently GCC versions > 4.6 miscompile OpenBIOS, so you need to disable optimization. This is presently set to "-Os" under "Makefile.target". Setting it t "-O0" should do.

I'll probably investigate this deeper after fixing partition-zero booting...

A

Wednesday, October 9, 2013

Scripted boot sequences on OldWorld PowerMacs

Having a decent bootloader on an installed system is a good start, but how do you install it in the first place?

iQUIK relies on having OFW properties or boot arguments to figure out where to look for the configuration file. Having the user pass extra arguments involves knowing those arguments - in our case, the file path and the device path. While the first we know (as a developer), the second depends on the choice of boot media and machine. Of course, we could bake in nasty logic to special case CDs, or scan all partitions on the booted device, or use magic locations in the binary, but none of this is a clean approach.

Ideally we would want to run on a CHRP machine, where we could boot a "CHRP script", which looks something like this:

FreeBSD/powerpc bootloader
FreeBSD
 $FreeBSD$ 

MacRISC MacRISC3 MacRISC4


boot device:partition,\ppc\kernel;.elf


This would let us verify the target system, and run arbitrary Forth to set up the needed parameters (or even interact with the user). A good example of a complicated CHRP script is the MacOS ROM file on NewWorlds, which is a pretty large blob of Forth followed by an embedded ELF image.

But we're on OldWorld. No bootinfo support. What can we do? We can do pretty much the same, except we have to prefix our Forth with a machine code stub that will call into PROM to interpret the following Forth code. Afterwards, we can pass control to the regular boot loader, which is embedded right after the Forth words. Whereas before the boot partition contained just the iQUIK boot block, now it's a "sandwich" of simply concatenated Forth hand-off stub, the Forth code and the iQUIK boot block. All we have to do is compute the load address correctly, such that the iQUIK boot block falls on its expected linked address.