Tuesday, July 4, 2017

Porting UEFI to XXX, step 1

I've decided to do the actual blogging for this project *in* the repo itself. See https://github.com/andreiw/ppcnw-edk2/blob/master/README.md. After all, markdown is convenient enough and using Blogger on the G4 is p-a-i-n-f-u-l.

So it turns out that blogging about something after the fact is pretty tough. I really wanted to write up my PoC port of UEFI to the OpenPower ecosystem, but it's incredibly difficult to go back and systematize work that's a few years old.

So let's try this again. This time, our victim will be a G4 12" PowerBook6,8 with a 7447A. That's a 32-bit PowerPC. Now, I'll go in small steps and document *everything*. For added fun, we'll begin porting on the target itself, at least until that gets too tedious.

First, I updated to the latest (and last) Debian 8 (Jessie).

Now let's clone the tree.

$ git clone https://github.com/tianocore/edk2

Set up the UEFI environment.

$ cd edk2
$ . edksetup.sh

Now we need to get the BaseTools building.

pbg4:~/src/edk2/BaseTools/ make
make -C Source/C
make[1]: Entering directory '/home/andreiw/src/edk2/BaseTools/Source/C'
Attempting to detect ARCH from 'uname -m': ppc
Could not detected ARCH from uname results
GNUmakefile:36: *** ARCH is not defined!.  Stop.
make[1]: Leaving directory '/home/andreiw/src/edk2/BaseTools/Source/C'
GNUmakefile:25: recipe for target 'Source/C' failed
make: *** [Source/C] Error 2

Ok. Let's fix that. We'll first need a Source/C/Include/PPC/ProcessorBind.h file.

I derived ProcessorBind.h from another 32-bit CPU's header, such as IA32 or ARM. It mostly contains type definitions; it's boilerplate. If there are multiple calling conventions for your architecture and it's not obvious which one you should be using, you might wish to specify what the EFIAPI attribute expands to. For example, on x86, Windows-style cdecl is used regardless of how you build the rest of Tiano. On most architectures an empty define is fine.
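
Concretely, a minimal sketch of what such a header might contain, modeled on the existing 32-bit headers (illustrative only; crib the exact contents from the IA32 or ARM versions in the tree):

```c
/*
 * Sketch of BaseTools/Source/C/Include/PPC/ProcessorBind.h, derived
 * from a 32-bit architecture like IA32 or ARM. Mostly fixed-width
 * type boilerplate; treat this as illustrative, not the exact file.
 */
#ifndef __PROCESSOR_BIND_H__
#define __PROCESSOR_BIND_H__

#define MDE_CPU_PPC

typedef unsigned long long UINT64;
typedef long long          INT64;
typedef unsigned int       UINT32;
typedef int                INT32;
typedef unsigned short     UINT16;
typedef unsigned short     CHAR16;
typedef short              INT16;
typedef unsigned char      BOOLEAN;
typedef unsigned char      UINT8;
typedef char               CHAR8;
typedef signed char        INT8;

/* Native width: 32 bits on a 7447A. */
typedef UINT32 UINTN;
typedef INT32  INTN;

#define MAX_BIT     0x80000000
#define MAX_2_BITS  0xC0000000
#define MAX_ADDRESS 0xFFFFFFFF

/* No special calling convention needed: an empty EFIAPI is fine. */
#define EFIAPI

#endif
```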

Now appropriately hook it into Source/C/Makefiles/header.makefile.

--- a/BaseTools/Source/C/Makefiles/header.makefile
+++ b/BaseTools/Source/C/Makefiles/header.makefile
@@ -43,6 +43,10 @@ ifeq ($(ARCH), AARCH64)
 ARCH_INCLUDE = -I $(MAKEROOT)/Include/AArch64/
 endif
+ifeq ($(ARCH), PPC)
+ARCH_INCLUDE = -I $(MAKEROOT)/Include/PPC/
+endif

Fix the ARCH detection in Source/C/GNUmakefile.

--- a/BaseTools/Source/C/GNUmakefile
+++ b/BaseTools/Source/C/GNUmakefile
@@ -31,6 +31,9 @@ ifndef ARCH
   ifneq (,$(findstring arm,$(uname_m)))
     ARCH=ARM
   endif
+  ifneq (,$(findstring ppc,$(uname_m)))
+    ARCH=PPC
+  endif

Ok, ensure you have the libuuid headers (Debian uuid-dev) and g++. And...

You are done. This gives you the tools needed to build UEFI. Now we need to teach the build system about PowerPC...

Wednesday, June 8, 2016

Disassembling NT system files

Most NT files are stripped. This means that trying to disassemble them is a bit annoying because there are no symbols available. Checked builds of NT came with the symbol files (e.g. support/debug/ppc/symbols/exe/ntoskrnl.dbg for ntoskrnl.exe), but tools like Microsoft's dumpbin or OpenWatcom's wdis don't use them.

Now there's https://github.com/andreiw/dbgsplice to add the COFF symbol table back!


Sadly, the OpenWatcom analogue is quite buggy, so it's hard to recommend. It was a capable enough disassembler for setupldr and veneer.exe, but it gets horribly confused by complicated section layouts.

Of course the DBG files contain quite a bit more info (and we can do a lot more with the aux COFF syms too for annotating code than dumpbin suggests).

Sunday, May 8, 2016

Easy creation of proxy DLL pragmas

Converting dumpbin DLL exports information to MSVC linker pragmas

Yes, a bit of a weird request. But imagine you want to create a dummy DLL that forwards all the existing symbols of another DLL. Of course you're not going to do it by hand.

You have the tabular output of dumpbin /exports (ordinal, hint, and name columns), and for each exported name you want a matching /export linker pragma. I'm not an awk expert, but this works - except I ran dumpbin on Windows and awk on OS X, hahah. But you get the gist...
dumpbin /exports C:\winnt\system32\ntdll.dll |
awk 'NR > 19 && $3 != "" { printf "#pragma comment(linker, \"/export:%s=ntdll.%s\")\n", $3, $3 }'
You might have to tweak the number of lines to skip, depending on your tools. I'm on MSVC 4.0 (hello '90s!).

64-bit ARM OS/Kernel/Systems Development Demo on an nVidia Shield TV (Tegra X1)


The Shield TV is based on the 64-bit nVidia X1 chip. Unlike the K1, this is actually a Cortex-A57 based design rather than an nVidia "Denver" design. That by itself is kind of interesting already. The Shield TV was available much earlier than the X1-based nVidia development board (the Jetson TX1, which you can even buy on Amazon), and costs about a third as much. The Shield TV can be unlocked via "fastboot oem unlock", allowing custom OS images to be booted. Unlike the TX1, you don't get a UART (and I haven't found the UART pads yet, either).

What this is

https://github.com/andreiw/shieldTV_demo

This is a small demo, demonstrating how to build and boot arbitrary code on your Tegra Shield TV. Unlike the previous Tegra K1 demo, you get EL2 (hypervisor mode!).

What you need

  • A Shield TV, unlocked. Search YouTube for walkthroughs.
  • Shield OS version >= 1.3.
  • GNU Make.
  • An AArch64 GNU toolchain.
  • ADB/Fastboot tools.
  • Bootimg tools (https://github.com/pbatard/bootimg-tools), built and somewhere in your path.
  • An HDMI-capable screen. Note, HDMI, not DVI-HDMI adapter. You want the firmware to configure the screen into 1920x1080 mode, otherwise you'll be in 640x480 and we don't want that...

How to build

$ CROSS_COMPILE=aarch64-linux-gnu- make
...should yield 'shieldTV_demo'.

How to boot

  1. Connect the Shield TV to your dev workstation with a USB cable.
  2. Reboot device via:
    $ adb reboot-bootloader
    ...you should now see the nVidia splash screen, followed by the boot menu.
  3. If OS is 1.3, you can simply:
    $ fastboot boot shieldTV_demo
  4. If OS is 1.4 or 2.1, you will need to:
    $ fastboot flash recovery shieldTV_demo
    ...and then boot the "recovery kernel" by following instructions on screen.
The code will now start. You will see text and some diagonal lines drawn on a black background. The text should say we're at EL2 and the lines should be green. The drawing will be slow: the MMU is off, and the caches are thus disabled.

Let me know if it's interesting to see the MMU setup code.

Final thoughts

The Shield TV is a better deal than the TX1 for the average hobbyist, even with the missing UART. For the price it sells at, the TX1 should come with a decent amount of RAM, not just 1GB more than the Shield TV. nVidia...are you listening? Uncripple your firmware so booting custom images is not a song-and-dance (you broke it in 1.4!) and at least TELL us where the UART pads are on the motherboard. If you're really cool, put together an "official" Ubuntu image that runs on the TX1 and the Shield (and fix SCR_EL3.HCE, too).

Saturday, May 7, 2016

Porting TianoCore to a new architecture

"UEFI" on...?

This article is the first in a series of posts touching on the general process of bringing up TianoCore EDK2 on an otherwise unsupported architecture. Maybe you want to support UEFI for your CPU architecture, or simply want a reasonable firmware environment. In either case, because UEFI is not actually defined for your architecture, you're going to have to do a bit more work than your typical platform bring-up. By the time you're done, you could become a perfect addition to the UEFI Forum and its specification committees...yeah!

This blog post and the ones following it continually refer to the ongoing PPC64LE Tiano port I am working on, available at https://github.com/andreiw/ppc64le-edk2/. Since everyone can read the fine code, this document mostly highlights the various steps performed throughout the commits. The git repo isn't perfect, though. Some changes ended up evolving over a few commits while I ironed things out and brought in more code. Hopefully I don't miss mentioning anything important.


TianoCore

Tiano is Intel's open-source (BSD-licensed) implementation of the UEFI specification. EDK2 is the second and current iteration of the implementation.

UEFI is officially supported on the IA32, X64, IPF, ARM and AARCH64 architectures, and EDK2 has CPU support code for the x86 and ARM variants. There's a MIPS EDK1 port floating about, and now two PPC64LE OpenPower EDK2 ports, one of which ended up fueling this article...

I'm not going to focus on the architecture of either UEFI or Tiano. Good books have been written on the subject. Here's some tl;dr material, though, for the hopelessly impatient:
At least read the User Documentation and glance at the boot flow diagram. You should now be able to fetch, build and boot TianoCore using the emulation package, and have a rough understanding of what it takes to get a build going via Conf/target.txt and Conf/tools_def.txt.

Your Target

Your target is a 32-bit or 64-bit little-endian chip. I suppose big-endian is doable, but none of the Tiano code is endian safe and UEFI is strictly little-endian.

Development Environment

This assumes that you are doing development on Linux, using an ELF toolchain and GCC compilers. You are going to need:

Basic Project Setup

Pick a short identifier for your architecture. Pick a name that's unused - for the Power8 port I picked PPC64. This tag will be used with the Tiano build scripts. The next step is to create a couple of Pkg directories that will contain our stuff. In my PPC64 port I initially went with a single-package solution, but I should be working on splitting it up into the more conventional layout, where platform-independent portions are in PPC64Pkg and platform-dependent parts (including build scripts for building for actual boards) are in PPC64PlatformPkg. You can refer to this commit as an example for the minimum required to build a dummy EFI application that does nothing.

If your architecture were already supported, running a similar build command for your package would succeed.
build -p YourArchPlatformPkg/YourArchPlatformPkg.dsc

Build Infrastructure

The next step is to enable the build infrastructure and scripts to understand your architecture identifier. Here's a list of files I had to modify - this is all stuff under BaseTools/Source/Python and the changes are incredibly mechanical.
  • Common/DataType.py
  • Common/EdkIIWorkspaceBuild.py
  • Common/FdfParserLite.py
  • Common/MigrationUtilities.py
  • CommonDataClass/CommonClass.py
  • CommonDataClass/ModuleClass.py
  • CommonDataClass/PlatformClass.py
  • GenFds/FdfParser.py
  • GenFds/FfsInfStatement.py
  • GenFds/GenFds.py
  • TargetTool/TargetTool.py
  • build/build.py

Build tools

UEFI executables are PE/COFF files. Since we are building on Linux, EDK2 uses a workflow where the ELF artifacts produced by the cross-compiler are converted into PE32/PE32+ files with the GenFw tool. The PE/COFF artifacts are then wrapped into FFS objects and assembled into what is known as an FV ("firmware volume"); multiple FVs are put into an FD. The FV is really a flat file system that uses GUIDs for everything and can store other types of objects as well. You could also generate what is known as a TE (terse executable), which is basically a cut-down version of PE/COFF.

Tiano deals with several kinds of executables. The UEFI runtime (DXE core, UEFI drivers, and so on) are relocated as they are loaded, while the code that runs prior to the UEFI runtime itself is XIP (execute-in-place) and is thus pre-relocated to fixed addresses by the tool constructing the FV. The point is that such pre-UEFI code (the SEC and PEI phases in Tiano) runs in an environment where DRAM is not yet available.

Thus we need to enable GenFw to create PE/COFF executables from ELF for our architecture. This ties into the compiler options we're going to use, which is not something we've addressed yet. It helps to understand that a PE/COFF file is basically a position-independent executable. Although there is a "preferred link address" that all symbols are relocated against, the PE/COFF image contains enough relocation information, known as "base relocations", to allow loading at any address. So it would appear that the easiest approach is to generate a position-independent ET_DYN ELF executable (with the -pie linker flag) and then focus on converting the output to PE/COFF. This is the approach I highly suggest adopting. You will only have to deal with a single relocation type (R_PPC64_RELATIVE in my case), which maps naturally to either the 64-bit or the 32-bit COFF base relocation type, depending on the bit width of your architecture.

Other approaches are possible, such as embedding all relocations with the --emit-relocs linker flag and dealing with the resulting soup of crazy relocs later, but the success of this approach is highly dependent on the architecture and ABI. It may even be impossible to convert to PE/COFF due to a mismatch between ELF and COFF base relocs and ABI issues. When I first worked on the PPC64 port, I followed the AArch64 approach, which did just this, and ended up being forced to use the older (and not really meant for LE) ELFv1 ABI. Don't do it.

Note that depending on the ABI, you may have to do a bit of tool work. I am guessing this was the reason why the AArch64 port never adopted using PIE ELF binaries. Fortunately, you should be able to follow along my changes to Elf64Convert.c. You may also have to make changes to the base linker script used with the GNU toolchains.

Don't forget that you will need to manually rebuild the BaseTools if you make any changes!
make -C BaseTools

At this point we can go back and figure out the compile options. These live in the BaseTools/Conf/tools_def.template file, which is copied to Conf/ by edksetup.sh on freshly checked-out trees. The compiler options heavily depend on your architecture, of course, but generally speaking:
  • build on top of the definitions made for newer architectures like AArch64, because there are simply fewer of them in this file to wrap your mind around
  • consider the PPC64 definitions in my tree
  • -pie, unless position-independent executables don't work for you for some reason
  • large model
  • soft float (you can always move to hard float later if that rocks your boat, but it's just more CPU state to wrap your head around)
  • PECOFF_HEADER_SIZE=0x228 for 64-bit chips, 0x220 for 32-bit ones
This is the point where trying to build again should start giving you compile errors, because we still haven't modified any of the Tiano include and library files to be aware of the new architecture.

To be continued.

Thursday, October 1, 2015

Toying around with LE PowerPC64 via the PowerNV QEMU

I've validated that my ppc64le_hello example runs on top of BenH's PowerNV QEMU tree. Runs really snappy!

The only thing that doesn't work is mixed page-size segment support (MPSS, like 16MB in a 4K segment). QEMU does not support MPSS at the moment. Also, QEMU does not implement any of the IBM simulator's crazy Mambo calls.

Monday, July 13, 2015

Toying around with LE PowerPC64 via the Power8 simulator

ppc64le_hello is a simple example of what it takes to write stand-alone (that is, system or OS) code that runs in Little-Endian and Hypervisor modes on the latest OpenPOWER/Power8 chips. Of course, I don't have a spare $3k for one of those nice Tyan reference systems, but IBM does have a free, albeit glacially slow and non-OSS, POWER8 Functional Simulator.

What you get is a simple payload you can boot via skiboot, or another OPAL-compatible firmware. Features, in no particular order:
  • 64-bit real-mode HV LE operation.
  • logging via sim interface (mambo_write).
  • logging via OPAL firmware (opal_write).
  • calling C code, stack/BSS/linkage setup/TOC.
  • calling BE code from LE.
  • FDT parsing, dumping FDT.
  • Taking and returning from exceptions, handling unrecoverable/nested exceptions.
  • Timebase (i.e. the "timestamp counter"), decrementer and hypervisor decrementer manipulation with some basic timer support (done for periodic callbacks into OPAL).
  • Running at HV alias addresses (loaded at 0x00000000200XXXXX, linked at 0x80000000200XXXXX). The idea being that the code will access physical RAM and its own data structures solely using the HV addresses.
  • SLB setup: demonstrates 1T segments with 4K and 16M base page sizes. One segment (slot = 0) is used to back the HV alias addresses with 16M pages. Another segment maps EA to VA 1:1 using 4K pages.
  • Very basic HTAB setup. Mapping and unmapping for pages in the 4K and 16M segments, supporting MPSS (16M pages in the 4K segment). No secondary PTEG. No eviction support. Not SMP safe. Any access within the HV alias addresses gets mapped in. Any faults to other unmapped locations are crashes, as addresses below 0x8000000000000000 should only be explicit maps.
  • Taking exception vectors with MMU on at the alternate vector location (AIL) 0xc000000000004000.
  • Running unprivileged code.
See README for more information, including how to build and run. At some point it ran on a real Power8 machine - and may run still ;-).

Monday, July 6, 2015

DOES> in Jonesforth

Jonesforth 47 quoth:
NOTES ----------------------------------------------------------------------

DOES> isn't possible to implement with this FORTH because we don't have a separate
data pointer.

Thankfully, that's not true. The following is a tad AArch32-specific, given that I am playing with pijFORTHos (https://github.com/organix/pijFORTHos), but the principle remains the same. Let's first look at how DOES> gets used.
: MKCON WORD CREATE 0 , , DOES> @ ;
This creates a word MKCON, that when invoked like:
1337 MKCON PUSH1337
...creates a new word PUSH1337 that will behave, as if it were defined as:
: PUSH1337 1337 ;
Recall the CREATE...;CODE example. DOES> is very similar to ;CODE, except you want Forth words, not native machine words, invoked. With ;CODE, native machine words are embedded in the word built by CREATE...;CODE; with CREATE...DOES> it's Forth words instead. So if we had no DOES> word, we could write something like:
: MKCON WORD CREATE 0 , , ;CODE $DODOES @ ;
...where $DODOES is the machine code generator word that creates the magic we've yet to figure out. $DODOES needs to behave like a mix between DOCOL and NEXT, that is, adjust FIP (the indirect-threaded-code instruction pointer, pointing to the next word to execute) to point past $DODOES to the @ word. The DFA of the CREATEd word (i.e. PUSH1337) is put on the stack, so @ can read the constant (1337) out. This means the simplest CREATE...DOES> example is:
: DUMMY WORD CREATE 0 , DOES> DROP ;
DUMMY ADUMMY
...because we need to clean up the DFA for ADUMMY that is pushed on its invocation. Anyway, we could thus define DOES> like:
: DOES> IMMEDIATE ' (;CODE) , [COMPILE] $DODOES ;
Let's look at two ways of implementing $DODOES. Way 1 - fully inline. The address of the Forth words (the new FIP) is calculated by skipping past the bits emitted by $DODOES.
        .macro COMPILE_INSN, insn:vararg
        .int LIT
        \insn
        .int COMMA
        .endm

        .macro NEXT_BODY, wrap_insn:vararg=
        \wrap_insn ldr r0, [FIP], #4
        \wrap_insn ldr r1, [r0]
        \wrap_insn bx  r1
        .endm
@
@ A CREATE...DOES> word is basically a special CREATE...;CODE
@ word, where the forth words follow $DODOES. $DODOES thus
@ adjusts FIP to point right past $DODOES and does NEXT.
@
@ You can think of this as a special DOCOL that sets FIP to a
@ certain offset into the CREATE...DOES> word's DFA. This
@ version is embedded into the DFA so finding FIP is
@ as easy as moving FIP past itself.
@
@ - Just like DOCOL, we enter with CFA in r0.
@ - Just like DOCOL, we need to push (old) FIP for EXIT to pop.
@ - The forth words expect DFA on stack.
@
        .macro DODOES_BODY, magic=, wrap_insn:vararg=
0:      \wrap_insn PUSHRSP FIP
1:      \wrap_insn ldr FIP, [r0]
        \wrap_insn add FIP, FIP, #((2f-0b)/((1b-0b)/(4)))
        \wrap_insn add r0, r0, #4
        \wrap_insn PUSHDSP r0
        NEXT_BODY \wrap_insn
2:
        .endm
@
@ $DODOES ( -- ) emits the machine words used by DOES>.
@
defword "$DODOES",F_IMM,ASMDODOES
        DODOES_BODY ASMDODOES, COMPILE_INSN
        .int EXIT

Way 2 - partly inline, where the emitted code does an absolute branch and link. This reduces the amount of memory used per definition at the cost of a branch. Ultimately this is the solution adopted. _DODOES calculates the new FIP adjusting the return address from the branch-and-link done by the inlined bits.
_DODOES:
        PUSHRSP FIP        @ just like DOCOL, for EXIT to work
        mov FIP, lr        @ FIP now points to label 3 below
        add FIP, FIP, #4   @ add 4 to skip past ldr storage
        add r0, r0, #4     @ r0 was CFA
        PUSHDSP r0         @ need to push DFA onto stack
        NEXT

        .macro DODOES_BODY, wrap_insn:vararg=
1:      \wrap_insn ldr r12, . + ((3f-1b)/((2f-1b)/(4)))
2:      \wrap_insn blx r12
3:      \wrap_insn .long _DODOES
        .endm

@
@ $DODOES ( -- ) emits the machine words used by DOES>.
@
defword "$DODOES",F_IMM,ASMDODOES
        DODOES_BODY COMPILE_INSN
        .int EXIT
In either case, just like DOCOL, we need to push the old FIP pointer before calculating the new one. The old FIP pointer corresponds to the address within the word that called the DOES>-created word. In both cases we need to push the DFA of the executing word onto the stack (this is in r0 on the AArch32 Jonesforth).

Finally, in both cases the CREATE...DOES> word is indistinguishable from a CREATE...;CODE word, and the created word is indistinguishable from a word created by a CREATE...;CODE word.
\ This is the CREATE...;CODE $DOCON END-CODE example before.
: MKCON WORD CREATE 0 , , ;CODE ( MKCON+7 ) E590C004 E52DC004 E49A0004 E5901000 E12FFF11 (END-CODE)
CODE CON ( CODEWORD MKCON+7 ) 5 (END-CODE)

\ Fully inlined CREATE...DOES>.
: MKCON_WAY1 WORD CREATE 0 , , ;CODE ( MKCON_WAY1+7) E52BA004 E590A000 E28AA020 E2800004 E52D0004 E49A0004 E5901000 E12FFF11 9714 938C (END-CODE)
CODE CON_BY_WAY1 ( CODEWORD MKCON_WAY1+7 ) 5 (END-CODE)

\ Partly-inlined CREATE...DOES>. 
: MKCON_WAY2 WORD CREATE 0 , , ;CODE ( MKCON_WAY2+7 ) E59FC000 E12FFF3C 9F64 9714 938C (END-CODE)
CODE CON_BY_WAY2 ( CODEWORD MKCON_WAY2+7 ) 5 (END-CODE)
This makes decompiling (i.e. SEE) a bit tricky, but not impossible. As you can see here, I haven't written a good disassembler yet, which would detect these sequences as $DOCON. IMHO this is still a lesser evil than introducing new fields or flags into the word definition header.

P.S. Defining constants is a classical example of using DOES>, but a bit silly when applied to Jonesforth, where it's an intrinsic. It's an intrinsic so that certain compile-time constants, known only at assembler time, can be exposed to the Forth prelude and beyond. The other classical example of DOES> is struct-like definitions.

P.P.S. You might be wondering how I'm SEEing into code words, as neither Jonesforth nor pijFORTHos support it. I guess I'll blog about that next real soon whenever... The ( CODEWORD XXX ) business here shows the "code word" pointed to by the CFA, which is necessarily not DOCOL (otherwise it would be a regular colon definition, not CODE). The ( CODEWORD word+offset ) notation tells you that the machine words pointed to by the CFA are part of a different word. Native (jonesforth.s-defined) intrinsics would decompile as something like:
CODE 2SWAP ( CODEWORD 85BC ) (END-CODE)

Sunday, July 5, 2015

Implementing ;CODE in AArch32 Jonesforth for real

The Jonesforth ;CODE definition is unfortunately little more than a curiosity. After all, if you wanted to write a native machine word, you'd probably follow along and implement it inside jonesforth.s proper using the defcode macro. The real power of ;CODE lies in coupling it with the CREATE word, letting you have words that define other words.

I.e. we want to be able to do something like:
    defword "$DOCON",F_IMM,ASMDOCON
        .int LIT            @ r0 points to DFA
        ldr r12, [r0, #4]   @ read cell from DFA
        .int COMMA
        .int LIT
        PUSHDSP r12         @ push to stack
        .int COMMA
        .int EXIT
    
    : MKCON
       WORD CREATE
       0 ,        ( push dummy codeword, rewritten by (;CODE) )
       ,          ( actual constant )
    ;CODE
       $DOCON
    END-CODE
    
    5 MKCON CON5  ( create word CON5 that will push 5 on stack )
    CON5 . CR     ( prints 5 )
So ;CODE is the variant to be used with CREATE, while the plain ol' make-me-a-native-word variant is called CODE. And both get to be matched with END-CODE, not semicolon. At least according to F83 or something. We're not trying to stick to any Forth standard, but the definitions have to be useful...right? So the ;CODE business now looks a bit different:
    \ This used to look like : FOO ;CODE
    CODE FOO END-CODE

    @ push r0 to stack
    defword "$<R0",F_IMM,ASMFROMR0
        .int LIT
        PUSHDSP r0
        .int COMMA
        .int EXIT
    
    @ push r7 to stack
    defword "$<R7",F_IMM,ASMFROMR7
        .int LIT
        PUSHDSP r7
        .int COMMA
        .int EXIT
    
    @ pop stack to r0
    defword "$>R0",F_IMM,ASMTOR0
        .int LIT
        POPDSP r0
        .int COMMA
        .int EXIT
    @ pop stack to r7
    defword "$>R7",F_IMM,ASMTOR7
        .int LIT
        POPDSP r7
        .int COMMA
        .int EXIT
    
    CODE SWAP $>R0 $>R7 $<R0 $<R7 END-CODE
    
    HEX 1337 FOOF SWAP . . ( prints 1337 FOOF )
So now for the actual definitions. It /looks/ pretty tame...but it took me a week to wrap my mind around it.
: (;CODE) R> LATEST @ >CFA ! ;
: ;CODE IMMEDIATE ' (;CODE) , ;
: (CODE) HERE @ LATEST @ >CFA ! ;
: CODE : (CODE) ;
: (END-CODE-INT) LATEST @ HIDDEN [COMPILE] [ ;
: (END-CODE) IMMEDIATE (END-CODE-INT) ;
: END-CODE IMMEDIATE [COMPILE] $NEXT (END-CODE-INT) ;
HIDE (END-CODE-INT)
HIDE (CODE)
Most interesting here is the behavior of ;CODE. Let's examine the example I gave first. It's an IMMEDIATE word that will compile (;CODE) into MKCON, followed by the machine code placed by generators like $NEXT or $DOCON. When MKCON is executed, it will then update the CFA of CON5 to point to the machine words inside MKCON that followed (;CODE), instead of DOCOL. The address of machine words of course is on the return stack since it's the first word following (;CODE). Aaaaand because we pop the return address, we end up EXITing not to MKCON from (;CODE) but to its caller, thereby not crashing on the crazy machine code placed by $DOCON.

Fun. Hope that made sense. I had to meditate for a while over Brad Rodriguez' Moving Forth 3 (http://www.bradrodriguez.com/papers/moving3.htm) article before it made any sense to me. But like all ingenious beautiful things, it ends up being dead simple.

Implementing ;CODE in AArch32 Jonesforth

So I got a new Raspberry Pi, and me being me, I got sidetracked playing with a toy Forth implementation, pijFORTHos (https://github.com/organix/pijFORTHos), which is a standalone AArch32 port of Jonesforth (https://rwmj.wordpress.com/2010/08/07/jonesforth-git-repository/), which is/was an IA32-only affair. Of course I've always been amused by the idea of writing a kernel in Forth...so why not? Sadly, I probably won't do much with this...

Anyway. To cut to the chase, pijFORTHos was missing the ;CODE functionality from Jonesforth 47, which lets you define native machine words in Forth... i.e. an assembler, basically. A couple of completely empty and useless examples that do nothing (and yet don't crash) would look like:
: FOO ;CODE
: BAR $NEXT ;CODE
The latter is redundant, since $NEXT is already emitted by ;CODE. The implementation is straightforward, although I took the liberty of sticking it into jonesforth.s instead of the Forth prelude, and in the actual commit I'm a bit smarter about defining $NEXT and the actual _NEXT/NEXT bits used by the Forth core itself. You might wonder why I bother emitting the NEXT bits inline instead of branching, but the latter would take up 3 cells as well (ldr, bx and immed for ldr) and also involve a branch. Look at how the $NEXT word is defined. Isn't this crazy? It's an IMMEDIATE word that writes literals, which just happen to be machine code, at HERE, effectively compiling them into the current word definition when used in compile mode (such as a colon definition).
@
@ $NEXT ( -- ) emits the _NEXT body at HERE, to be used
@ in ;CODE or ;CODE-defined words.
@
defword "$NEXT",F_IMM,ASMNEXT
       .int LIT
        ldr r0, [FIP], #4
       .int COMMA
       .int LIT
        ldr r1, [r0]
       .int COMMA
       .int LIT
        bx r1
       .int COMMA
       .int EXIT
@
@ Finishes a machine code colon definition in Forth, as
@ a really basic assembler.
@
defword ";CODE",F_IMM,SEMICODE
       .int ASMNEXT                      @ end the word with NEXT macro
       .int LATEST, FETCH, DUP           @ LATEST points to the compiled word
       .int HIDDEN                       @ unhide the compiled word
       .int DUP, TDFA, SWAP, TCFA, STORE @ set codeword to data instead of DOCOL
       .int LBRAC                        @ just like ";" exit compile mode
       .int EXIT

Sunday, December 7, 2014

iQUIK supports the Performa 6400

After fixing some sad bugs from the last refactoring binge and adding support for OF 2.0, the 6400 (and likely all "Alchemy"-based Macs) can be booted via iQUIK.

Just like OpenFirmware 2.0.1, 2.0 seems to suffer from the "shallow setprop" bug, which results in bogus values for the /chosen/linux,initrd-start and /chosen/linux,initrd-end properties.


Friday, November 28, 2014

Using the Nexus 9 secure agent for debug logging

#!/usr/bin/python

import fileinput, re, sys

#
# It turns out the "Trusty Secure OS" Crippleware on the Nexus 9 is
# good for at least something. It is thankfully pretty chatty, meaning
# you can use it for logging from code where it's inconvenient
# or impossible to write to the UART directly, like MMU bringup code ;-).
#
# A sequence like:
#   mov x0, #'V'
#   smc #0xFFFF
#
# ...will result in the following getting emitted. I am guessing x1...x3
# get printed here as param0..2 but I am too lazy to check.
#
# smc_undefined:67: Undefined monitor call!
# smc_undefined:67: SMC: 0x56 (Stdcall entity 0 function 0x56)
# smc_undefined:67: param0: 0xf77c2e69
# smc_undefined:67: param1: 0xf77c2e68
# smc_undefined:67: param2: 0x0
#
# Now you can do basic logging to debug early bring-up. The following
# Python will turn your giant Minicom capture into something more
# sensible.
#

def process(line):
    m = re.match(r'\s*smc_undefined:67: SMC: (0x[0-9a-f]+)', line)
    if m:
        sys.stdout.write(chr(int(m.groups()[0], 16)))

for line in fileinput.input():
    process(line)

print("\n")

Sunday, November 23, 2014

64-bit ARM OS/Kernel/Systems Development Demo on a Nexus 9


The Nexus 9 is based on the 64-bit nVidia K1 chip. At the moment it is the most affordable (price-wise) and accessible (availability-wise) platform for exploring OS work on AArch64. The Nexus 9 can be unlocked via "fastboot oem unlock", allowing custom Android images to be booted.
https://github.com/andreiw/nexus9_demo

What this is

This is a small demo showing how to build and boot arbitrary code on your Nexus 9 and do some basic I/O. It exercises serial I/O and draws two black diagonal lines on the framebuffer.

What you need - required

What you need - optional

How it works

HBOOT, the Nexus bootloader, expects images to be in a certain format. The booted kernel/code must:
  • Be 64-bit
  • Be binary (not ELF)
  • Be linked at 0x80080000
  • Be compressed using "gzip"
  • Be followed by the binary FDT
  • Be contained in an "ANDROID!" boot image.

Some notes:

  • The link address appears to be hardcoded in HBOOT. The Android boot image bases and the AArch64 kernel header fields appear to be ignored.
  • The boot image can contain an additional ramdisk/initrd/payload.
  • The FDT is patched by HBOOT to contain correct linux,initrd-start and linux,initrd-end addresses.
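Given the requirements above, assembling the kernel portion of the image can be sketched in a few lines (a hypothetical helper; the file names are placeholders, and the result still needs to be wrapped into an "ANDROID!" boot image, e.g. with mkbootimg):

```python
import gzip

def make_hboot_kernel(kernel: bytes, fdt: bytes) -> bytes:
    # HBOOT wants the flat (non-ELF) binary gzip-compressed,
    # immediately followed by the raw binary FDT.
    return gzip.compress(kernel) + fdt

# Hypothetical usage -- adjust the names to your build outputs:
# payload = make_hboot_kernel(open("nexus9_demo.bin", "rb").read(),
#                             open("nexus9_dtb", "rb").read())
```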

How to build

$ CROSS_COMPILE=aarch64-linux-gnu- make

How to boot

Connect your Android tablet via a USB cable. Optionally connect the UART headphone jack adapter to your computer. The settings are 115200 8-n-1.
$ adb reboot-bootloader
$ fastboot boot nexus9_demo

Actual output of the demo

Hello!
CurrentEL = 0000000000000001
SCTLR_EL1 = 0000000010C5083A
Bye!

Where to go from here

"nexus9_dts" is the decompiled "nexus9_dtb". "nexus9_dtb" was extracted from the Android boot.img.

Final thoughts

From studying the Tegra K1 TRM, the K1 should have virtualization support (i.e. EL2). However, the HTC firmware does not allow booting an EL2-enabled OS. All kernels are booted in EL1. This is rather unfortunate and prevents playing around with KVM and Xen on this platform. Perhaps there are some problems with EL2 support. Or perhaps HTC/nVidia/Google were too myopic to allow EL2 access. It's unclear if the "oem unlock" allows reflashing custom unsigned firmware. "nvtboot" seems to enforce signed "Trusted OS" payloads, at least from dumping the strings. The boot flow looks something like this:
  • "nvtboot" (32-bit) runs on the AVP/COP.
  • "nvtboot" loads "tos" (64-bit) (Trusty aka Secure OS) on the AArch64 chip.
  • "tos" loads HBOOT (32-bit).
  • HBOOT loads Android and implements the fastboot protocol.
It's unclear how to enter NVFlash/APX mode, or how helpful that would be.

Wednesday, June 25, 2014

Solaris/PPC

Apparently, back around 2006 there was an effort at Sun Labs to get OpenSolaris to work on CHRP(-like) PowerPC machines. And according to the documentation, the kernel could even boot to a shell on an Apple G4.

That effort was called Polaris. It was difficult to find the CDDL-licensed sources, but I've made them available for everyone else to play with at https://github.com/andreiw/polaris

I haven't tried it out or done anything with the sources yet. The Solaris kernel is a pretty amazing piece of software, and a very portable and well-designed one to boot. I am glad Sun open-sourced it before folding, as it's code like this that should be influencing OS R&D for generations to come. It would be interesting to see the Polaris code being used as a base for an AArch64 investigation...

A

Tuesday, June 24, 2014

What's special about...


addi r0,r1,0x138
ori r0,r0,0x60

...and I/O port 0x92 :-)?

Sunday, June 22, 2014

iQUIK update

I now have a 1.5 GHz 12" PowerBook in my possession to test iQUIK with. This is of course a NewWorld machine, and not a primary target for the iQUIK boot loader...

A couple of observations:
  • OpenFirmware 3.0 doesn't support partition-zero booting (i.e. hd:0 or CHRP-spec hd:%BOOT). This means iQUIK cannot be booted the same way it boots on OldWorlds, but neither is that required: iQUIK can be booted on NewWorlds the same way as Yaboot, i.e. by placing 'iquik.elf' on an HFS+ partition and blessing it.
  • NewWorld OF requires appending ":0" to disk device paths for full-disk access.
I've also fixed a bug in the partition code that truncated offsets to 32 bits, and improved device path handling and parsing.
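That offset-truncation bug class is easy to illustrate (a generic sketch, not the actual iQUIK code): a byte offset computed in 32 bits wraps for anything past 4 GiB.

```python
# Generic illustration of 32-bit offset truncation (not the iQUIK code):
# byte offsets for blocks past the 4 GiB mark wrap in 32-bit arithmetic.
BLOCK_SIZE = 512
block = 9_000_000          # a block roughly 4.3 GiB into the disk

correct = block * BLOCK_SIZE                     # full-width result
truncated = (block * BLOCK_SIZE) & 0xFFFFFFFF    # what 32-bit math yields

print(hex(correct), hex(truncated))
```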

In short, though, it works. And it works quite well. So iQUIK now works on OldWorld and NewWorld machines. Yaboot - only on NewWorlds. Of course, Yaboot also supports CHRP machines, network booting and reads all filesystems supported by the underlying OpenFirmware implementation. So there's plenty of work to reach feature parity in that regard.

A

Tuesday, June 3, 2014

Detecting 'make' environment variables change

While playing with iQUIK and trying to add a mode to build a smaller, reduced-logging version, I ran into an interesting question - how do I force a full rebuild when the build configuration changes?
#
# Example of a Makefile that detects "environment change".
#
# I.e.:
#
# andreiw-lnx:~/src/ make clean
# Cleaning
# andreiw-lnx:~/src/ make 
# Resuming build with env ""
# Building with ""
# andreiw-lnx:~/src/ make CONFIG_EXAMPLE=1
# Cleaning due to env change (was "" now "-DCONFIG_EXAMPLE")
# Cleaning
# Building with "-DCONFIG_EXAMPLE"
# andreiw-lnx:~/src/ make CONFIG_EXAMPLE=1
# Resuming build with env "-DCONFIG_EXAMPLE"
# Building with "-DCONFIG_EXAMPLE"
# andreiw-lnx:~/src
#

ENV_FILE=old_build_env
-include $(ENV_FILE)

#
# Environment definition.
#
ifeq ($(CONFIG_EXAMPLE), 1)
BUILD_FLAGS = -DCONFIG_EXAMPLE
endif
BUILD_ENV = "OLD_BUILD_FLAGS=$(BUILD_FLAGS)"

#
# Detect environment change.
#
ifneq ($(BUILD_FLAGS),$(OLD_BUILD_FLAGS))
PRETARGET=clean_env
else
PRETARGET=log_env
endif

.PHONY: all target log_env log_clean_env clean_env clean

all: $(PRETARGET) target

target:
	@echo Building with \"$(BUILD_FLAGS)\"

log_env:
	@echo Resuming build with env \"$(BUILD_FLAGS)\"

log_clean_env:
	@echo Cleaning due to env change \(was \"$(OLD_BUILD_FLAGS)\" now \"$(BUILD_FLAGS)\"\)

clean_env: log_clean_env clean
	@rm -f $(ENV_FILE)
	@echo $(BUILD_ENV) > $(ENV_FILE)

clean:
	@echo Cleaning

Saturday, May 31, 2014

Musings on device workarounds and attribution

I was trading war stories with some colleagues today, and remembered the time I was chasing crazy UART bugs.

So I just had to go look at my battlefields of past and reminisce...

Ever look at a random driver and wonder how convoluted weird code gets written? Then you look at the git history and see - nothing useful. No history. It was apparently all written at once, by some crazy smart engineer based on thorough and clean specs, right ;-)?

Like the serial-tegra driver, for example. Ever wonder why UART waits for a certain bit of time after switching baud rate?

I used to work on the Moto Xoom tablet - the first official Android tablet, based around the Tegra 2 SoC. Once upon a time I was investigating a suspend-resume bug: we were occasionally seeing a kernel crash when waking the tablet with a paired Bluetooth keyboard. The actual crash was the result of a BlueZ bug that didn't defensively treat BT HCI connect events for devices that weren't disconnected (have a gander at http://www.spinics.net/lists/linux-bluetooth/msg10690.html - yes, a rogue Bluetooth adapter /can/ crash the system, wonderful, right?).

But why weren't the BT disconnect messages coming through?

The tablet was asleep at the time of the disconnect, and the disconnect woke it up. The Bluetooth host was connected to the CPU via a UART, and the UART needed to be resumed before the BT host could send data. UART resume, among other things, sets the baud rate. What was happening is that hardware flow control allowed RX before the baud-rate change had fully propagated through the UART block, so the received data was corrupted. Oops.

Knowing what was happening didn't mean I had a solution, of course. The docs were useless, and it took another fun half a week to figure out the solution :-). Too bad I can't remember what this fix was for... Probably more BT issues :).

So what point did I want to make? The Tegra HSUART driver "got rewritten" when Tegra 2/3 support was upstreamed. But it's basically the same code, down to the code flow and comments. You put in time, sleepless nights and life energy, and you can't even get basic attribution from some unknown dude at NV.

Behind every line of code is some story. Some feeling of exhilaration, success and victory. I almost made a t-shirt with the fix :-). So always attribute contributions out of solidarity with your fellow hackers. Heh.

BlueZ is a train wreck, though... There. I said it.

Friday, May 30, 2014

MkLinux

The first step to getting MkLinux to run is to get the build tools to run.

Build tools?

The OSF Open Development Environment tools. Which have been very hard to find (https://github.com/slp/mkunity/issues/1). But now you can find them and even build them - https://github.com/andreiw/ode4linux

If I ever find time I'll clean up the code so it doesn't build with a million warnings.

A

Saturday, April 12, 2014

Inline assembler stupidity

I keep getting caught by this, because it's a perfect example of the compiler doing something contrary to what you're writing.
  asm volatile (
                "ldr %0, [%1]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%1]\n\t"
                : "=r" (tmp)
                : "r" (p)
                :
                );

Guess what this gets compiled to?
  30: f9400000  ldr x0, [x0]
  34: 91000400  add x0, x0, #0x1
  38: f9000000  str x0, [x0]

...equivalent to, of course,
  asm volatile (
                "ldr %0, [%0]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%0]\n\t"
                : "+r" (p)
                :
                :
                );
(GCC assumes an asm consumes all of its inputs before writing any output, so a plain "=r" output is free to share a register with an input.) This sort of aggressive and non-obvious optimization is crazy, because if I had really wanted the generated code above, I'd have written the inline asm the second way, with a read-write modifier. Maybe for architectures with few, specialized registers this is a reasonable approach, but for RISCish instruction sets with large register files it's nuts. There should be a warning option for this nonsense.

This "correct way" is to use an earlyclobber modifier.
  asm volatile (
                "ldr %0, [%1]\n\t"
                "add %0, %0, #1\n\t"
                "str %0, [%1]\n\t"
                : "=&r" (tmp)
                : "r" (p)
                :
                );
IMO, anything that needs a separate "caveat" paragraph in third-party documents needs to be fixed.

Speaking of which... Given that C really is a high-level assembly, why not finally standardize on inline asm?