SYS$OUTPUT: 2016

64-bit ARM OS/Kernel/Systems Development on an nVidia Shield TV (Tegra X1)

The Shield TV is based on the 64-bit nVidia X1 chip. Unlike the K1, this is actually a Cortex-A57 based design, instead of being based on the nVidia "Denver" design. That by itself is kind of interesting already. The Shield TV was available much much earlier than the X1-based nVidia development board (Jetson TX1, you can even buy it on Amazon), and costs about a third of the TX1. The Shield TV allows performing an unlock via "fastboot oem unlock", allowing custom OS images to be booted. Unlike the TX1, you don't get a UART (and I haven't found the UART pads yet, either).

What this is

https://github.com/andreiw/shieldTV_demo

This is a small demo, demonstrating how to build and boot arbitrary code on your Tegra Shield TV. Unlike the previous Tegra K1 demo, you get EL2 (hypervisor mode!).

A Shield TV, unlocked. Search Youtube for walkthroughs.
Shield OS version >= 1.3.
GNU Make.
An AArch64 GNU toolchain.
ADB/Fastboot tools.
Bootimg tools (https://github.com/pbatard/bootimg-tools), built and somewhere in your path.
An HDMI-capable screen. Note, HDMI, not DVI-HDMI adapter. You want the firmware to configure the screen into 1920x1080 mode, otherwise you'll be in 640x480 and we don't want that...

How to build

$ CROSS_COMPILE=aarch64-linux-gnu- make

...should yield 'shieldTV_demo'.

How to boot

Connect the Shield TV a USB cable to your dev workstation.
Reboot device via:
```
$ adb reboot-bootloader
```
...you should now see the nVidia splash screen, followed by the boot menu.
If OS is 1.3, you can simply:
```
$ fastboot boot shieldTV_demo
```
If OS is 1.4 or 2.1, you will need to:
```
$ fastboot flash recovery shieldTV_demo
```
...and then boot the "recovery kernel" by following instructions on screen.

The code will now start. You will see text and some drawn diagonal lines black background. The text should say we're at EL2 and the lines should be green. The drawing will be slow - the MMU is off and the caches are thus disabled.

Let me know if it's interesting to see the MMU setup code.

Final thoughts

The Shield TV is a better deal than the TX1 for the average hobbyist, even with the missing UART. For the price being sold the TX1 should come with a decent amount of RAM, not 1GB more than the Shield TV. nVidia...are you listening? Uncripple your firmware so booting custom images is not a song-and-dance (you broke it in 1.4!) and at least TELL us where the UART pads are on the motherboard. If you're really cool put together an "official" Ubuntu image that runs on the TX1 and the Shield (and fix SCR_EL3.HCE, too).

"UEFI" on...?

This article is the first in a series of posts touching on the general process of bringing up TianoCore EDK2 on an otherwise unsupported architecture. Maybe you want to support UEFI for your CPU architecture, or simply have a reasonable firmware environment. In either case, because UEFI is not actually defined for your architecture, you're going to have to do a bit more work than your typical platform bring-up. By the time you're done, you could become a perfect addition to the UEFI forum and its specification committees...yeah!

This blog post and the ones following it continually refer to the ongoing PPC64LE Tiano port I am working on, available at https://github.com/andreiw/ppc64le-edk2/. Since everyone can read the fine code, this document mostly highlights the various steps performed throughout the commits. The git repo isn't perfect, though. Some changes ended up evolving over a few commits while I ironed things out and brought in more code. Hopefully I don't miss mentioning anything important.

TianoCore

Tiano is Intel's open-source (BSD-licensed) implementation of the UEFI specification. EDK2 is the second and current iteration of the implementation.

UEFI officially is supported on IA32, X64, IPF, ARM and AARCH64 architectures, the EDK2 has CPU support code for the x86 and ARM variants. There's a MIPS EDK1 port floating about, and now two PPC64LE OpenPower EDK2 ports, one of which ended up fueling this article...

I'm not going to focus on the architecture of either UEFI or Tiano. Good books have been written on the subject. Here's some tl;dr material, though, for the hopelessly impatient:

At least read the User Documentation and glance at the boot flow diagram. You should be now able to fetch, build and boot Tiano Core using the emulation package and have a rough understanding of what it takes to get a build going via Conf/target.txt and Conf/tools_def.txt.

Your Target

Your target is a 32-bit or 64-bit little-endian chip. I suppose big-endian is doable, but none of the Tiano code is endian safe and UEFI is strictly little-endian.

Development Environment

This assumes that you are doing development on Linux and are using and ELF toolchain and GCC compilers. You are going to need:

A compiler. https://www.kernel.org/pub/tools/crosstool/ is a good source.
A simulator, virtual machine or target with some monitor environment to upload and run code from RAM.

Basic Project Setup

Pick a short identifier for your architecture. Pick a name that's unused - for the Power8 port I picked PPC64. This tag will be used with the Tiano build scripts. The next step is to create a couple of Pkg directories that will contain our stuff. In my PPC64 port I initially went with a single-package solution, but I should be working on splitting it up into the more conventional layout, where platform-independent portions are in PPC64Pkg and platform-dependent parts (including build scripts for building for actual boards) are in PPC64PlatformPkg. You can refer to this commit as an example for the minimum required to build a dummy EFI application that does nothing.

If our architecture was supported, then running a similar build command for your package would succeed.

build -p YourArchPlatformPkg/YourArchPlatformPkg.dsc

Build Infrastructure

The next step is to enable the build infrastructure and scripts to understand your architecture identifier. Here's a list of files I had to modify - this is all stuff under BaseTools/Source/Python and the changes are incredibly mechanical.

Common/DataType.py
Common/EdkIIWorkspaceBuild.py
Common/FdfParserLite.py
Common/MigrationUtilities.py
CommonDataClass/CommonClass.py
CommonDataClass/ModuleClass.py
CommonDataClass/PlatformClass.py
GenFds/FdfParser.py
GenFds/FfsInfStatement.py
GenFds/GenFds.py
TargetTool/TargetTool.py
build/build.py

See https://github.com/andreiw/ppc64le-edk2/commit/2fef3116f0c732036588af12cd409f8f3d96436b for my PPC64 implementation.

Build tools

UEFI executables are PE/COFF files. Since we are building on Linux, EDK2 uses a workflow where ELF artifacts produced by the cross-compiler are converted into PE32/PE32+ files with the GenFw tool. The PE/COFF artifacts are then wrapped into an FFS object and assembled into what is known as an FV ("firmware volume"). Multiple FVs are put into an FD. The FV is really a flat file system that uses GUIDs for everything and can store other types of objects as well. You could also generate what is known as a TE (terse executable), but it's basically a cut down version of COFF FWIW.

Tiano deals with several kinds of executables. The UEFI runtime (DXE core, UEFI drivers, and so on) are relocated as they are loaded, while the code that runs prior to the UEFI runtime itself is XIP (execute-in-place) and is thus pre-relocated to fixed addresses by the tool constructing the FV. The point behind this being that such pre-UEFI code (which is the SEC and PEI phases for Tiano), is run in an environment before the DRAM is available.

Thus we need to enable GenFw to create PE/COFF executables from ELF for our architecture. This ties into the compiler options we're going to use, which is not something we've addressed yet. It helps to understand that a PE/COFF file is basically a position-independent executable. Although there is "preferred linking address" that all symbols are relocated against, the PE/COFF image contains a sufficient amount of relocation information, known as "base relocations", to allow loading at any address. So it would appear, that the easiest approach is to generate a position-independent ET_DYN ELF executable (with the -pie flag to the linker) and then focus on converting the output to COFF. This is the approach I highly suggest adopting. You will only have to deal with a single relocation type (R_PPC64_RELATIVE in my case) that will map naturally to either the 64-bit or the 32-bit COFF base relocation type, depending on the bit width of your architecture.

Other approaches are possible, such as embedding all relocations with the --emit-relocs linker flag and dealing with the entire soup of crazy relocs later, but the success of this approach is highly dependent on the architecture and ABI. It may be impossible to convert to PE/COFF due to a mismatch between ELF and COFF base relocs and ABI issues. When first working on the PPC64 port, I first followed the AArch64 approach that did just this, and ended up being forced to use the older (and not really meant for LE) ELFv1 ABI. Don't do it.

Note that depending on the ABI, you may have to do a bit of tool work. I am guessing this was the reason why the AArch64 port never adopted using PIE ELF binaries. Fortunately, you should be able to follow along my changes to Elf64Convert.c. You may also have to make changes to the base linker script used with the GNU toolchains.

Don't forget that you will need to manually rebuild the BaseTools if you make any changes!

make -C BaseTools

At this point we can go back and figure out the compile options. This is the BaseTools/Conf/tools_def.template file, that is then copied to Conf/ by edksetup.sh on freshly checked-out trees. The compiler options heavily depend on your architecture, of course, but generally speaking:

build on top of definitions made for new architectures like AArch64, because there's simply less of them in this file to wrap your mind around
consider the PPC64 definitions in my tree
-pie, unless position-independent executables don't work for you for some reason
large model
soft float (you can always move to hard float later if that rocks your boat, but it's just more CPU state to wrap your head around)
PECOFF_HEADER_SIZE=0x228 for 64-bit chips, 0x220 for 32-bit ones

This is the point where trying to build again should start giving you compile errors, because we still haven't modified any of the Tiano include and library files to be aware of the new architecture.

To be continued.

SYS$OUTPUT

Wednesday, June 8, 2016

Disassembling NT system files

Sunday, May 8, 2016

Easy creation of proxy DLL pragmas

Converting dumpbin DLL exports information to MSVC linker pragmas

64-bit ARM OS/Kernel/Systems Development Demo on an nVidia Shield TV (Tegra X1)