Given that TianoCore is open source and now supports the ARM architecture, it's a given that someone will want to play around with it on their device of choice. Most likely that's an Android phone, or perhaps some other Linux-based device. That probably also means you don't have access to the bootloader sources, or the bootloader is "secure", or you simply don't want to risk bricking the device completely... It almost certainly means that you know which SoC is inside, but have no idea about all the power regulators, display controllers, and other logic in the device, so nuking the firmware is definitely not the right option. Better to keep it around so it initializes the hardware enough that you can interact with it purely by manipulating SoC devices...
So what do you do?
TianoCore UEFI firmware consists of three phases of execution. The SEC is the little bit of bootstrap code that runs at cold (or warm) boot, and does just enough to jump to the second stage, the PEI, which configures the DRAM and jumps to the DXE phase. DXE is UEFI proper and loads and dispatches UEFI drivers and applications... We want to chain load UEFI from our existing bootloader. The existing bootloader can load Linux kernels. Our UEFI bootstrap can skip directly to DXE, since the hardware is sufficiently initialized that we can access our RAM.
That was a really quick description of UEFI boot, too. Please see here, here and here for more info.
Basically the goal is to make our DXE IPL look like a Linux kernel. The flash image containing our boot firmware volume will, for all intents and purposes, be treated like an initrd. I'll omit certain things that you can learn by looking at the various EDK2 platform packages, such as the BeagleBoardPkg and everything under ArmPlatformPkg.
Hello, World!
The EDK2 build tools generally create PE32+ images as the output, since that's what UEFI uses for drivers and applications. If you build with GCC, the intermediate files are ELF, and the GenFw tool does all the dirty work of converting the file formats and rebasing the images if necessary. The Linux kernel on ARM is a binary blob, composed of a decompressor and a compressed image. Luckily, GenFw can be asked to create a pure binary.
Add something like this to your .inf file:
    [BuildOptions]
      *_*_*_GENFW_FLAGS = -b

Of course, to be booted by a Linux kernel bootloader, we need to look like the Linux kernel. Thankfully, it's as easy as a few magic words at the start of the binary. Something like this -
        .align
    _ModuleEntryPoint:
        .text
        .type _ModuleEntryPoint, #function
        .rept 8
        mov   r0, r0
        .endr
        b     1f
        .word 0x016f2818          @ Magic numbers to help the loader
        .word _ModuleEntryPoint
        .word _end
    1:  mov   ip, r2              @ Save ATAG pointer.
        ...
        ...
        ...
You can read more about Linux-ARM booting here.
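To make the "magic words" concrete, here is a minimal host-side sketch in plain C of how a loader could recognise such an image. The offsets follow from the layout above (eight 4-byte NOPs plus one 4-byte branch put the magic word at byte 0x24). The names `zimage_probe` and `zimage_header` are made up for illustration, and whether your particular bootloader checks the magic at all is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIMAGE_MAGIC        0x016f2818u
#define ZIMAGE_MAGIC_OFFSET 0x24u  /* 8 NOPs (32 bytes) + 1 branch (4 bytes) */

/* Hypothetical view of the three words that follow the branch. */
typedef struct {
    uint32_t magic;  /* must equal ZIMAGE_MAGIC */
    uint32_t start;  /* link-time address of the entry point */
    uint32_t end;    /* link-time address of the end of the image */
} zimage_header;

/* Read a little-endian 32-bit word without assuming alignment. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if the image carries the magic word, else 0. */
int zimage_probe(const uint8_t *image, size_t size, zimage_header *out)
{
    assert(image != NULL && out != NULL);
    if (size < ZIMAGE_MAGIC_OFFSET + 12u) {
        return 0;  /* too short to contain the three header words */
    }
    out->magic = read_le32(image + ZIMAGE_MAGIC_OFFSET);
    out->start = read_le32(image + ZIMAGE_MAGIC_OFFSET + 4);
    out->end   = read_le32(image + ZIMAGE_MAGIC_OFFSET + 8);
    return out->magic == ZIMAGE_MAGIC;
}
```

Running something like this over the GenFw output on your build machine is a cheap sanity check before flashing anything.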
With that you should be able to write a little assembly program that can build using the EDK2 tool chain and write "Hello, World!" to, say, an already configured UART.
Location.
The next problem is a bit more interesting - we don't know where we are running. Well, that's not quite true. We can work out where we are running, but our code was linked to a particular address at build time, so if we execute anywhere else, all of our data accesses will be garbage. We have two options:
- Relocate ourselves from wherever we are to the linked address. This involves thinking about the memory map (and knowing it at least somewhat).
- Build position independent code, and patch ourselves up.
To use a link script, add something like this to your module INF file:
    [BuildOptions]
      *_*_*_DLINK_FLAGS == --oformat=elf32-littlearm -nostdlib -u $(IMAGE_ENTRY_POINT) -e \
        $(IMAGE_ENTRY_POINT) -Map $(DEST_DIR_DEBUG)/$(BASE_NAME).map -X -T $(MODULE_DIR)/Arm/LinIpl.lds
For option (1) the following is sufficient for a linker script:
    SECTIONS
    {
      . = 0x1400000;
      _start = .;
      _text = .;
      .text : {
        *(.start)
        *(.text)
        *(.text.*)
        *(.fixup)
        *(.data)
        *(.rodata)
        *(.rodata.*)
        *(.glue_7)
        *(.glue_7t)
        . = ALIGN(4);
      }
      . = ALIGN(4);
      .bss : { *(.bss) }
      _end = .;
    }

... the ". = 0x1400000" specifies the link address. That number I picked more-or-less knowingly, and if you don't know what to pick it's not the best way to start out.
The actual code to perform the relocation looks something like this -
    @ Load stack and relocate everything to link address.
        adr   r0, LC0
        ldmia r0, {r1, r2, r3, r4, sp}  @ r2 - linked _start
        subs  r0, r1, r0                @ r0 = linked - actual
        sub   r0, r2, r0                @ r0 = actual _start
    1:  ldr   r1, [r0], #4
        str   r1, [r2], #4
        cmp   r2, r3
        blo   1b
        isb
        dsb
        mov   pc, r4
    reloc_done:
        ...
        ...
        ...
        .ltorg
        .align 2
        .type LC0, #object
    LC0: .word LC0                      @ r1
        .word _start                    @ r2
        .word _end                      @ r3
        .word reloc_done                @ r4
        .word _stack_end                @ sp
        .align
        .section ".bss"
        .align 2
    _stack: .space 4096
    _stack_end:

For option (2) we'll use position-independent code. Note that it's not sufficient to compile just the IPL code with -fpic; all linked modules need to be compiled the same way too (and since we use plenty of EDK libraries, there is no getting around that). I used a separate DSC file specifically to build the IPL bits apart from the rest of my platform files. Add something like this to your DSC file -
    [BuildOptions]
      # Important! Must build as PIC code.
      GCC:*_*_ARM_ARCHCC_FLAGS == -march=armv7-a -fpic -mthumb

The linker script looks like -
    SECTIONS
    {
      . = 0;
      _text = .;
      .text : {
        *(.start)
        *(.text)
        *(.text.*)
        *(.fixup)
        *(.data)
        *(.rodata)
        *(.rodata.*)
        *(.glue_7)
        *(.glue_7t)
        . = ALIGN(4);
      }
      _got_start = .;
      .got : { *(.got) }
      _got_end = .;
      . = ALIGN(4);
      .bss : { *(.bss) }
      _end = .;
    }

...this way we get access to our Global Offset Table so we can patch it up. The GOT fix-up code looks like this -
    @ Load stack and fix-up GOT.
        adr   r0, LC0
        ldmia r0, {r1, r2, r3, sp}
        subs  r0, r0, r1          @ r0 = actual - linked
        add   r2, r2, r0
        add   r3, r3, r0
        add   sp, sp, r0
    1:  ldr   r6, [r2], #0
        add   r6, r6, r0          @ actual = linked + r0
        str   r6, [r2], #4
        cmp   r2, r3
        blo   1b
        ...
        ...
        ...
        .ltorg
        .align 2
        .type LC0, #object
    LC0: .word LC0                @ r1
        .word _got_start          @ r2
        .word _got_end            @ r3
        .word _stack_end          @ sp
        .align

The disadvantage of this method (2) is that you have to be careful. The GOT fix-ups will not fix up your own pointers to objects, so if you have a structure/table with function pointers (like an EFI protocol), you will need to patch it manually.
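The same arithmetic is easy to model in plain C, which may help when deciding what needs a manual patch. The sketch below is illustrative only: `relocate_slots` does to an array of address slots what the assembly loop does to the GOT, and `fake_protocol` is a made-up stand-in for a statically initialised table of function pointers, stored here as integers so it can be exercised on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the load offset (actual - linked) to every slot in [first, last). */
void relocate_slots(uintptr_t *first, uintptr_t *last, uintptr_t delta)
{
    assert(first != NULL && first <= last);
    for (uintptr_t *slot = first; slot < last; slot++) {
        *slot += delta;  /* unsigned wrap-around also handles negative offsets */
    }
}

/* Hypothetical protocol-like table: each member holds a link-time address. */
typedef struct {
    uintptr_t read;
    uintptr_t write;
    uintptr_t reset;
} fake_protocol;

/* The manual patch: the GOT loop never sees these members, so fix them here. */
void relocate_protocol(fake_protocol *p, uintptr_t delta)
{
    assert(p != NULL);
    p->read  += delta;
    p->write += delta;
    p->reset += delta;
}
```

Each such table must be patched exactly once; patching it twice, or patching pointers that were computed at run time (and are therefore already correct), breaks them again.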
Finding the FD and RAM.
As you've seen in the link above about the Linux-ARM boot process, we get passed a pointer to a list of ATAGs, which contains our info about RAM and initrd location, among other things. Additionally, the RAM size could be passed through the kernel command line.
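For the command-line case, a minimal sketch in hosted C follows. It assumes the Linux-style `mem=nn[KMG]` option and uses the standard C library for brevity; inside the IPL you would swap in whatever string helpers your EDK2 libraries provide. The function name `cmdline_mem_size` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the size in bytes given by the first "mem=" option, or 0 if none. */
uint64_t cmdline_mem_size(const char *cmdline)
{
    assert(cmdline != NULL);
    const char *p = cmdline;
    while ((p = strstr(p, "mem=")) != NULL) {
        /* Accept only a whole option: at the start or after a space. */
        if (p == cmdline || p[-1] == ' ') {
            break;
        }
        p += 4;  /* matched inside another option, e.g. "highmem="; keep looking */
    }
    if (p == NULL) {
        return 0;
    }
    p += 4;  /* skip "mem=" */
    char *end;
    unsigned long long value = strtoull(p, &end, 0);
    if (end == p) {
        return 0;  /* no digits after "mem=" */
    }
    switch (*end) {
    case 'G': case 'g': value <<= 30; break;
    case 'M': case 'm': value <<= 20; break;
    case 'K': case 'k': value <<= 10; break;
    default: break;  /* plain byte count */
    }
    return (uint64_t)value;
}
```

The string itself would come from the ATAG_CMDLINE tag, if the bootloader passes one.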
Parsing ATAGs is easy -
    #define ATAG_TYPE_CORE    (0x54410001)
    #define ATAG_TYPE_MEM     (0x54410002)
    #define ATAG_TYPE_INITRD2 (0x54420005)
    #define ATAG_TYPE_CMDLINE (0x54410009)
    #define ATAG_TYPE_NONE    (0x0)

    typedef struct _ATAG_HEADER {
      UINT32 Size;   // Tag size in 32-bit words, header included.
      UINT32 Tag;
    } ATAG_HEADER, *PATAG_HEADER;

    typedef struct _ATAG_INITRD2 {
      ATAG_HEADER Header;
      UINT32      Start;
      UINT32      Size;
    } ATAG_INITRD2, *PATAG_INITRD2;

    //
    // Field order matters: Linux's struct tag_mem32 stores the size first.
    //
    typedef struct _ATAG_MEM {
      ATAG_HEADER Header;
      UINT32      Size;
      UINT32      Start;
    } ATAG_MEM, *PATAG_MEM;

    typedef struct _ATAG_CMDLINE {
      ATAG_HEADER Header;
      CHAR8       CmdLine[1];
    } ATAG_CMDLINE, *PATAG_CMDLINE;

    EFI_STATUS
    ATagsValid (
      IN PATAG_HEADER ATagsList
      )
    {
      if (ATagsList->Tag == ATAG_TYPE_CORE) {
        return EFI_SUCCESS;
      }
      return EFI_NOT_FOUND;
    }

    PATAG_HEADER
    ATagsGet (
      IN PATAG_HEADER ATags,
      IN UINT32       TagType
      )
    {
      //
      // Start at the next tag...
      //
      PATAG_HEADER Tag = ATags;

      Tag = (PATAG_HEADER) ((UINTN) Tag + Tag->Size * sizeof (UINT32));
      while (Tag->Tag != TagType && Tag->Tag != ATAG_TYPE_NONE) {
        Tag = (PATAG_HEADER) ((UINTN) Tag + Tag->Size * sizeof (UINT32));
      }
      return Tag->Tag == ATAG_TYPE_NONE ? NULL : Tag;
    }

    ...
    ...
    ...

    {
      FdAtag = (PATAG_INITRD2) ATagsGet (ATags, ATAG_TYPE_INITRD2);
      if (!FdAtag) {
        DEBUG ((EFI_D_ERROR, "No EFI FD image passed to IPL :(\n"));
        goto done;
      }
      DEBUG ((EFI_D_INFO, "FD @ 0x%x-0x%x\n", FdAtag->Start, FdAtag->Start + FdAtag->Size));

      MemAtag = (PATAG_MEM) ATagsGet (ATags, ATAG_TYPE_MEM);
      if (MemAtag) {
        MemBase = MemAtag->Start;
        MemSize = MemAtag->Size;
      } else {
        MemBase = PcdGet32 (PcdMemoryBase);
        MemSize = PcdGet32 (PcdMemorySize);
        DEBUG ((EFI_D_ERROR, "Where'd ATAG_MEM go? Using baked-in default...\n"));
      }
      DEBUG ((EFI_D_INFO, "RAM @ 0x%x-0x%x\n", MemBase, MemBase + MemSize));
    }

    ...
    ...
    ...

You could make the above as complicated as necessary to handle non-contiguous memory ranges, etc. The bootloader also passes a machine type to the IPL, so you could even create an image that runs on multiple devices :-).
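As a rough illustration of the non-contiguous case, here is a host-testable sketch in plain C that collects every ATAG_MEM entry instead of only the first. It treats the tag list as an array of 32-bit words, bounds-checks each tag, and relies on the fact that Linux's `tag_mem32` payload is the size followed by the start address. The names `mem_range` and `atags_collect_mem` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u
#define ATAG_NONE 0x00000000u

typedef struct {
    uint32_t start;
    uint32_t size;
} mem_range;

/*
 * Each tag is { size_in_words, tag_id, payload... }.
 * Collects up to max ATAG_MEM ranges into out and returns how many were
 * stored, or 0 if the list does not begin with ATAG_CORE.
 */
size_t atags_collect_mem(const uint32_t *words, size_t word_count,
                         mem_range *out, size_t max)
{
    assert(words != NULL && out != NULL);
    size_t found = 0;
    size_t i = 0;

    if (word_count < 2 || words[1] != ATAG_CORE) {
        return 0;
    }
    while (i + 2 <= word_count) {
        uint32_t size = words[i];
        uint32_t tag  = words[i + 1];

        if (tag == ATAG_NONE) {
            break;  /* terminator */
        }
        if (size < 2 || size > word_count - i) {
            break;  /* malformed tag: stop rather than run off the list */
        }
        if (tag == ATAG_MEM && size >= 4 && found < max) {
            out[found].size  = words[i + 2];  /* payload: size, then start */
            out[found].start = words[i + 3];
            found++;
        }
        i += size;
    }
    return found;
}
```

Each collected range would then get its own resource descriptor HOB and its own entry in the memory map handed to the MMU setup.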
You're also responsible for setting up a page table, and mapping (linearly, of course, this is UEFI) the necessary memory and MMIO ranges with the correct attributes.
For me this is something like -
    ARM_MEMORY_REGION_DESCRIPTOR *
    EFIAPI
    IplPlatformMemoryRegions (
      IN UINT32                       MemoryBase,
      IN UINT32                       MemoryLength,
      IN ARM_MEMORY_REGION_ATTRIBUTES CacheAttributes
      )
    {
      mMemoryTable[0].PhysicalBase = MemoryBase;
      mMemoryTable[0].VirtualBase  = MemoryBase;
      mMemoryTable[0].Length       = MemoryLength;
      mMemoryTable[0].Attributes   = CacheAttributes;

      mMemoryTable[1].PhysicalBase = TEGRA_IO_SPACE_START;
      mMemoryTable[1].VirtualBase  = TEGRA_IO_SPACE_START;
      mMemoryTable[1].Length       = TEGRA_IO_SPACE_END - TEGRA_IO_SPACE_START;
      mMemoryTable[1].Attributes   = ARM_MEMORY_REGION_ATTRIBUTE_DEVICE;

      mMemoryTable[2].PhysicalBase = 0;
      mMemoryTable[2].VirtualBase  = 0;
      mMemoryTable[2].Length       = 0;
      mMemoryTable[2].Attributes   = 0;

      return mMemoryTable;
    }

    VOID
    InitCache (
      IN UINT32 MemoryBase,
      IN UINT32 MemoryLength
      )
    {
      ARM_MEMORY_REGION_ATTRIBUTES CacheAttributes;
      ARM_MEMORY_REGION_DESCRIPTOR *MemoryTable;
      VOID                         *TranslationTableBase;
      UINTN                        TranslationTableSize;

      if (FeaturePcdGet (PcdCacheEnable) == TRUE) {
        CacheAttributes = ARM_MEMORY_REGION_ATTRIBUTE_SECURE_WRITE_BACK;
      } else {
        CacheAttributes = ARM_MEMORY_REGION_ATTRIBUTE_SECURE_UNCACHED_UNBUFFERED;
      }

      MemoryTable = IplPlatformMemoryRegions (MemoryBase, MemoryLength, CacheAttributes);
      ArmConfigureMmu (MemoryTable, &TranslationTableBase, &TranslationTableSize);
      BuildMemoryAllocationHob ((EFI_PHYSICAL_ADDRESS)(UINTN)TranslationTableBase, TranslationTableSize, EfiBootServicesData);
    }

Note that I've marked the regions as secure accesses, whereas by default the AXI transactions will otherwise be non-secure. It may appear unimportant, but with non-secure accesses you won't be able to properly access many important system devices.
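Since the mapping has to be linear, it can be worth sanity-checking the region table before handing it to the MMU code, especially once ATAGs start feeding it at run time. The sketch below is plain C with a made-up `region` struct that only mirrors the four fields used above; it checks that each entry before the zero-length terminator is identity-mapped, stays inside the 32-bit address space, and does not overlap another entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four descriptor fields used above. */
typedef struct {
    uint32_t physical_base;
    uint32_t virtual_base;
    uint32_t length;      /* 0 marks the end of the table */
    uint32_t attributes;
} region;

/* Returns 1 if the table is a valid linear (identity) map, else 0. */
int regions_are_linear(const region *table)
{
    assert(table != NULL);
    for (const region *a = table; a->length != 0; a++) {
        uint64_t a_end = (uint64_t)a->physical_base + a->length;

        if (a->physical_base != a->virtual_base) {
            return 0;  /* not identity-mapped */
        }
        if (a_end > 0x100000000ull) {
            return 0;  /* runs past the end of the 32-bit address space */
        }
        for (const region *b = a + 1; b->length != 0; b++) {
            uint64_t b_end = (uint64_t)b->physical_base + b->length;

            if (a->physical_base < b_end && b->physical_base < a_end) {
                return 0;  /* two regions overlap */
            }
        }
    }
    return 1;
}
```

A failed check is a good place for an ASSERT, since a bad table tends to show up later as a much less obvious hang.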
Handing Off.
Now we just need to figure out where the DXE stack will be, create a HOB (Hand-Off Block) list that will describe the CPU and memory layout, and pass control to DXE. Back in EDK1 days it ended up being a bunch of ugly code lifted verbatim from the PEI Core, but it's a lot simpler now, as PrePiLib contains pretty much everything you need.
Something like this -
    //
    // DXE stack is below end of first chunk.
    //
    StackBase = MemBase + MemSize - PcdGet32 (PcdPrePiStackSize);
    DEBUG ((EFI_D_INFO, "DXE Stack @ 0x%x-0x%x\n", StackBase, StackBase + PcdGet32 (PcdPrePiStackSize)));

    //
    // We'll make the HOB lie after the FD.
    //
    HobBase = (VOID *) (FdAtag->Start + FdAtag->Size);
    CreateHobList ((VOID *) MemBase, MemSize, HobBase, (VOID *) StackBase);

    //
    // Enable CPU features, cache/MMU.
    //
    ArmEnableBranchPrediction ();
    InitCache (MemBase, MemSize);

    BuildMemoryAllocationHob (FdAtag->Start, FdAtag->Size, EfiBootServicesData);
    BuildFvHob (FdAtag->Start + PcdGet32 (PcdFlashFvMainBase), PcdGet32 (PcdFlashFvMainSize));

    DEBUG ((EFI_D_INFO, "Handing-off to DXE...\n"));
    InitVectors (PcdGet32 (PcdCpuVectorBaseAddress));
    LoadDxeCoreFromFv (NULL, 0);
    DEBUG ((EFI_D_ERROR, "DXE core returned :(\n"));

Note that InitVectors above doesn't actually set up exception vectors; it "cleans them up", replacing each entry with a "jump back to self" loop. Otherwise the CpuDxe will think the exception entry is already in use and refuse to replace it. D'Oh!
    VOID
    InitVectors (
      IN UINT32 VectorsBase
      )
    {
      UINT32 *Start = (UINT32 *) VectorsBase;
      UINT32 *End   = (UINT32 *) VectorsBase + VECTORS_COUNT;

      for (; Start < End; Start++) {
        *Start = VECTORS_B_LOOP;
      }
      ArmDataSyncronizationBarrier ();
    }

Odds & Ends.
You might be wondering what your FDF file should look like. To reiterate, the IPL gets built but it doesn't get stored inside the FV/FD. After you finish building, you might wish to combine the two into an Android boot image for flashing with fastboot. The IPL is your kernel, while the FD is your initrd :-). My FDF looks like this. There is a single FV. Remember, since we don't know where the FD will be loaded by the bootloader, the base address doesn't really matter. We have no XIP code (like embedded SEC, PEI Core, PEIMs) in our FV, so we have nothing to worry about -
    [FD.Tegra_EFI]
    #
    # Warning! BaseAddress is "logical". That means PcdFlashFvMainBase is an offset
    # from where it really is.
    #
    BaseAddress   = 0x0|gEmbeddedTokenSpaceGuid.PcdEmbeddedFdBaseAddress
    Size          = 0x00100000|gEmbeddedTokenSpaceGuid.PcdEmbeddedFdSize
    ErasePolarity = 1
    BlockSize     = 0x1
    NumBlocks     = 0x100000

    0x00000000|0x00100000
    gEmbeddedTokenSpaceGuid.PcdFlashFvMainBase|gEmbeddedTokenSpaceGuid.PcdFlashFvMainSize
    FV = Tegra

    [FV.Tegra]
    BlockSize          = 0x1
    NumBlocks          = 0
    FvAlignment        = 8
    ERASE_POLARITY     = 1
    MEMORY_MAPPED      = TRUE
    STICKY_WRITE       = TRUE
    LOCK_CAP           = TRUE
    LOCK_STATUS        = TRUE
    WRITE_DISABLED_CAP = TRUE
    WRITE_ENABLED_CAP  = TRUE
    WRITE_STATUS       = TRUE
    WRITE_LOCK_CAP     = TRUE
    WRITE_LOCK_STATUS  = TRUE
    READ_DISABLED_CAP  = TRUE
    READ_ENABLED_CAP   = TRUE
    READ_STATUS        = TRUE
    READ_LOCK_CAP      = TRUE
    READ_LOCK_STATUS   = TRUE

    INF MdeModulePkg/Core/Dxe/DxeMain.inf
    ...
    ...
    ...

The End.
This should get you as far as an ASSERT inside DxeMain about missing UEFI architectural protocols - some of these, like the CpuDxe, are already present in the source distribution, others you will have to write if they don't exist yet for your SoC - like the Timer and Interrupt Controller code.
Nice. Seems you have sufficient knowledge about UEFI on ARM.
I was trying to use UEFI on our ARM platform. I'm going through the theory and the spec now. Once I get my hands dirty with the board and code, this blog will be pretty useful. Thanks.