Saturday, May 31, 2014

Musings on device workarounds and attribution

I was trading war stories with some colleagues today, and remembered the time I was chasing crazy UART bugs.

So I just had to go look at my battlefields of past and reminisce...

Ever look at a random driver and wonder how convoluted weird code gets written? Then you look at the git history and see - nothing useful. No history. It was apparently all written at once, by some crazy smart engineer based on thorough and clean specs, right ;-)?

Like the serial-tegra driver, for example. Ever wonder why UART waits for a certain bit of time after switching baud rate?

I used to work on the Moto Xoom tablet - the first official Android tablet, based around the Tegra 2 SoC. Once upon a time I was investigating a bug around suspend-resume. We were seeing a kernel crash when waking the tablet up occasionally with a paired Bluetooth keyboard. The actual crash was the result of a BlueZ bug that didn't defensively treat BT HCI connect events for devices that weren't disconnected (have a gander at http://www.spinics.net/lists/linux-bluetooth/msg10690.html - yes, a  rogue Bluetooth adapter /can/ crash the system, wonderful, right?)

But why weren't the BT disconnect messages coming through?

The tablet was asleep at the time of the disconnect, and the disconnect woke it up. The Bluetooth host was connected to the CPU via a UART, and the UART needed to be resumed before the BT host could send data. UART resume, among other things, needs to set the baud rate. What was happening, is that the the hardware flow control allowed RX before the baud rate change fully propagated through the UART block. The result is that the received data was corrupted. Oops.

Knowing what was happening didn't mean I had a solution, of course. The docs were useless, and it took another fun half a week to figure out the solution :-). Too bad I can't remember what this fix was for... Probably more BT issues :).

So what point did I want to make? The Tegra HSUART driver "got rewritten" when Tegra 2/3 support was upstreamed. But it's the same code, basically, even down to the code flow and comments. You put in time, sleepless nights and life energy and you can't get basic attribution from some unknown dude at NV.

Behind every line of code is some story. Some feeling of exhilaration, success and victory. I almost made a t-shirt with the fix :-). So always attribute contributions out of solidarity with your fellow hackers. Heh.

BlueZ is a train wreck, though... There. I said it.

Friday, May 30, 2014

MkLinux

The first step to getting MkLinux to run is to get the build tools to run.

Build tools?

The OSF Open Development Environment tools. Which have been very hard to find (https://github.com/slp/mkunity/issues/1). But now you can find them and even build them - https://github.com/andreiw/ode4linux

If I ever find time I'll clean up the code so it doesn't build with a million warnings.

A