Saturday, November 20, 2010

Turning off LEDs on process crashes...

Edit: As I found out today, profile_event_unregister and friends aren't supposed to be used in generic drivers, and are meant only for profiler usage. So if you use this code, there will be kittens getting hurt someplace, and people will laugh at your kernel patches. Or something like that. YMMV.

Sometimes you want to manipulate LEDs from a program. This is easy: LEDs live in /sys/class/leds, and all you need to do is write to the LED's brightness sysfs attribute. Unfortunately, if the task that manipulated the LED dies before turning it off, there is no automatic way of cleaning up after it. This is why people like manipulating drivers via a file descriptor: if anything goes wrong, the close() happens automatically when the process exits, and the driver gets a chance to clean up.

But the LED interface happened, and no one is going to change it. The solution, of course, is to implement an LED trigger. But where LED triggers usually turn LEDs on, this one turns them off, and it does so when the task that set the trigger exits. I generalized this a bit so that any arbitrary task can be watched, and so that any brightness can be set on exit (because I'm nice like that, and it didn't cost me anything).

Usage is something like this, done from within your program itself (the trigger watches the task that set it):

# echo "owner" > /sys/class/leds/XXX/trigger
# echo "1" > /sys/class/leds/XXX/brightness
# stuff...
# echo "0" > /sys/class/leds/XXX/brightness
# echo "none" > /sys/class/leds/XXX/trigger
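The same sequence done from C, as a minimal sketch. It assumes the "owner" trigger from this post is loaded, takes the LED's sysfs directory as a parameter (the XXX above stays a placeholder), and uses hypothetical helper names (`led_write`, `led_session`). The point is that the process doing the work is the one writing "owner", so it is the task the trigger ends up watching.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` to the sysfs attribute `attr` in the
 * LED directory `led_dir` (e.g. "/sys/class/leds/XXX"). Returns 0 on success. */
static int led_write(const char *led_dir, const char *attr, const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s", led_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path did not fit */

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* stdio buffers the write; a sysfs store error (e.g. an unknown
     * trigger name) only shows up when the buffer is flushed on close. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical wrapper mirroring the shell sequence above. `work` is the
 * "stuff..." step and may be NULL. */
static int led_session(const char *led_dir, void (*work)(void))
{
    /* Arm the trigger first, so this process is the task being watched. */
    if (led_write(led_dir, "trigger", "owner") != 0)
        return -1;
    if (led_write(led_dir, "brightness", "1") != 0)
        return -1;

    if (work)
        work();

    /* Normal exit path: clean up ourselves. If the process dies before
     * reaching this point, the trigger is expected to turn the LED off. */
    if (led_write(led_dir, "brightness", "0") != 0)
        return -1;
    return led_write(led_dir, "trigger", "none");
}
```

On a board you would call something like `led_session("/sys/class/leds/XXX", do_stuff)` (with `do_stuff` being your own, hypothetical, work function). If the program crashes inside it, the last two writes never happen, and that is exactly the case the trigger covers.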

Implementation-wise, the driver registers a PROFILE_TASK_EXIT notifier. The notifier is global, i.e. it's not tied to any specific process, so it is invoked for every exiting process (though only as long as the trigger is actually in use); hence the need to compare PIDs. It would be nice to have a targeted PROFILE_TASK_EXIT...
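The driver source isn't reproduced here, so rather than guess at it, this is a user-space model of the reasoning: one global exit callback sees every exiting PID and has to filter for the one being watched. All names (`owner_trigger`, `on_task_exit`, `simulate_exits`) are hypothetical. In the real driver the callback would be a `struct notifier_block` registered with `profile_event_register(PROFILE_TASK_EXIT, ...)`, which receives the exiting `task_struct` as its data argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of the trigger's state (not the actual driver). */
struct owner_trigger {
    bool  in_use;           /* models "notifier active only while the trigger is in use" */
    pid_t owner;            /* task being watched */
    int   exit_brightness;  /* brightness to apply when the owner exits */
    int   brightness;       /* modelled LED brightness */
};

/* Stand-in for the PROFILE_TASK_EXIT callback. Because the notifier is
 * global, it runs for every exiting task and must filter by PID.
 * Returns true if this exit was the watched task's. */
static bool on_task_exit(struct owner_trigger *t, pid_t exiting)
{
    if (!t->in_use || exiting != t->owner)
        return false;                   /* someone else's exit: ignore */
    t->brightness = t->exit_brightness;
    return true;
}

/* Feed a batch of exits through the single global callback, the way every
 * process exit on the system would pass through the real notifier. */
static size_t simulate_exits(struct owner_trigger *t, const pid_t *pids, size_t n)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++)
        if (on_task_exit(t, pids[i]))
            matches++;
    return matches;
}
```

Every unrelated exit still costs a call and a comparison, which is why a per-task exit notification would be nicer.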


That repo patch I wrote about, which lets you continue syncing git projects even if some of them fail, has now been merged by Google. Enjoy :-)!

ARMv7 kernel with L1 cache disabled.

I was (well, still am) hunting down some memory corruptions inside our kernel, and figured removing as many possible culprits as I could would be a good idea. Given the various PL310 L2 cache controller errata, I figured I might as well disable this guy and see if that improved stability somewhat. Doing that is as easy as disabling CONFIG_CACHE_L2X0. Even though I obviously wasn't going to play with disabling L1 (after all, if that's where your problems are, you have bigger issues...), once I saw the CONFIG_CPU_ICACHE_DISABLE/CONFIG_CPU_DCACHE_DISABLE options I knew I had to try them out, if only to see our ARM target crawl.

Of course, the resulting kernel booted straight into a hard hang. I tried with just CONFIG_CPU_ICACHE_DISABLE, which worked (glacially), so it was disabling the D-cache that was hosing me. I wasn't going to let a measly kernel config option defeat me, so there went my Friday night :-)... It took me a while to figure out that it was actually hanging inside printk(), on a spin_lock.

On ARMv6 and above, Linux implements locking and atomic operations with the LDREX/STREX instructions. If you look at the description of these instructions, they rely on an exclusive monitor, which for L1 is part of the Data Cache Unit (DCU). With the D-cache off and my L2 disabled (not that the L2 would do me any good, since the PL310 doesn't contain an exclusive monitor), the STREX always fails, and the lock always appears taken.

Of course, spinlocks are only really used with SMP, and Linux only supports SMP on ARMv6 and above (which is where LDREX and STREX were added). Since I didn't feel like implementing raw_spin_lock with the SWP instruction (deprecated on ARMv6 and later), disabling SMP was pretty much the obvious choice at 1 AM. After that I also needed to enable the pre-ARMv6 variants of mutexes, locks, atomic operations, bit operations and __xchg/cmpxchg. And now it boots... Of course, my user space, being compiled for ARMv7, expects functional LDREX/STREX, so the init process hangs.
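To see why a failing STREX turns into a hang rather than an error, here is a toy, single-threaded model of an LDREX/STREX lock loop. It is not the kernel's implementation (the real one is inline assembly in the ARM spinlock header); the `monitor` struct and the function names are hypothetical, and "no working monitor means STREX always fails" simply encodes the behavior described above. The STREX status convention (0 on success, 1 on failure) matches the real instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an exclusive monitor: LDREX arms it, and a
 * following STREX succeeds only if it is still armed. */
struct monitor {
    bool working;   /* is there a functional exclusive monitor at all? */
    bool armed;     /* set by ldrex, consumed by strex */
};

static uint32_t ldrex(struct monitor *m, const uint32_t *addr)
{
    m->armed = m->working;      /* no monitor: nothing gets armed */
    return *addr;
}

/* Returns 0 on success and 1 on failure, like STREX's status result. */
static int strex(struct monitor *m, uint32_t *addr, uint32_t val)
{
    if (!m->armed)
        return 1;
    m->armed = false;
    *addr = val;
    return 0;
}

/* Simplified spin_lock: retry until the lock word reads as free and the
 * exclusive store succeeds. max_tries exists only so the model terminates;
 * the real loop has no bound, which is the hang seen in printk(). */
static bool model_spin_lock(struct monitor *m, uint32_t *lock, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        if (ldrex(m, lock) != 0)
            continue;               /* held by someone else: spin */
        if (strex(m, lock, 1) == 0)
            return true;            /* acquired */
        /* STREX failed: retry */
    }
    return false;
}
```

With a working monitor the lock is taken on the first try; with the monitor gone, every STREX fails, the lock word never changes, and only the artificial `max_tries` bound stops the loop.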