SYS$OUTPUT

Wednesday, December 1, 2010

LED triggers redux

As I found out today, profile_event_unregister and friends aren't supposed to be used in generic drivers, and are meant only for profiler usage.

Right now I am looking at reimplementing the trigger to instead create an additional kobject under the LED one, containing an additional brightness attribute. Writing to the attribute will decrease the refcount, and I would guess the refcount to remain above zero for as long as I have the sysfs fd associated with the attribute open. Then I can leverage the kobject release to handle the cleanup and turn the LED off. Looks good in my mind at 4 AM :-).

Edit: wrong again. I completely misunderstood the relationship between kobjects and sysfs. Sysfs manipulations do not have any effect on kobject lifetime, so the refcount doesn't change while store/show are executing. There is no need for that. The sysfs attributes are created in response to kobject creation, so sysfs has no need to manipulate refcounts - you're guaranteed the kobject exists. Sysfs cleanup is performed when the kobject is destroyed, and that will wait until all outstanding sysfs operations are completed. As far as the LED trigger goes, since my original problem ties LED status to a separate hardware device, the seemingly right solution is to have a trigger that lets arbitrary kernel clients manipulate the LEDs they are interested in. I'll have code up soon :-).

Anyway, it's always exciting to see a better way of implementing something than you could before. It means you've grown a bit. For the past couple of months I've been trying to use any free time on various pet projects to explore different Linux kernel subsystems...and it's been a ride so far. It's the one thing I regret not having done enough as an NT and Hyper-V dev (exploring these systems, respectively, not Linux ;)), but that was mostly because I lacked the time, not the initiative.

I am starting to pine for EFI again. I should look for that hard drive with my 32-bit Bochs port. For old times' sake. Or maybe port EDKII to my ARM environment...

Saturday, November 20, 2010

Turning off LEDs on process crashes...

Edit: As I found out today, profile_event_unregister and friends aren't supposed to be used in generic drivers, and are meant only for profiler usage. So if you use this code, there will be kittens getting hurt someplace, and people will laugh at your kernel patches. Or something like that. YMMV.

Sometimes you want to manipulate LEDs from a program. This is easy. LEDs live in /sys/class/leds, and all you need to do is set the brightness sysfs property. Unfortunately, if the task that manipulated the LED dies before turning it off, you have no automatic way of cleaning up after yourself. This is why people like manipulating drivers via a file descriptor - if anything goes wrong, the close() will happen automatically.

But the LED interface happened and no one is going to change it. The solution, of course, is to implement an LED trigger. But where LED triggers usually turn LEDs on, this one will turn the LED off. And it will turn it off when the task that set the trigger completes its execution. I generalized this a bit so any arbitrary task can be watched, and so that any brightness can be set on exit (because I'm nice like that, and it didn't cost me anything).

Usage is something like this (from within your program).

# echo "owner" > /sys/class/leds/XXX/trigger
# echo "1" > /sys/class/leds/XXX/brightness
# ...do stuff...
# echo "0" > /sys/class/leds/XXX/brightness
# echo "none" > /sys/class/leds/XXX/trigger
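
For doing the same thing from inside a program rather than a shell, here is a minimal Python sketch. It only assumes the trigger and brightness attribute files shown above; the helper names (write_attr, owned_led) are made up for illustration, and the "owner" trigger is only there if the trigger driver from this post is loaded.

```python
import os
from contextlib import contextmanager


def write_attr(led_dir, name, value):
    """Write one value to a sysfs-style attribute file under led_dir."""
    with open(os.path.join(led_dir, name), "w") as f:
        f.write(value)


@contextmanager
def owned_led(led_dir, brightness="1"):
    """Keep the LED lit for the duration of the with-block."""
    # Hypothetical helper: mirrors the echo sequence above, step for step.
    write_attr(led_dir, "trigger", "owner")
    write_attr(led_dir, "brightness", brightness)
    try:
        yield
    finally:
        # Normal-path cleanup; the trigger is what covers the crash case.
        write_attr(led_dir, "brightness", "0")
        write_attr(led_dir, "trigger", "none")
```

Used as `with owned_led("/sys/class/leds/XXX"): ...do stuff...`. The finally block only helps when Python gets a chance to unwind; a SIGKILL or a segfault skips it, which is exactly the case the trigger is meant for.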

Implementation-wise, the driver registers a PROFILE_TASK_EXIT notifier. The notifier is global, i.e. it's not tied to any specific process, so it will be invoked for every process exiting (but only as long as the trigger is in actual use), thus the need to compare PIDs. It would be nice to get a targeted PROFILE_TASK_EXIT...

FYI...

That repo patch I wrote about, that lets you continue syncing git projects even if some of them fail, is now merged in by Google. Enjoy :-)!

ARMv7 kernel with L1 cache disabled.

I was (well, still am) hunting down some memory corruptions inside our kernel, and figured removing as many possible culprits as I could would be a good idea. Given the different PL310 cache controller errata I figured I might as well disable this guy and see if that helps stability somewhat. Doing that is as easy as disabling CONFIG_CACHE_L2X0. Even though I obviously wasn't going to play with disabling L1 (after all, if that's where your problems are, you have bigger issues...), once I saw the CONFIG_CPU_ICACHE_DISABLE/CONFIG_CPU_DCACHE_DISABLE options I knew I had to try them out. Even if just to see our ARM target crawl.

Of course, after building with that I booted to a hard hang. I tried with just CONFIG_CPU_ICACHE_DISABLE, which worked (glacially), so it was disabling the d-cache that was hosing me. I wasn't going to let a measly kernel config option defeat me, so there went my Friday night :-)...

It took me a while to figure out it was actually hanging inside printk(). On a spin_lock. Locking and atomic operations are implemented on Linux with the LDREX/STREX instructions on ARMv6 and above. If you look at the description of these instructions, they involve an exclusive monitor, which is part of the Data Cache Unit (DCU) for L1, and my L2 is off (not that it would do me any good - PL310 doesn't contain an exclusive monitor). So the STREX always fails, and the lock appears taken.

Of course, spinlocks are only used with SMP, and SMP is only supported in Linux on ARMv6 and above (which added support for STREX and LDREX), so since I didn't feel like implementing raw_spin_lock with the SWP instruction (deprecated on >= ARMv6), disabling SMP was pretty much the obvious choice at 1 AM. After that I needed to enable pre-ARMv6 variants for mutexes, locks, atomic operations, bit operations and __xchg/cmpxchg. And now it boots.... Of course my user space, being compiled for ARMv7, expects functional LDREX/STREX, and so it hangs there...in the init process.
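
To make the printk() hang a bit more concrete, here is a toy Python model of the LDREX/STREX retry loop. It is not ARM code and not how the kernel implements spinlocks; the ExclusiveMonitor class and the max_spins cap are invented for illustration, and the only behaviour it encodes is the one described above: with no working monitor, every STREX reports failure.

```python
class ExclusiveMonitor:
    """Toy stand-in for the L1 exclusive monitor (an assumption, not real hardware)."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.tagged = False

    def ldrex(self, mem, addr):
        # Load the value and tag the address, but only if the monitor is present.
        self.tagged = self.enabled
        return mem[addr]

    def strex(self, mem, addr, value):
        # Status convention as on ARM: 0 means the store happened, 1 means it did not.
        if not self.tagged:
            return 1
        mem[addr] = value
        self.tagged = False
        return 0


def spin_lock(monitor, mem, addr, max_spins=1000):
    """Try to take the lock; return the spin count, or None if we gave up."""
    for spins in range(1, max_spins + 1):
        # Lock word 0 means free; taking it means storing 1 exclusively.
        if monitor.ldrex(mem, addr) == 0 and monitor.strex(mem, addr, 1) == 0:
            return spins
    return None
```

With ExclusiveMonitor(True) the lock is taken on the first pass. With ExclusiveMonitor(False) the loop never succeeds even though the lock word stays 0, and the real thing has no max_spins to bail it out.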

Email notifications.

Frequently, you may start some long task, like a build or an SCM sync, that might take some time to finish, and you might not want to hang around to see it finish or fail. Additionally, if it does fail, you might wish to contact someone via email.

So I just had to put together a tool that would let me run an arbitrary command with arbitrary parameters, and pipe stdout/stderr/return status into an email. The tool handles locale/encoding correctly, tails a configurable amount of interspersed stdout/stderr output in the body of the email in a fixed size font, and lets you attach both stdout and stderr in full as separate attachments. It also reports how long the task ran. It uses SMTP via MTA/MSA, and supports TLS+auth. Unless told otherwise, it still prints all stdout/stderr to the console.

Not the prettiest or best code ever, as I didn't have time, and I'm not really a Python person. It only depends on markup.py, because I was lazy and didn't want to deal with generating HTML.
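
For a rough idea of the shape of such a tool, here is a stripped-down sketch using only the Python standard library. This is not the tool itself: run_and_report and its parameters are made-up names, it builds the message without sending it, it tails stdout only instead of interleaving both streams, and it skips the HTML body.

```python
import subprocess
import sys
import time
from email.message import EmailMessage


def run_and_report(cmd, sender, recipient, tail_lines=20):
    """Run cmd and build (but do not send) an email describing how it went."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    elapsed = time.monotonic() - start

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[exit %d] %s" % (proc.returncode, " ".join(cmd))

    # Body: run time, exit status and the last few lines of stdout.
    tail = "\n".join(proc.stdout.splitlines()[-tail_lines:])
    msg.set_content(
        "Ran for %.1f s, exit status %d.\n\nLast lines of stdout:\n%s\n"
        % (elapsed, proc.returncode, tail)
    )

    # Full output goes along as two separate attachments.
    for name, text in (("stdout.txt", proc.stdout), ("stderr.txt", proc.stderr)):
        msg.add_attachment(
            text.encode("utf-8"), maintype="text", subtype="plain", filename=name
        )
    return msg
```

Something like `run_and_report(["make", "-j4"], "me@example.com", "you@example.com")` gives back a message that could then be handed to smtplib.SMTP (starttls(), login(), send_message()) for the TLS+auth part; the addresses and the make command are placeholders.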

< 2.6.36 and ioctl

I didn't realize all ioctl()s were handled with the BKL, unless of course a driver used the new unlocked_ioctl way of doing things. Of course with 2.6.36, you don't have a choice any longer =).

Friday, July 30, 2010

Forcing repo to continue syncing projects even if an individual project sync fails...

In case anyone got frustrated at repo bailing at git errors for a particular project...here you go! Hopefully this gets merged in.