Sunday, October 21, 2012

FUSE musings.

If you're actually trying to write a real file system (not just a pass-through or a filter), then you need to keep track of files, directories, links, open reference counts and path lookup, i.e. the typical common file system baggage that any self-respecting operating system already implements in a well-debugged and tuned layer, like VFS in *nixes, IFS in NT and FSS in ESX.

For FUSE, I'm surprised that no one has made a generic file system metadata implementation or library that could be consumed by things that are more complicated than sshfs and the like. I suppose I should take a look at "real" file systems like ntfs-3g and the FUSE ZFS port, although I think I already know what I'm going to find... I suppose there's definite sport in writing your own ;-).

Saturday, October 20, 2012

Linux kernel file I/O from a block driver

This code shows how a hypothetical block driver might handle submitted BIOs, servicing them with a file.
/* Handle I/O for a BIO component. */
int file_do_bvec(struct file *file, struct bio_vec *bvec, loff_t pos, int rw)
{
        u8 *buf;
        ssize_t bw;
        int len = bvec->bv_len;
        mm_segment_t old_fs = get_fs();

        /* vfs_read()/vfs_write() may sleep, so use kmap(), not kmap_atomic(). */
        buf = kmap(bvec->bv_page) + bvec->bv_offset;

        /* Let the VFS calls accept a kernel-space buffer. */
        set_fs(get_ds());
        if (rw == WRITE)
                bw = vfs_write(file, buf, len, &pos);
        else
                bw = vfs_read(file, buf, len, &pos);
        set_fs(old_fs);

        kunmap(bvec->bv_page);

        if (likely(bw == len))
                return 0;
        printk(KERN_ERR "Error at byte offset %llu, length %i.\n",
                        (unsigned long long)pos, len);
        if (bw >= 0)
                bw = -EIO;
        return bw;
}

/* Handle I/O for a BIO. */
int file_do_bio(struct file *file, struct bio *bio, loff_t pos)
{
        struct bio_vec *bvec;
        int i, ret = 0;

        /* Should have read-only check here, BIOs other than READ/WRITE. */

        bio_for_each_segment(bvec, bio, i) {
                ret = file_do_bvec(file, bvec, pos, bio_rw(bio));
                if (ret < 0)
                        break;
                pos += bvec->bv_len;
        }

        /*
         * At some point we need to fsync. In this simple example - I'll do it here.
         * TBD: should check error.
         */
        vfs_fsync(file, 0);
        return ret;
}

PM and QA (Apple Maps)

It's no secret that for every PM who took the role to exert a larger sphere of influence than is possible as an individual contributor, there are thousands who did so to simulate work and walk around meetings with laptops. In my six years I've dealt with exactly one PM who could serve as a poster example of what this role really should be about. Like any other track in life, not everyone is cut out for the job, just like not everyone is cut out for QA, research or people management.

Looks like Apple maps got the right people under one roof :-P

I'm not going to give the usual gripe about missing locations, etc., but here is a perfect example of something being really wrong with the team working on the maps application. I have the fortune of using a non-English locale (Russian) on my phone, so the navigation is localized as well. The text-to-speech engine is horrible. Basically, when pronouncing foreign names (in this case, English ones) you should either stick to the original (English) phonetics, or use the foreign (Russian) equivalents where possible. Ideally, this should be a configurable option, just like showing native or foreign street names. In the case of the maps application, it's impossible to tell what the TTS engine is talking about, as the pronunciation is neither English nor Russian.

That would be nothing, except that once in a while the map application refers to miles as milliliters, a word that sounds about the same in Russian as it does in English. The abbreviation used is "мл." (ml.), and apparently lacking context, once in a while the TTS engine says milliliters instead of miles. But not always. Astounding. This does tell me that foreign localization was not a real deliverable, was not done by native speakers, and was not tested or dogfooded internally. Overall - great PM and QA efforts that deserve a promotion to explore other opportunities.

Thursday, October 18, 2012

Linux kernel file I/O

I had a colleague tell me today that file I/O in the Linux kernel is contrived. Let's examine...

Wednesday, October 17, 2012

Hiding mount points.

Consider using a user-space file system that acts as a filter on top of a "real" file system. Your user-space driver might mount the "real" file system in the background. What happens if the user-space driver crashes? That's right. No cleanup and left-over mounts.

There's another aspect here - you want to ensure exclusive access to the "real" file system, even from the perspective of PEBCAK-type behaviors, since modifications made directly to the "real" file system could corrupt the filtered one.
It turns out this is entirely possible in Linux. The sequence of operations is something like -
mount("source", "/tmp/target", "ext4", 0, "");
dir = opendir("/tmp/target"); /* open so the umount2 defers */
fd = dirfd(dir);
umount2("/tmp/target", MNT_DETACH);
rmdir("/tmp/target"); /* fine too */
/* do stuff in hidden mounted fs through fd */
closedir(dir); /* finally unmounted on close */
In fact, after the MNT_DETACH (a so-called "lazy" umount) you can rmdir(2) the mount point away (or mount something else on it). Very useful. If you're wondering how you can perform file and directory operations without having a named path, then openat(2) and related calls are your friends :-).
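To make the openat(2) part concrete, here is a minimal sketch of doing file I/O relative to a directory descriptor, which is all you have left once the mount point is detached. The helper names write_at() and read_at() are made up for illustration, and dir_fd is assumed to be something like the dirfd(dir) from the sequence above (any open directory descriptor works the same way).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) `name` inside the directory
 * referred to by dir_fd and write the string msg into it.
 * Returns 0 on success, -1 on failure.
 */
static int write_at(int dir_fd, const char *name, const char *msg)
{
        assert(dir_fd >= 0 && name != NULL && msg != NULL);

        /* The path is resolved relative to dir_fd, not the cwd. */
        int fd = openat(dir_fd, name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
                return -1;

        size_t len = strlen(msg);
        ssize_t n = write(fd, msg, len);
        close(fd);
        return (n == (ssize_t)len) ? 0 : -1;
}

/*
 * Hypothetical helper: read up to cap - 1 bytes of `name` (again relative
 * to dir_fd) into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t read_at(int dir_fd, const char *name, char *buf, size_t cap)
{
        assert(dir_fd >= 0 && name != NULL && buf != NULL && cap > 0);

        int fd = openat(dir_fd, name, O_RDONLY);
        if (fd < 0)
                return -1;

        ssize_t n = read(fd, buf, cap - 1);
        close(fd);
        if (n >= 0)
                buf[n] = '\0';
        return n;
}
```

The same pattern extends to mkdirat(2), unlinkat(2), renameat(2) and fstatat(2), so nothing in the filter ever needs a path that is visible in the namespace.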

Wednesday, October 3, 2012


It's been a rocky ride so far.

Observation #1: PMON is a POS (and that's not "point of sale terminal")
  • Why does the serial console get only the firmware boot-time messages?
  • Why can't I look at all those messages scrolling by into oblivion?
...actually, you could write a book about the "why can't I XXX". Why PMON? If they did a TI, where the braindead firmware just ran something like "boot.elf" from USB first, then HD, it would still be miles better. Would u-boot really be harder to port? It would certainly be less obscure. I think some grad students ported UEFI. It's pseudo-UEFI, since UEFI doesn't technically *do* MIPS... *sigh*. There is EFI-MIPS, but, curiously enough, it's EDK-based, which makes me feel like I'm back in 2006, and is an obvious dead end for that reason alone.

Observation #2: Booting Linux via a hex input panel, if possible, would probably be simpler.
  • The numbering thankfully doesn't change from different USB ports.
  • Curiously enough, 'load' works faster than 'initrd'. I timed it against loading the same file. Expect to run to Voltage and back for a coffee before the initrd completes...
Observation #3: Booting doesn't actually work.
  • Without initrd, I get a hang after the i8042 probe.
  • With initrd, I get the MIPS equivalent of a data abort from the firmware.
  • ...PMON can't handle an initrd.
Observation #4: For an architecture endorsed by RMS, this is all unreasonably obscure.
  • I decided to chain-load through GRUB2, but there are no GRUB2 prebuilts.
  • There is a GRUB2 prebuilt for the Yeelong, but this one just flashes "press ESC to skip loading on-disk grub.cfg", while ignoring both the USB HID and the serial.
  • There are no "newer" firmwares you can test chain-boot.
Technically, I don't care, but booting Linux is good proof that the hardware is alive, and allows for self-hosted development.

There was something already present on the disk (rays Linux), but it didn't boot with the default boot options. Manually doing -
PMON> load /dev/fs/ext2@wd0/boot/vmlinux-2.6.18-fl-v1.02
PMON> g init=/bin/bash
...got me far enough to change the password so I could use the system, copy the kernel and initrd from the USB stick, and add a boot.cfg entry (root=/dev/sda1 console=/dev/ttyS0).

Of course, I still can't boot the Arch Linux build. I get a bunch of unaligned kernel accesses followed by a page fault.



Tuesday, October 2, 2012

FUSE, redux

Here is a useful tutorial for FUSE - In particular the section on unclear FUSE functions is very useful.

Btw, all paths passed into FUSE callbacks are already canonical, and "/" is in fact valid, and refers to the root of the volume.
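One nice consequence of canonical paths is that a lookup routine can be a plain component-by-component walk, with no special handling for "." or "..". Here is a rough sketch over a toy in-memory tree. struct node, find_child() and lookup() are made-up names for illustration and have nothing to do with the FUSE API itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Toy in-memory directory entry (hypothetical, for illustration only). */
struct node {
        const char *name;
        struct node *children[MAX_CHILDREN];
        size_t nchildren;
};

/* Find a direct child of dir whose name matches the len bytes at name. */
static struct node *find_child(struct node *dir, const char *name, size_t len)
{
        for (size_t i = 0; i < dir->nchildren; i++) {
                struct node *c = dir->children[i];

                if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
                        return c;
        }
        return NULL;
}

/*
 * Resolve a canonical absolute path, as handed to a callback.
 * "/" resolves to the root itself. Returns NULL if a component is missing.
 */
static struct node *lookup(struct node *root, const char *path)
{
        assert(root != NULL && path != NULL && path[0] == '/');

        struct node *cur = root;
        const char *p = path + 1;

        while (*p != '\0' && cur != NULL) {
                size_t len = strcspn(p, "/"); /* length of this component */

                cur = find_child(cur, p, len);
                p += len;
                if (*p == '/')
                        p++;
        }
        return cur;
}
```

A real implementation would also need reference counts and locking, which is exactly the baggage a generic metadata library would take care of.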

What am I writing? Secret for now. Hold on :).


Monday, October 1, 2012

More love for VHDtool

Finally got around to making my VHDtool very useful indeed for folks struggling with the VHD image format.

With Hyper-V, generated VHD images could of course be used either with the emulated IDE disks, or with the virtual SCSI adapter. However, if the created disk size is < 127GiB, you have to be careful with the actual size. If the size implied by the specification-mandated C/H/S calculations doesn't match the created disk size, you will run into problems when:

1) Partitioning a VHD on SCSI, then moving it onto IDE.
2) Converting a raw image into a VHD, and using it on IDE.

The problems will manifest themselves as the disk appearing smaller than it was created, and partitions may be corrupted (and VMs unbootable ;-().

Now there is the '-c' option, which ensures that the created disk size is never smaller than desired when used with the emulated IDE adapter.
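To show where the mismatch comes from, here is a sketch of the C/H/S calculation as I understand it from the appendix of the VHD specification, plus a naive helper that grows a size until the implied geometry covers it. The names vhd_chs(), chs_capacity() and round_up_for_chs() are made up for this example, and the rounding helper only illustrates the idea; it is not necessarily how '-c' does it in VHDtool.

```c
#include <assert.h>
#include <stdint.h>

/* Largest sector count the geometry can express: 65535 * 16 * 255. */
#define VHD_MAX_CHS_SECTORS (65535ULL * 16 * 255)

struct chs {
        uint32_t cylinders;
        uint32_t heads;
        uint32_t sectors_per_track;
};

/* Geometry for a disk of total_sectors 512-byte sectors. */
static struct chs vhd_chs(uint64_t total_sectors)
{
        struct chs g;
        uint64_t cyl_times_heads;

        if (total_sectors > VHD_MAX_CHS_SECTORS)
                total_sectors = VHD_MAX_CHS_SECTORS;

        if (total_sectors >= 65535ULL * 16 * 63) {
                g.sectors_per_track = 255;
                g.heads = 16;
                cyl_times_heads = total_sectors / g.sectors_per_track;
        } else {
                g.sectors_per_track = 17;
                cyl_times_heads = total_sectors / g.sectors_per_track;
                g.heads = (uint32_t)((cyl_times_heads + 1023) / 1024);
                if (g.heads < 4)
                        g.heads = 4;
                if (cyl_times_heads >= (uint64_t)g.heads * 1024 || g.heads > 16) {
                        g.sectors_per_track = 31;
                        g.heads = 16;
                        cyl_times_heads = total_sectors / g.sectors_per_track;
                }
                if (cyl_times_heads >= (uint64_t)g.heads * 1024) {
                        g.sectors_per_track = 63;
                        g.heads = 16;
                        cyl_times_heads = total_sectors / g.sectors_per_track;
                }
        }
        g.cylinders = (uint32_t)(cyl_times_heads / g.heads);
        return g;
}

/* Number of sectors actually addressable through the geometry. */
static uint64_t chs_capacity(struct chs g)
{
        return (uint64_t)g.cylinders * g.heads * g.sectors_per_track;
}

/* Smallest size >= desired whose implied geometry covers desired sectors. */
static uint64_t round_up_for_chs(uint64_t desired)
{
        uint64_t n = desired;

        assert(desired <= VHD_MAX_CHS_SECTORS);
        while (chs_capacity(vhd_chs(n)) < desired)
                n++;
        return n;
}
```

For a 1 GiB image (2097152 sectors) this gives 2080/16/63, which addresses only 2096640 sectors, so a disk seen through that geometry comes up 512 sectors short. Rounding the size up to 2097648 sectors makes the geometry 2081/16/63, which covers the full gigabyte.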

VHDtool is a *nix utility for creating fixed and dynamic VHD images, and for converting from raw to fixed VHDs. Support for converting to dynamic VHD images will come soon.

In other news, I'll soon be putting out a tool to correct certain VMDK corruptions that prevent a virtual disk from being attached, so that you can at least try file system recovery tools.