Monday, January 4, 2010

You know it's Monday when...

So, anyway, I've got this ARM project for which I've written a simple ODT-like monitor that includes the ability to download files using KERMIT; ^A, the KERMIT start-of-packet character, is treated as a command to accept and process a KERMIT packet. The files are stored on one of those little SD flash card thingies popular for MP3 players and digital cameras.

Being a simple system, it stores files contiguously in a manner similar to RT-11. Files are described by their starting data block number and size in blocks; when you look up a file, the monitor searches the directory for the named file and fills in a structure describing its position and size.
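
As a rough illustration of that lookup, here is a minimal sketch in C. The `DirEntry` layout, the `Extent` structure, and the `LookupFile` name are all made up for the example; the real monitor's directory format isn't shown in this post.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory entry: a name plus where the file lives on the card. */
typedef struct {
    char     name[12];  /* NUL-terminated file name; empty string marks a hole */
    uint32_t start;     /* first data block */
    uint32_t size;      /* length in blocks */
} DirEntry;

/* Hypothetical result structure: position and size of the file. */
typedef struct {
    uint32_t Start;
    uint32_t Size;
} Extent;

/* Search the directory for `name`; on a match, fill in *found and return 1.
   Returns 0 if no file with that name exists. */
int LookupFile(const DirEntry *dir, size_t count, const char *name, Extent *found)
{
    for (size_t i = 0; i < count; i++) {
        if (dir[i].name[0] != '\0' && strcmp(dir[i].name, name) == 0) {
            found->Start = dir[i].start;
            found->Size  = dir[i].size;
            return 1;
        }
    }
    return 0;
}
```

Because files are contiguous, a start block and a block count are all a caller needs to read the whole file.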

The monitor itself is stored in flash memory internal to the CPU; when the CPU is reset, it jumps to the monitor and starts running. The flash can be programmed either using JTAG or by the CPU using a special command sequence.

When you create a file, the monitor searches for free space and fills in a structure describing the position and size of the hole that it found.

Today, all of a sudden, right about lunchtime, I couldn't create files. Totally out of the blue. Downloaded one file just fine, the next download couldn't create the file.

Fiddling with the system, I discovered that it was no longer properly describing the hole it had found in which to create the file. It reported the size, but no longer reported its starting position.

When a file is deleted, the system just marks the space occupied by the file as being empty. When searching for a hole in which to create the file, adjacent holes are collected and treated as a single, larger hole. The search continues until a hole is found that is either at least the requested size or is the largest hole available on the disk.
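
A minimal sketch of that search, assuming the directory is an in-memory array of entries listed in block order with no gaps between them. The `Region` and `Hole` types and the `FindHole` name are invented for the example, not taken from the actual monitor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory entry: either a file or an empty (deleted) region. */
typedef struct {
    int      in_use;  /* nonzero: a file; zero: a hole */
    uint32_t start;   /* first data block */
    uint32_t size;    /* length in blocks */
} Region;

typedef struct {
    uint32_t Start;
    uint32_t Size;
} Hole;

/* Find a hole of at least `wanted` blocks, or failing that the largest hole.
   Returns 1 and fills in *FoundHole on success, 0 if there is no free space. */
int FindHole(const Region *dir, size_t count, uint32_t wanted, Hole *FoundHole)
{
    Hole largest = {0, 0};
    Hole current = {0, 0};

    for (size_t i = 0; i < count; i++) {
        if (dir[i].in_use) {
            current.Size = 0;              /* a file ends the current run of holes */
            continue;
        }
        if (current.Size == 0)
            current.Start = dir[i].start;  /* first hole of a new run */
        current.Size += dir[i].size;       /* coalesce adjacent holes */

        if (current.Size > largest.Size)
            largest = current;
        if (largest.Size >= wanted)
            break;                         /* big enough: stop searching */
    }

    if (largest.Size == 0)
        return 0;

    FoundHole->Start = largest.Start;
    FoundHole->Size  = largest.Size;
    return 1;
}
```

With holes at blocks 4 (2 blocks) and 6 (3 blocks) sitting next to each other, a request for 5 blocks is satisfied by the combined hole starting at block 4.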

Since it needs to coalesce holes, the hole-finding routine tracks the current largest hole in a local variable. When it's done, it copies the description from that local variable to the structure in which the info is supposed to be plunked and returns success, kind of like this (in C):

FoundHole->Start = LargestHole->Start;
FoundHole->Size = LargestHole->Size;
return Success;

This is the only place where the routine returns successful status, and also the only place where the size of the hole is returned, so clearly this code was being executed. It just wasn't returning the start of the hole.

After scratching my head for a while, I got desperate enough to pull out the assembly language listings and poke through them.

I discovered that a bit had flipped in the flash. The instruction that was supposed to return the starting position of the hole had changed into something else.

Now, programming the internal flash of the CPU is not simple and straightforward; it's intentionally complicated so you can't do it accidentally. You have to issue commands to the flash controller to put it in program mode, then you have to tell it what you want to program, and then you have to wait for it to be finished, and then you have to issue another command to turn the flash programmer off so that you can read the flash.
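
To make the sequence concrete, here is a toy simulation in C. It is not the real part's flash controller: `FlashSim`, its fields, and every function name are hypothetical, and a real controller would be driven through memory-mapped registers described in the chip's manual. The simulation only models the four steps and refuses reads while in program mode.

```c
#include <stddef.h>
#include <stdint.h>

enum { FLASH_WORDS = 16 };

/* Simulated flash bank; every field here is hypothetical. */
typedef struct {
    uint32_t words[FLASH_WORDS];
    int      program_mode;  /* nonzero while the programmer owns the bank */
    int      busy_polls;    /* status polls left before a write completes */
} FlashSim;

void FlashEnterProgram(FlashSim *f) { f->program_mode = 1; }
void FlashExitProgram(FlashSim *f)  { f->program_mode = 0; }

/* Queue one word; only legal in program mode. Returns 1 if accepted. */
int FlashWrite(FlashSim *f, size_t index, uint32_t value)
{
    if (!f->program_mode || index >= FLASH_WORDS)
        return 0;
    f->words[index] = value;
    f->busy_polls = 3;      /* pretend the write completes on the third poll */
    return 1;
}

/* Poll the status: returns nonzero while the write is still in progress. */
int FlashBusy(FlashSim *f)
{
    if (f->busy_polls > 0)
        f->busy_polls--;
    return f->busy_polls > 0;
}

/* Reads are refused while the bank is in program mode. */
int FlashRead(const FlashSim *f, size_t index, uint32_t *out)
{
    if (f->program_mode || index >= FLASH_WORDS)
        return 0;
    *out = f->words[index];
    return 1;
}

/* The full four-step sequence for a single word. */
int FlashProgramWord(FlashSim *f, size_t index, uint32_t value)
{
    FlashEnterProgram(f);                  /* 1. enter program mode */
    int ok = FlashWrite(f, index, value);  /* 2. say what to program */
    while (FlashBusy(f))                   /* 3. wait for it to finish */
        ;
    FlashExitProgram(f);                   /* 4. exit so reads work again */
    return ok;
}
```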

Key point being that you can't run from the same flash bank you're programming, because once you flip into programming mode you can't read the flash anymore until you command the programmer to exit.

Unless, apparently, it's Monday.

EDIT: D'oh! I had no idea Blogger was keeping my line breaks. What's the point of the p tag, then?
