Camel-lite internals and documentation
The mmap() CamelFolderSummary
About the mmap() changes
The original CamelFolderSummary implementation in standard Camel copies all summary information (that is, the headers displayed in a folder summary and the per-message content information) into a pool of strings that avoids duplicates (the camel_pstring_add API).
The implementation in camel-lite, however, uses mmap() for this purpose. It reuses the on-disk summary file and maps it read-only with mmap(). From that point on, the kernel decides which parts of the file are resident in real memory and which parts stay on disk until they are touched (usually it is the kernel's memory management, which implements the mmap() syscall, that makes this decision).
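As a rough illustration of what a read-only mapping looks like at the syscall level, here is a minimal sketch using plain POSIX calls. camel-lite itself goes through GLib for this; the helper name map_readonly is hypothetical and only serves the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of camel-lite): map a whole file read-only.
 * Returns the mapped address and stores its size in *len, or NULL on error. */
const unsigned char *map_readonly(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ only: a stray write faults instead of reaching the file. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return addr;
}
```

The caller releases the mapping with munmap() when the folder is closed. No page of the file is read until the code actually dereferences an address inside the mapping.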
Overview of the changes
No more copying and needless writing of memory
For mmap(), I had to make Camel's summary implementation stop copying memory and stop writing to it, because writing to the address space of a writable mmap() basically means writing to the file, which is not good in this case. One reason why a writable mmap() wouldn't be the best idea is that the filesystems and hardware often used on mobile devices for storing files are flash based (for example JFFS2 and LogFS). Flash devices have to deal with wear leveling. Frequent writes to the mmap() that are scattered in both location and time could therefore mean that a lot of pages on the flash card must be rewritten.
No writable mmap(), fwrite() is better for this situation
It's better to simply write using fwrite(), which is buffered in 4k blocks, so that entire pages are always written. The functions in libtinymail-camel/camel-lite/camel/camel-file-utils.c are the ones that write the information using fwrite(). This information is later loaded using mmap() instead of fread(), as the old implementation did. The written information is also aligned to G_MEM_ALIGN so that architectures that don't allow unaligned access (like most ARM systems) can work with the mmap() memory directly.
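The buffering and alignment idea can be sketched roughly as follows. This is not the code from camel-file-utils.c: the helper names are hypothetical, and SUMMARY_ALIGN is only a stand-in for G_MEM_ALIGN, whose real value comes from GLib.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for G_MEM_ALIGN (assumption: pointer size). */
#define SUMMARY_ALIGN (sizeof(void *))

/* Hypothetical helper: open a file for writing with a 4k stdio buffer. */
FILE *open_summary_for_write(const char *path)
{
    assert(path != NULL);
    FILE *out = fopen(path, "wb");
    if (out != NULL)
        setvbuf(out, NULL, _IOFBF, 4096); /* must precede any other I/O */
    return out;
}

/* Hypothetical helper: pad the stream with zero bytes until its
 * position is a multiple of SUMMARY_ALIGN. Returns 0 on success. */
int write_padding(FILE *out)
{
    assert(out != NULL);
    long pos = ftell(out);
    if (pos < 0)
        return -1;
    size_t rest = (size_t)pos % SUMMARY_ALIGN;
    if (rest == 0)
        return 0;
    static const char zeros[16] = {0};
    size_t pad = SUMMARY_ALIGN - rest;
    return fwrite(zeros, 1, pad, out) == pad ? 0 : -1;
}

/* Hypothetical helper: write a fixed-width 32-bit value at an aligned
 * offset, so a reader can load it straight from the mapping. */
int write_aligned_u32(FILE *out, uint32_t value)
{
    if (write_padding(out) != 0)
        return -1;
    return fwrite(&value, sizeof value, 1, out) == 1 ? 0 : -1;
}
```

Because the page-aligned start of the mapping corresponds to offset 0 of the file, a value written at an aligned file offset is also aligned in memory once the file is mapped.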
The string format of the summary
The summary-file string format used to be a non-NUL-terminated pstring with an encoded length field in front of it. I can't write the NUL byte after parsing such a string: I would either write it into the file as well (see above), or I would have to copy the string to another buffer, which would defeat the purpose of mmap(). I therefore had to change the format of the summary files so that the pstrings are actually NUL terminated on disk.
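To make the format change concrete, here is a small sketch of a writer that stores the terminator in the file. The layout is simplified for illustration: it assumes a fixed-width 32-bit length, whereas the real summary file uses an encoded length field, and write_string is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified layout for one string, assumed for illustration only:
 *   [uint32 length, counting the NUL][bytes ...]['\0'] */
int write_string(FILE *out, const char *str)
{
    assert(out != NULL && str != NULL);

    uint32_t len = (uint32_t)strlen(str) + 1; /* count the terminator */
    if (fwrite(&len, sizeof len, 1, out) != 1)
        return -1;

    /* The terminator is written once, here, so a reader that maps the
     * file can use the bytes as a C string without touching them. */
    return fwrite(str, 1, len, out) == len ? 0 : -1;
}
```

The extra byte per string is what lets the reader hand out pointers into the mapping directly.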
Parsing the memory
The new implementation reads the length info of each string to calculate the offset of the next piece of information (usually a string or an integer). It stores the address at that offset in the CamelMessageInfo struct. The summary file doesn't only contain strings. There are also encoded integers, fixed-width integers, and encoded time_t and off_t types in it. I had to make sure decoders for these types were implemented for use with the mmap() memory.
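A minimal sketch of this kind of parsing follows. The Cursor type and both functions are hypothetical, and the integer encoding shown (7 bits per byte, most significant group first, high bit set on the last byte) is an assumption made for the example rather than a specification of the summary format. Alignment padding is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over the mapped summary. */
typedef struct {
    const unsigned char *pos; /* next unread byte */
    const unsigned char *end; /* one past the last mapped byte */
} Cursor;

/* Decode an unsigned integer stored 7 bits per byte; the final byte
 * has its high bit set (assumed encoding). Returns 0 on success. */
int read_varint(Cursor *c, uint32_t *value)
{
    assert(c != NULL && value != NULL);
    uint32_t v = 0;
    while (c->pos < c->end) {
        unsigned char b = *c->pos++;
        v = (v << 7) | (b & 0x7f);
        if (b & 0x80) {
            *value = v;
            return 0;
        }
    }
    return -1; /* ran off the end of the mapping */
}

/* Read a length-prefixed, NUL-terminated string. The returned pointer
 * refers to the mapping itself; nothing is copied or written. */
const char *read_string(Cursor *c)
{
    uint32_t len;
    if (read_varint(c, &len) != 0 || len == 0)
        return NULL;
    if ((size_t)(c->end - c->pos) < len || c->pos[len - 1] != '\0')
        return NULL;
    const char *str = (const char *)c->pos;
    c->pos += len; /* the length gives the offset of the next field */
    return str;
}
```

The pointer returned by read_string is what would be stored in the message info struct, and it stays valid only as long as the mapping does.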
How it works
Loading the summary.mmap file
The camel_folder_summary_load method maps the summary.mmap file using the g_mapped_file_new GLib API call. The filepos variable in CamelFolderSummary is set to the beginning of the mmap() address space.
The summary_header_load method then reads (memory-maps, but from now on I will use the word read for this) the header of the summary itself. This header contains the flags, nextuid, time, saved_count, unread_count, deleted_count and junk_count variables.
Using the saved_count variable, the camel_folder_summary_load method then reads the summary items one by one. One summary item contains the uid, from, to, cc, flags, date_sent, date_received and the content info of the message. In standard Camel a few other fields are present too, but to reduce memory consumption per summary item it was decided to cut these away for tinymail's purposes.
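The load sequence can be sketched as below. This is a simplified stand-in, not the camel-lite code: it assumes fixed-width 32-bit header fields and plain NUL-terminated strings, shows only some of the per-item fields (the dates and the content info are left out), and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t flags, nextuid, time, saved_count;
    uint32_t unread_count, deleted_count, junk_count;
} SummaryHeader;

typedef struct {
    const char *uid, *from, *to, *cc; /* all point into the mapping */
    uint32_t flags;
} SummaryItem;

static int take_u32(const unsigned char **pos, const unsigned char *end,
                    uint32_t *out)
{
    if ((size_t)(end - *pos) < sizeof *out)
        return -1;
    memcpy(out, *pos, sizeof *out); /* memcpy avoids unaligned loads */
    *pos += sizeof *out;
    return 0;
}

static const char *take_str(const unsigned char **pos, const unsigned char *end)
{
    const unsigned char *nul = memchr(*pos, '\0', (size_t)(end - *pos));
    if (nul == NULL)
        return NULL;
    const char *s = (const char *)*pos;
    *pos = nul + 1;
    return s;
}

/* Read the header, then saved_count items. Returns 0 on success. */
int load_summary(const unsigned char *map, size_t len, SummaryHeader *hdr,
                 SummaryItem *items, size_t max_items)
{
    assert(map != NULL && hdr != NULL && items != NULL);
    const unsigned char *pos = map, *end = map + len;
    uint32_t *fields[] = { &hdr->flags, &hdr->nextuid, &hdr->time,
                           &hdr->saved_count, &hdr->unread_count,
                           &hdr->deleted_count, &hdr->junk_count };

    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        if (take_u32(&pos, end, fields[i]) != 0)
            return -1;
    if (hdr->saved_count > max_items)
        return -1;

    for (uint32_t i = 0; i < hdr->saved_count; i++) {
        SummaryItem *it = &items[i];
        it->uid  = take_str(&pos, end);
        it->from = it->uid  ? take_str(&pos, end) : NULL;
        it->to   = it->from ? take_str(&pos, end) : NULL;
        it->cc   = it->to   ? take_str(&pos, end) : NULL;
        if (it->cc == NULL || take_u32(&pos, end, &it->flags) != 0)
            return -1;
    }
    return 0;
}
```

The pos variable plays the role that filepos plays in the description above: it starts at the beginning of the mapping and advances as each field is consumed.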
Dealing with changes
Whenever something in the summary changes, this must be reflected in the on-disk file and of course in the mmap. The technique to do this is rather simple: the struct instances that point into the mmap() memory are not removed; instead, their uid field is copied to normal memory. Then the mmap() is unmapped, the new information is written to a new file, the new file is moved to the normal to-mmap filename, and finally the mmap() is reloaded. While reloading, the existing struct instances are looked up by the uid of the message. If an instance is found, it is reused; otherwise a new struct instance is created. Orphaned ones are left as a small leak until the folder is closed (at which point the entire memory for struct instances is freed).
Under normal circumstances no struct instances are ever orphaned, and every uid is found again while reloading (because the file and the mmap are locked while reloading, or because writing new summary information should not have happened in the meantime anyway).
Changes are queued until 1,000 new messages have arrived. The "something has changed" trigger fires in various other parts of camel-lite (for example when a specific provider, such as the IMAP provider, refreshes its summary information).
The CamelFolderSummary method that stores the changes is camel_folder_summary_save. It opens a new file, writes all summary items to it while keeping each uid in normal memory (rather than in the mmap), unloads the mmap, and reloads it. During the reload it finds the original struct instance through the in-memory uid and reuses that instance.
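The instance-reuse idea can be sketched like this. The Info struct and the three functions are hypothetical, and the lookup is a plain linear search for brevity; the sketch makes no claim about which data structure camel-lite uses for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced message info. */
typedef struct {
    char       *uid_copy; /* heap copy that survives the unmap, or NULL */
    const char *uid;      /* points into the mapping, or at uid_copy */
    const char *from;     /* points into the mapping, NULL while unmapped */
} Info;

/* Before unmapping: keep the uid in normal memory, drop the rest. */
int info_detach(Info *info)
{
    assert(info != NULL && info->uid != NULL);
    size_t n = strlen(info->uid) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, info->uid, n);
    info->uid_copy = copy;
    info->uid = copy;
    info->from = NULL;
    return 0;
}

/* While reloading: find the existing instance for a uid, if any. */
Info *find_info(Info *infos, size_t count, const char *uid)
{
    for (size_t i = 0; i < count; i++)
        if (infos[i].uid != NULL && strcmp(infos[i].uid, uid) == 0)
            return &infos[i];
    return NULL; /* caller creates a new instance */
}

/* After the lookup: point the instance at the new mapping again. */
void info_attach(Info *info, const char *uid, const char *from)
{
    free(info->uid_copy);
    info->uid_copy = NULL;
    info->uid = uid;
    info->from = from;
}
```

Because the struct instance itself is never freed and recreated, any code that holds a pointer to it keeps working across the reload.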
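The write, unmap, rename and remap steps of such a save could look roughly like the sketch below, using plain POSIX and stdio calls. Mapping, map_file and save_cycle are hypothetical names, the temporary filename (the target path with a "~" appended) is an assumption, and locking is left out entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for the current read-only mapping. */
typedef struct {
    void  *addr;
    size_t len;
} Mapping;

/* Map `path` read-only into `m`. Returns 0 on success. */
static int map_file(Mapping *m, const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;
    m->addr = addr;
    m->len = (size_t)st.st_size;
    return 0;
}

/* Write `data` to a new file, swap it in for `path`, and remap. */
int save_cycle(Mapping *m, const char *path, const void *data, size_t len)
{
    assert(m != NULL && path != NULL && data != NULL);
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s~", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* 1. Write the new summary next to the old one. `data` may still
     *    point into the old mapping at this stage. */
    FILE *out = fopen(tmp, "wb");
    if (out == NULL)
        return -1;
    int ok = fwrite(data, 1, len, out) == len;
    if (fclose(out) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp);
        return -1;
    }

    /* 2. Drop the old mapping. */
    if (m->addr != NULL) {
        munmap(m->addr, m->len);
        m->addr = NULL;
        m->len = 0;
    }

    /* 3. Move the new file into place, then 4. map it again. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return map_file(m, path);
}
```

Writing to a separate file and renaming it means a reader never sees a half-written summary under the normal filename.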
