From 11b3050fd73562d8d0caed948a5b421a68d0d793 Mon Sep 17 00:00:00 2001 From: Colin Walters Date: Mon, 22 Feb 2016 14:06:20 -0500 Subject: [PATCH] docs: Add a new formats section, move static deltas in there The `src/libostree/README-deltas.md` was rather hidden - let's move this into the manual. --- docs/manual/formats.md | 181 +++++++++++++++++++++++++++++++++ mkdocs.yml | 1 + src/libostree/README-deltas.md | 158 ---------------------------- 3 files changed, 182 insertions(+), 158 deletions(-) create mode 100644 docs/manual/formats.md delete mode 100644 src/libostree/README-deltas.md diff --git a/docs/manual/formats.md b/docs/manual/formats.md new file mode 100644 index 00000000..bf7fd0ae --- /dev/null +++ b/docs/manual/formats.md @@ -0,0 +1,181 @@ +# OSTree data formats + +## On the topic of "smart servers" + +One really crucial difference between OSTree and git is that git has a +"smart server". Even when fetching over `https://`, it isn't just a +static webserver, but one that e.g. dynamically computes and +compresses pack files for each client. + +In contrast, the author of OSTree feels that for operating system +updates, many deployments will want to use simple static webservers, +the same target most package systems were designed to use. The +primary advantages are security and compute efficiency. Services like +Amazon S3 and CDNs are a canonical target, as well as a stock static +nginx server. + +## The archive-z2 format + +In the [repo](repo) section, the concept of objects was introduced, +where file/content objects are checksummed and managed individually. +(Unlike a package system, which operates on compressed aggregates). + +The archive-z2 format simply gzip-compresses each content object. +Metadata objects are stored uncompressed. This means that it's easy +to serve via static HTTP. + +When you commit new content, you will see new `.filez` files appearing +in `objects/`. 
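As a rough illustration of where a compressed content object might live, the sketch below maps a content checksum to a `.filez` path. It assumes the two-character "prefix directory" layout under `objects/` (the text above only says the files appear in `objects/`), and `filez_path` is a hypothetical helper, not an OSTree API.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def filez_path(checksum: str) -> PurePosixPath:
    """Return the assumed repo-relative path of an archive-z2 content object.

    Assumption: objects are stored as objects/<first 2 hex chars>/<rest>.filez.
    """
    if len(checksum) != 64 or not set(checksum) <= HEX_DIGITS:
        raise ValueError("expected a 64-character lowercase hex sha256 checksum")
    return PurePosixPath("objects") / checksum[:2] / (checksum[2:] + ".filez")


example = filez_path(
    "1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d"
)
print(example)
```

Because the path depends only on the checksum, a static webserver needs no index or server-side logic to serve it.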
+ +## archive-z2 efficiency + +The advantages of `archive-z2`: + + - It's easy to understand and implement + - Can be served directly over plain HTTP by a static webserver + - Clients can download/unpack updates incrementally + - Space efficient on the server + +The biggest disadvantage of this format is that for a client to +perform an update, one HTTP request per changed file is required. In +some scenarios, this actually isn't bad at all, particularly with +techniques to reduce HTTP overhead, such as +[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2). + +In order to make this format work well, you should design your content +such that large data that changes infrequently (e.g. graphic images) +is stored separately from small frequently changing data (application +code). + +Other disadvantages of `archive-z2`: + + - It's quite bad when clients are performing an initial pull (without HTTP/2) + - One doesn't know the total size (compressed or uncompressed) of content + before downloading everything + +## Aside: the bare and bare-user formats + +The most common operation is to pull from an `archive-z2` repository +into a `bare` or `bare-user` formatted repository. These latter two +are not compressed on disk. In other words, pulling to them is +similar to unpacking (but not installing) an RPM/deb package. + +The `bare-user` format is a bit special in that the uid/gid and xattrs +from the content are ignored. This is primarily useful if you want to +have the same OSTree-managed content that can be run on a host system +or an unprivileged container. + +## Static deltas + +OSTree itself was originally focused on a continuous delivery model, where +client systems are expected to update regularly. However, many OS vendors +would like to supply content that's updated e.g. once a month or less often. + +For this model, we can do a lot better to support batched updates than +a basic `archive-z2` repo. However, we still want to preserve the +model of "static webserver only". 
Given this, OSTree has gained the +concept of a "static delta". + +These deltas are targeted to be a delta between two specific commit +objects, including "bsdiff" and "rsync-style" deltas within a content +object. Static deltas also support `from NULL`, where the client can +more efficiently download a commit object from scratch. + +Effectively, we're spending server-side storage (and one-time compute +cost), and gaining efficiency in client network bandwidth. + +## Static delta repository layout + +Since static deltas may not exist, the client first needs to attempt +to locate one. Suppose a client wants to retrieve commit `${new}` +while currently running `${current}`. + +The first thing to understand is that in order to save space, the +checksums of these two commits are encoded as "modified base64" - the +`/` character is replaced with `_`. + +Like the commit objects, a "prefix directory" is used to make +management easier for filesystem tools. + +A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example +`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`, +which in sha256 format is +`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`. + +Finally, the actual content can be found in +`deltas/$fromprefix/$fromsuffix-$to`. + +## Static delta internal structure + +A delta is itself a directory. Inside, there is a file called +`superblock` which contains metadata. The rest of the files are +named with integers and carry packs of content. + +The file format of static deltas should be currently considered an +OSTree implementation detail. Obviously, nothing stops one from +writing code which is compatible with OSTree today. However, we would +like the flexibility to expand and change things, and having multiple +codebases makes that more problematic. Please contact the authors +with any requests. 
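To make the naming scheme from the repository layout section concrete, here is a minimal Python sketch. It assumes "modified base64" means standard base64 of the raw sha256 bytes, with trailing `=` padding dropped and `/` replaced by `_`, which reproduces the example name given earlier. The two-character prefix split and the helper names `mbase64` and `delta_path` are assumptions for illustration, not OSTree API.

```python
import base64


def mbase64(hex_checksum: str) -> str:
    """Encode a hex sha256 checksum as 'modified base64'.

    Assumption: padding is stripped and '/' becomes '_' so the result
    is safe to use as a single path component.
    """
    raw = bytes.fromhex(hex_checksum)
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.rstrip("=").replace("/", "_")


def delta_name(from_hex: str, to_hex: str) -> str:
    """Name of the delta: $(mbase64 $from)-$(mbase64 $to)."""
    return f"{mbase64(from_hex)}-{mbase64(to_hex)}"


def delta_path(from_hex: str, to_hex: str, prefix_len: int = 2) -> str:
    """Path of the form deltas/$fromprefix/$fromsuffix-$to.

    Assumption: the prefix directory is the first two characters of the
    encoded 'from' checksum, mirroring the object prefix directories.
    """
    name = delta_name(from_hex, to_hex)
    return f"deltas/{name[:prefix_len]}/{name[prefix_len:]}"


FROM = "1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d"
TO = "2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09"
print(delta_name(FROM, TO))
print(delta_path(FROM, TO))
```

Since the path is computed purely from the two checksums, a client can probe for a delta with a single HTTP request and fall back to fetching individual objects if it is missing.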
+ +That said, one critical thing to understand about the design is that +delta payloads are a bit more like "restricted programs" than they are +raw data. There's a "compilation" phase which generates output that +the client executes. + +This "updates as code" model allows for multiple content generation +strategies. The design of this was inspired by that of Chromium: +[ChromiumOS +autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate). + +### The delta superblock + +The superblock contains: + + - arbitrary metadata + - delta generation timestamp + - the new commit object + - An array of recursive deltas to apply + - An array of per-part metadata, including total object sizes (compressed and uncompressed) + - An array of fallback objects + +Let's define a delta part, then return to discuss details: + +## A delta part + +A delta part is a combination of a raw blob of data, plus a very +restricted bytecode that operates on it. Say for example two files +happen to share a common section. It's possible for the delta +compilation to include that section once in the delta data blob, then +generate instructions to write out that blob twice when generating +both objects. + +Realistically though, it's very common for most of a delta to just be +"stream of new objects" - if one considers it, it doesn't make sense +to have too much duplication inside operating system content at this +level. + +So then, what's more interesting is that OSTree static deltas support +a per-file delta algorithm called +[bsdiff](https://github.com/mendsley/bsdiff) that most notably works +well on executable code. + +The current delta compiler scans for files with matching basenames in +each commit that have a similar size, and attempts a bsdiff between +them. (It would make sense later to have a build system provide a +hint for this - for example, files within the same package). 
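The candidate scan described above can be sketched roughly as follows. This is only an illustration of "matching basenames with a similar size", not the delta compiler's actual code. The `max_size_ratio` threshold, the function name `bsdiff_candidates`, and the path-to-size dictionaries are all assumptions.

```python
import posixpath
from collections import defaultdict


def bsdiff_candidates(old_files, new_files, max_size_ratio=1.5):
    """Pair up files worth attempting a bsdiff on.

    old_files / new_files map a path to its size in bytes. Two files are
    paired when their basenames match and the larger is at most
    max_size_ratio times the smaller (a hypothetical similarity rule).
    """
    # Index the old commit's files by basename for quick lookup.
    old_by_name = defaultdict(list)
    for path, size in old_files.items():
        old_by_name[posixpath.basename(path)].append((path, size))

    pairs = []
    for new_path, new_size in new_files.items():
        for old_path, old_size in old_by_name.get(posixpath.basename(new_path), []):
            small, large = sorted((old_size, new_size))
            # Skip empty files and files whose sizes differ too much.
            if small > 0 and large / small <= max_size_ratio:
                pairs.append((old_path, new_path))
    return pairs


old = {"/usr/bin/bash": 1000, "/usr/lib/libfoo.so": 5000, "/usr/share/icon.png": 100}
new = {"/usr/bin/bash": 1100, "/usr/lib64/libfoo.so": 5200, "/usr/share/icon.png": 90000}
print(bsdiff_candidates(old, new))
```

Matching on basename rather than full path lets the scan still find a file that moved between directories, at the cost of occasional false pairings that simply produce a poor diff.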
+ +A generated bsdiff is included in the payload blob, and applying it is +an instruction. + +## Fallback objects + +It's possible for there to be large-ish files which might be resistant +to bsdiff. A good example is that it's common for operating systems +to use an "initramfs", which is itself a compressed filesystem. This +"internal compression" defeats bsdiff analysis. + +For these types of objects, the delta superblock contains an array of +"fallback objects". These objects aren't included in the delta +parts - the client simply fetches them from the underlying `.filez` +object. diff --git a/mkdocs.yml b/mkdocs.yml index b11dbc87..89211c79 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -8,3 +8,4 @@ pages: - Deployments: 'manual/deployment.md' - Atomic Upgrades: 'manual/atomic-upgrades.md' - Adapting Existing Systems: 'manual/adapting-existing.md' + - Formats: 'manual/formats.md' diff --git a/src/libostree/README-deltas.md b/src/libostree/README-deltas.md deleted file mode 100644 index 9e09e45a..00000000 --- a/src/libostree/README-deltas.md +++ /dev/null @@ -1,158 +0,0 @@ -OSTree Static Object Deltas -=========================== - -Currently, OSTree's "archive-z2" mode stores both metadata and content -objects as individual files in the filesystem. Content objects are -zlib-compressed. - -The advantage of this is model are: - -0) It's easy to understand and implement -1) Can be served directly over plain HTTP by a static webserver -2) Space efficient on the server - -However, it can be inefficient both for large updates and small ones: - -0) For large tree changes (such as going from -runtime to - -devel-debug, or major version upgrades), this can mean thousands - and thousands of HTTP requests. The overhead for that is very - large (until SPDY/HTTP2.0), and will be catastrophically bad if the - webserver is not configured with KeepAlive. 
-1) Small changes (typo in gnome-shell .js file) still require around - 5 metadata HTTP requests, plus a redownload of the whole file. - -Why not smart servers? -====================== - -Smart servers (custom daemons, or just CGI scripts) as git has are not -under consideration for this proposal. OSTree is designed for the -same use case as GNU/Linux distribution package systems are, where -content is served by a network of volunteer mirrors that will -generally not run custom code. - -In particular, Amazon S3 style dumb content servers is a very -important use case, as is being able to apply updates from static -media like DVD-ROM. - -Finding Static Deltas -===================== - -Since static deltas may not exist, the client first needs to attempt -to locate one. Suppose a client wants to retrieve commit ${new} while -currently running ${current}. The first thing to fetch is the delta -metadata, called "meta". It can be found at -${repo}/deltas/${current}-${new}/meta. - -FIXME: GPG signatures (.metameta?) Or include commit object in meta? -But we would then be forced to verify the commit only after processing -the entirety of the delta, which is dangerous. I think we need to -require signing deltas. - -Delta Bytecode Format -===================== - -A delta-part has the following form: - -byte compression-type (0 = none, 'g' = gzip') -REPEAT[(varint size, delta-part-content)] - -delta-part-content: - byte[] payload - ARRAY[operation] - -The rationale for having delta-part is that it allows easy incremental -resumption of downloads. The client can look at the delta descriptor -and skip downloading delta-parts for which it already has the -contained objects. This is better than simply resuming a gigantic -file because if the client decides to fetch a slightly newer version, -it's very probable that some of the downloading we've already done is -still useful. 
- -For the actual delta payload, it comes as a stream of pair of -(payload, operation) so that it can be processed while being -decompressed. - -Finally, the delta-part-content is effectively a high level bytecode -for a stack-oriented machine. It iterates on the array of objects in -order. The following operations are available: - -FETCH - Fall back to fetching the current object individually. Move - to the next object. - -WRITE(array[(varint offset, varint length)]) - Write from current input target (default payload) to output. - -GUNZIP(array[(varint offset, varint length)]) - gunzip from current input target (default payload) to output. - -CLOSE - Close the current output target, and proceed to the next; if the - output object was a temporary, the output resets to the current - object. - -# Change the input source to an object -READOBJECT(csum object) - Set object as current input target - -# Change the input source to payload -READPAYLOAD - Set payload as current input target - -Compiling Deltas -================ - -After reading the above, you may be wondering how we actually *make* -these deltas. I envison a strategy similar to that employed by -Chromium autoupdate: -http://www.chromium.org/chromium-os/chromiumos-design-docs/autoupdate-details - -Something like this would be a useful initial algorithm: -1) Compute the set of added objects NEW -2) For each object in NEW: - - Look for a the set of "superficially similar" objects in the - previous tree, using heuristics based first on filename (including - prefix), then on size. Call this set CANDIDATES. - For each entry in CANDIDATES: - - Try doing a bup/librsync style rolling checksum, and compute the - list of changed blocks. - - Try gzip-compressing it -3) Choose the lowest cost method for each NEW object, and partition - the program for each method into deltapart-sized chunks. - -However, there are many other possibilities, that could be used in a -hybrid mode with the above. 
For example, we could try to find similar -objects, and gzip them together. This would be a *very* useful -strategy for things like the 9000 Boost headers which have massive -amounts of redundant data. - -Notice too that the delta format supports falling back to retrieving -individual objects. For cases like the initramfs which is compressed -inside the tree with gzip, we're not going to find an efficient way to -sync it, so the delta compiler should just fall back to fetching it -individually. - -Which Deltas To Create? -======================= - -Going back to the start, there are two cases to optimize for: - -1) Incremental upgrades between builds -2) Major version upgrades - -A command line operation would look something like this: - -$ ostree --repo=/path/to/repo gendelta --ref-prefix=gnome-ostree/buildmaster/ --strategy=latest --depth=5 - -This would tell ostree to generate deltas from each of the last 4 -commits to each ref (e.g. gnome-ostree/buildmaster/x86_64-runtime) to -the latest commit. It might also be possible of course to have ---strategy=incremental where we generate a delta between each commit. -I suspect that'd be something to do if one has a *lot* of disk space -to spend, and there's a reason for clients to be fetching individual -refs. - -$ ostree --repo=/path/to/repo gendelta --from=gnome-ostree/3.10/x86_64-runtime --to=gnome-ostree/buildmaster/x86_64-runtime - -This is an obvious one - generate a delta from the last stable release -to the current development head.