docs: Add a new formats section, move static deltas in there

The `src/libostree/README-deltas.md` was rather hidden - let's move this into the manual.

This commit is contained in: parent 6821ca1029, commit 11b3050fd7

@ -0,0 +1,181 @@
# OSTree data formats

## On the topic of "smart servers"

One really crucial difference between OSTree and git is that git has a
"smart server". Even when fetching over `https://`, it isn't just a
static webserver, but one that e.g. dynamically computes and
compresses pack files for each client.

In contrast, the author of OSTree feels that for operating system
updates, many deployments will want to use simple static webservers,
the same target most package systems were designed to use. The
primary advantages are security and compute efficiency. Services like
Amazon S3 and CDNs are a canonical target, as well as a stock static
nginx server.

## The archive-z2 format

In the [repo](repo) section, the concept of objects was introduced,
where file/content objects are checksummed and managed individually
(unlike a package system, which operates on compressed aggregates).

The archive-z2 format simply gzip-compresses each content object.
Metadata objects are stored uncompressed. This means that it's easy
to serve via static HTTP.

When you commit new content, you will see new `.filez` files appearing
in `objects/`.
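
The object layout can be illustrated with a short sketch. Note this is a
simplified illustration: `archive_object_path` is a hypothetical helper, and a
real OSTree content checksum covers a metadata header (uid/gid/xattrs) in
addition to the raw file data.

```python
import hashlib

def archive_object_path(checksum: str, objtype: str) -> str:
    # Objects live under a two-hex-character "prefix directory" so that
    # no single directory grows too large; content objects get the
    # .filez suffix in archive-z2 repos since they are stored compressed.
    return f"objects/{checksum[:2]}/{checksum[2:]}.{objtype}"

# A content object addressed by the checksum of (simplified) file data:
csum = hashlib.sha256(b"#!/bin/sh\necho hello\n").hexdigest()
path = archive_object_path(csum, "filez")
```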

## archive-z2 efficiency

The advantages of `archive-z2`:

- It's easy to understand and implement
- Can be served directly over plain HTTP by a static webserver
- Clients can download/unpack updates incrementally
- Space efficient on the server

The biggest disadvantage of this format is that for a client to
perform an update, one HTTP request per changed file is required. In
some scenarios, this actually isn't bad at all, particularly with
techniques to reduce HTTP overhead, such as
[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2).

In order to make this format work well, you should design your content
such that large data that changes infrequently (e.g. graphic images)
is stored separately from small, frequently changing data (application
code).

Other disadvantages of `archive-z2`:

- It's quite bad when clients are performing an initial pull (without HTTP/2)
- One doesn't know the total size (compressed or uncompressed) of content
  before downloading everything

## Aside: the bare and bare-user formats

The most common operation is to pull from an `archive-z2` repository
into a `bare` or `bare-user` formatted repository. These latter two
are not compressed on disk. In other words, pulling to them is
similar to unpacking (but not installing) an RPM/deb package.

The `bare-user` format is a bit special in that the uid/gid and xattrs
from the content are ignored. This is primarily useful if you want to
have the same OSTree-managed content that can be run on a host system
or in an unprivileged container.

## Static deltas

OSTree itself was originally focused on a continuous delivery model, where
client systems are expected to update regularly. However, many OS vendors
would like to supply content that's updated e.g. once a month or less often.

For this model, we can do a lot better to support batched updates than
a basic `archive-z2` repo. However, we still want to preserve the
model of "static webserver only". Given this, OSTree has gained the
concept of a "static delta".

These deltas are targeted to be a delta between two specific commit
objects, including "bsdiff" and "rsync-style" deltas within a content
object. Static deltas also support `from NULL`, where the client can
more efficiently download a commit object from scratch.

Effectively, we're spending server-side storage (and one-time compute
cost), and gaining efficiency in client network bandwidth.

## Static delta repository layout

Since static deltas may not exist, the client first needs to attempt
to locate one. Suppose a client wants to retrieve commit `${new}`
while currently running `${current}`.

The first thing to understand is that, in order to save space, the two
commit checksums are represented in "modified base64", where the `/`
character is replaced with `_`.

Like the commit objects, a "prefix directory" is used to make
management easier for filesystem tools.

A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example
`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`,
which in sha256 format is
`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`.

Finally, the actual content can be found in
`deltas/$fromprefix/$fromsuffix-$to`.
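
The naming scheme above can be reproduced in a few lines. This is a sketch:
`to_mbase64` and `delta_path` are hypothetical helper names, and the
two-character prefix split is assumed to work analogously to the commit
object layout.

```python
import base64

def to_mbase64(checksum_hex: str) -> str:
    # "Modified base64": standard base64 of the binary checksum, with
    # '/' replaced by '_' and the trailing '=' padding dropped.
    raw = bytes.fromhex(checksum_hex)
    return base64.b64encode(raw).decode().rstrip("=").replace("/", "_")

def delta_path(from_hex: str, to_hex: str) -> str:
    # Assumed: the prefix directory is the first two characters of the
    # delta name, analogous to the commit object layout.
    name = to_mbase64(from_hex) + "-" + to_mbase64(to_hex)
    return "deltas/" + name[:2] + "/" + name[2:]

current = "1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d"
new = "2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09"
```

Running `delta_path(current, new)` on the two checksums from the example
above reproduces the delta name shown there.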

## Static delta internal structure

A delta is itself a directory. Inside, there is a file called
`superblock` which contains metadata. The rest of the files are
integer-named packs of content.

The file format of static deltas should currently be considered an
OSTree implementation detail. Obviously, nothing stops one from
writing code which is compatible with OSTree today. However, we would
like the flexibility to expand and change things, and having multiple
codebases makes that more problematic. Please contact the authors
with any requests.

That said, one critical thing to understand about the design is that
delta payloads are a bit more like "restricted programs" than they are
raw data. There's a "compilation" phase which generates output that
the client executes.

This "updates as code" model allows for multiple content generation
strategies. The design of this was inspired by that of
[ChromiumOS autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate).

### The delta superblock

The superblock contains:

- arbitrary metadata
- the delta generation timestamp
- the new commit object
- an array of recursive deltas to apply
- an array of per-part metadata, including total object sizes (compressed and uncompressed)
- an array of fallback objects
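
As a mental model only (field names here are hypothetical, and the on-disk
encoding is an OSTree implementation detail as noted above), the superblock's
contents can be modeled like this:

```python
from dataclasses import dataclass, field

@dataclass
class PartMeta:
    # Per-part metadata: which content a part yields and its sizes.
    checksum: str
    size_compressed: int
    size_uncompressed: int

@dataclass
class Superblock:
    metadata: dict = field(default_factory=dict)   # arbitrary metadata
    timestamp: int = 0                             # delta generation time
    new_commit: bytes = b""                        # the new commit object
    recursive_deltas: list = field(default_factory=list)
    parts: list = field(default_factory=list)      # list of PartMeta
    fallback_objects: list = field(default_factory=list)

# Because per-part sizes are in the superblock, a client can predict
# the download cost before fetching anything - unlike plain archive-z2:
sb = Superblock(parts=[PartMeta("aa...", 1_000, 4_096),
                       PartMeta("bb...", 2_000, 9_000)])
download_size = sum(p.size_compressed for p in sb.parts)
```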

Let's define a delta part, then return to discuss details:

## A delta part

A delta part is a combination of a raw blob of data, plus a very
restricted bytecode that operates on it. Say, for example, that two
files happen to share a common section. It's possible for the delta
compilation to include that section once in the delta data blob, then
generate instructions to write out that blob twice when generating
both objects.

Realistically though, it's very common for most of a delta to just be
a "stream of new objects" - if one considers it, it doesn't make sense
to have too much duplication inside operating system content at this
level.

So then, what's more interesting is that OSTree static deltas support
a per-file delta algorithm called
[bsdiff](https://github.com/mendsley/bsdiff) that most notably works
well on executable code.

The current delta compiler scans for files with matching basenames in
each commit that have a similar size, and attempts a bsdiff between
them. (It would make sense later to have a build system provide a
hint for this - for example, files within the same package.)

A generated bsdiff is included in the payload blob, and applying it is
an instruction.
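
The matching heuristic just described can be sketched as follows. This is a
simplified illustration, not the real delta compiler; the 0.5 size-ratio
threshold and the function name are assumptions for the example.

```python
import os

def bsdiff_candidates(old_files, new_files, size_ratio=0.5):
    """Pair files across two commits by matching basename, keeping
    pairs whose sizes are similar enough for bsdiff to pay off.

    old_files/new_files: dicts of path -> size in bytes.
    Returns a list of (old_path, new_path) pairs to try bsdiff on.
    """
    by_basename = {}
    for path, size in old_files.items():
        by_basename.setdefault(os.path.basename(path), []).append((path, size))

    pairs = []
    for new_path, new_size in new_files.items():
        for old_path, old_size in by_basename.get(os.path.basename(new_path), []):
            smaller, larger = sorted((old_size, new_size))
            if larger > 0 and smaller / larger >= size_ratio:
                pairs.append((old_path, new_path))
    return pairs

old = {"usr/bin/bash": 900_000, "usr/lib/libc.so.6": 2_000_000}
new = {"usr/bin/bash": 910_000, "usr/lib/libc.so.6": 2_050_000,
       "usr/bin/newtool": 50_000}
pairs = bsdiff_candidates(old, new)
```

Here `usr/bin/newtool` has no counterpart in the old commit, so it would be
shipped as new data rather than as a bsdiff.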

## Fallback objects

It's possible for there to be large-ish files which might be resistant
to bsdiff. A good example is that it's common for operating systems
to use an "initramfs", which is itself a compressed filesystem. This
"internal compression" defeats bsdiff analysis.

For these types of objects, the delta superblock contains an array of
"fallback objects". These objects aren't included in the delta
parts - the client simply fetches them from the underlying `.filez`
object.
@ -8,3 +8,4 @@ pages:
- Deployments: 'manual/deployment.md'
- Atomic Upgrades: 'manual/atomic-upgrades.md'
- Adapting Existing Systems: 'manual/adapting-existing.md'
- Formats: 'manual/formats.md'
@ -1,158 +0,0 @@
OSTree Static Object Deltas
===========================
Currently, OSTree's "archive-z2" mode stores both metadata and content
objects as individual files in the filesystem. Content objects are
zlib-compressed.

The advantages of this model are:

0) It's easy to understand and implement
1) Can be served directly over plain HTTP by a static webserver
2) Space efficient on the server

However, it can be inefficient for both large updates and small ones:

0) For large tree changes (such as going from -runtime to
   -devel-debug, or major version upgrades), this can mean thousands
   and thousands of HTTP requests. The overhead for that is very
   large (until SPDY/HTTP2.0), and will be catastrophically bad if the
   webserver is not configured with KeepAlive.
1) Small changes (a typo in a gnome-shell .js file) still require around
   5 metadata HTTP requests, plus a redownload of the whole file.
Why not smart servers?
======================

Smart servers (custom daemons, or just CGI scripts) as git has are not
under consideration for this proposal. OSTree is designed for the
same use case as GNU/Linux distribution package systems, where
content is served by a network of volunteer mirrors that will
generally not run custom code.

In particular, Amazon S3 style dumb content servers are a very
important use case, as is being able to apply updates from static
media like a DVD-ROM.
Finding Static Deltas
=====================

Since static deltas may not exist, the client first needs to attempt
to locate one. Suppose a client wants to retrieve commit ${new} while
currently running ${current}. The first thing to fetch is the delta
metadata, called "meta". It can be found at
${repo}/deltas/${current}-${new}/meta.

FIXME: GPG signatures (.metameta?) Or include the commit object in meta?
But we would then be forced to verify the commit only after processing
the entirety of the delta, which is dangerous. I think we need to
require signing deltas.
Delta Bytecode Format
=====================

A delta-part has the following form:

  byte compression-type (0 = none, 'g' = gzip)
  REPEAT[(varint size, delta-part-content)]

  delta-part-content:
    byte[] payload
    ARRAY[operation]
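
The format above does not pin down the varint encoding; here is a sketch
assuming the common little-endian base-128 ("LEB128"-style) scheme, which is
an assumption for illustration rather than a statement about the actual
on-wire encoding.

```python
def varint_encode(n: int) -> bytes:
    # Little-endian base-128: 7 bits per byte, high bit set on all
    # but the final byte.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def varint_decode(buf: bytes, offset: int = 0):
    # Returns (value, next_offset) so a parser can keep scanning.
    shift = value = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, offset

size, next_offset = varint_decode(varint_encode(300))
```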

The rationale for having delta-parts is that they allow easy incremental
resumption of downloads. The client can look at the delta descriptor
and skip downloading delta-parts for which it already has the
contained objects. This is better than simply resuming a gigantic
file, because if the client decides to fetch a slightly newer version,
it's very probable that some of the downloading we've already done is
still useful.

The actual delta payload comes as a stream of (payload, operation)
pairs so that it can be processed while being decompressed.

Finally, the delta-part-content is effectively a high level bytecode
for a stack-oriented machine. It iterates on the array of objects in
order. The following operations are available:

FETCH
  Fall back to fetching the current object individually. Move
  to the next object.

WRITE(array[(varint offset, varint length)])
  Write from current input target (default payload) to output.

GUNZIP(array[(varint offset, varint length)])
  gunzip from current input target (default payload) to output.

CLOSE
  Close the current output target, and proceed to the next; if the
  output object was a temporary, the output resets to the current
  object.

# Change the input source to an object
READOBJECT(csum object)
  Set object as current input target

# Change the input source to payload
READPAYLOAD
  Set payload as current input target
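
To make the stack-machine flavor concrete, here is a toy interpreter for just
the WRITE/CLOSE subset. This is a simplified model: the real bytecode also
has FETCH, GUNZIP, and the input-source operations, and the tuple-based op
representation here is invented for the example. It shows how one shared
payload section can be written into two different output objects.

```python
import io

def apply_part(payload: bytes, ops):
    # Execute a WRITE/CLOSE-only subset of the delta-part bytecode.
    # Each CLOSE finishes the current output object and starts the next.
    outputs, current = [], io.BytesIO()
    for op in ops:
        if op[0] == "WRITE":
            _, offset, length = op
            current.write(payload[offset:offset + length])
        elif op[0] == "CLOSE":
            outputs.append(current.getvalue())
            current = io.BytesIO()
    return outputs

# One shared section, stored once in the payload, emitted twice:
payload = b"SHARED-SECTION" + b"unique-tail"
objs = apply_part(payload, [
    ("WRITE", 0, 14), ("CLOSE",),
    ("WRITE", 0, 14), ("WRITE", 14, 11), ("CLOSE",),
])
```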
Compiling Deltas
================

After reading the above, you may be wondering how we actually *make*
these deltas. I envision a strategy similar to that employed by
Chromium autoupdate:
http://www.chromium.org/chromium-os/chromiumos-design-docs/autoupdate-details

Something like this would be a useful initial algorithm:

1) Compute the set of added objects NEW
2) For each object in NEW:
   - Look for the set of "superficially similar" objects in the
     previous tree, using heuristics based first on filename (including
     prefix), then on size. Call this set CANDIDATES.
     For each entry in CANDIDATES:
     - Try doing a bup/librsync style rolling checksum, and compute the
       list of changed blocks.
     - Try gzip-compressing it.
3) Choose the lowest cost method for each NEW object, and partition
   the program for each method into deltapart-sized chunks.
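
Step 3 - picking the cheapest method per object - can be sketched like this.
Illustrative only: the method names and byte costs are made up, and the real
compiler would also partition the winning programs into deltapart-sized
chunks.

```python
def choose_methods(candidates):
    # candidates: dict of object -> list of (method_name, byte_cost);
    # every object always has a plain-fetch fallback entry.
    chosen = {}
    for obj, options in candidates.items():
        chosen[obj] = min(options, key=lambda opt: opt[1])
    return chosen

plans = {
    # Executable content deltas well:
    "usr/bin/bash": [("fetch", 910_000), ("rolling-checksum", 40_000),
                     ("gzip", 350_000)],
    # Already-compressed content doesn't, so plain fetch wins:
    "usr/lib/initramfs.img": [("fetch", 8_000_000), ("gzip", 8_050_000)],
}
chosen = choose_methods(plans)
```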

However, there are many other possibilities that could be used in a
hybrid mode with the above. For example, we could try to find similar
objects and gzip them together. This would be a *very* useful
strategy for things like the 9000 Boost headers, which have massive
amounts of redundant data.

Notice too that the delta format supports falling back to retrieving
individual objects. For cases like the initramfs, which is compressed
inside the tree with gzip, we're not going to find an efficient way to
sync it, so the delta compiler should just fall back to fetching it
individually.
Which Deltas To Create?
=======================

Going back to the start, there are two cases to optimize for:

1) Incremental upgrades between builds
2) Major version upgrades

A command line operation would look something like this:

  $ ostree --repo=/path/to/repo gendelta --ref-prefix=gnome-ostree/buildmaster/ --strategy=latest --depth=5

This would tell ostree to generate deltas from each of the last 4
commits to each ref (e.g. gnome-ostree/buildmaster/x86_64-runtime) to
the latest commit. It might also be possible of course to have
--strategy=incremental, where we generate a delta between each commit.
I suspect that'd be something to do if one has a *lot* of disk space
to spend, and there's a reason for clients to be fetching individual
refs.

  $ ostree --repo=/path/to/repo gendelta --from=gnome-ostree/3.10/x86_64-runtime --to=gnome-ostree/buildmaster/x86_64-runtime

This is an obvious one - generate a delta from the last stable release
to the current development head.