This is just an efficiency optimization. We're getting fairly close
to all of the hot code paths using `*at()`.
Note that we end up maintaining a half-duplicate code path set here,
because we still need to support commits from an arbitrary GFile *,
which in a possible common case is an OSTree commit.
I think it's worth it though.
rpm-ostree had code to check for this, which didn't actually work.
I don't see a backwards compatibility concern in changing this, as
it's unlikely a caller would try to sensibly disambiguate FAILED.
We were already using openat() for the contents, but not the xattrs.
Now that libgsystem 2014.3 has gs_fd_get_all_xattrs(), make use of it.
Clean things up a bit so we only open the fd once.
For Anaconda, I needed OSTREE_REPO_REMOTE_CHANGE_ADD_IF_NOT_EXISTS,
with the GFile *sysroot argument to avoid ugly hacks. We want to
write the content provided via "ostreesetup" as a remote to the target
chroot only in the case where it isn't provided as part of the tree
content itself.
This is also potentially useful in idempotent systems management tools
like Ansible.
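As a rough illustration of the add-if-not-exists semantics described
above, here is a Python sketch that uses a plain dictionary in place of
the real remote configuration. The function name
remote_add_if_not_exists and the data layout are hypothetical; this is
not the libostree API.

```python
def remote_add_if_not_exists(remotes, name, url):
    """Add a remote only when no remote with that name exists yet.

    `remotes` is a hypothetical stand-in for the repository's remote table.
    Returns True if the remote was added, False if it was already present.
    """
    if name in remotes:
        # Already provided (for example, shipped as part of the tree
        # content): leave the existing definition untouched.
        return False
    remotes[name] = {"url": url}
    return True


remotes = {}
first = remote_add_if_not_exists(remotes, "example-os", "http://example.com/repo")
second = remote_add_if_not_exists(remotes, "example-os", "http://other.example.com/repo")
```

Calling it twice is harmless, which is the property an idempotent
management tool relies on.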
https://bugzilla.gnome.org/show_bug.cgi?id=741577
ostree_repo_pull_with_options() needs this, and I'd rather keep the
OstreeRemote struct definition tucked away in ostree-repo.c with its
own internal API.
OstreeRemote is a reference-counted struct that encompasses data about a
remote, whether read from a configuration file or created explicitly via
ostree_repo_remote_add().
OstreeRemotes are held in an internal table indexed by remote name.
This solves some problems caused by merging system-wide remote data into
the OstreeRepo's internal config key file.
Also fixes https://bugzilla.gnome.org/show_bug.cgi?id=740911
This format is pretty much the same as the "bare" format, except the
file ownership and xattrs are not stored in the actual filesystem object, but
rather on the side in a user xattr. This means two things:
1) An unprivileged user can store such a repo independent of the types
of files in it or their xattrs. And you can later (as root)
reconstruct the real filesystem tree with ownership. Although you
can't do that using hardlink-sharing. This also means ostree
fsck does a full verification.
2) Such a repository can be checked out with user-mode (checkout -U)
as an unprivileged user using hardlinks for space sharing.
Additionally, symlinks are stored as regular files (with the content
being the symlink target) because user xattrs are not supported on
symlinks. We know at checkout time if the file is a symlink because
the original st_mode is stored in the xattr metadata.
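To make the symlink handling concrete, here is a small Python sketch.
The packed layout (three 32-bit integers for uid, gid and st_mode) and
the function names are assumptions for illustration only; the real
metadata format is not described here.

```python
import stat
import struct

# Hypothetical side-metadata layout: uid, gid, st_mode as big-endian u32.
_META = struct.Struct(">III")


def pack_metadata(uid, gid, mode):
    """Serialize ownership and mode the way a user xattr value might hold them."""
    return _META.pack(uid, gid, mode)


def describe_object(meta_blob, content):
    """Decide at checkout time what a stored regular file really represents."""
    uid, gid, mode = _META.unpack(meta_blob)
    if stat.S_ISLNK(mode):
        # The stored file content is the symlink target.
        return {"type": "symlink", "target": content.decode("utf-8"),
                "uid": uid, "gid": gid}
    return {"type": "file", "size": len(content), "uid": uid, "gid": gid}


link = describe_object(pack_metadata(0, 0, stat.S_IFLNK | 0o777),
                       b"../usr/lib/os-release")
regular = describe_object(pack_metadata(0, 0, stat.S_IFREG | 0o644),
                          b"hello\n")
```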
https://bugzilla.gnome.org/show_bug.cgi?id=741125
Applying xattrs on a symlink during checkout failed since
it was setting the xattrs on the final filename, not the
temporary name.
This made the "checkout union 1" test in test-basic.sh
fail.
https://bugzilla.gnome.org/show_bug.cgi?id=741125
When committing a symlink we do store the uid/gid of the actual
symlink (i.e. not the target). However, this was not restored
on non-user-mode checkout as it should be.
This commit fixes that, and additionally it ensures xattrs
on symlinks are not set in user-mode checkout.
https://bugzilla.gnome.org/show_bug.cgi?id=741125
rpm-ostree at least has the option to generate a tree with just that
instead of /boot, but while we were enumerating the latter, we'd still
return paths from /boot.
https://bugzilla.gnome.org/show_bug.cgi?id=740947
In Anaconda, we're using "ostree admin --sysroot=/mnt/sysimage
instutil set-kargs", and it was working before, but newer versions of
lorax strip out /etc/system-release which grub2 wants.
That was wrong anyways as we want the /etc/system-release from the
target root.
(Man, grub2 sucks...give me a declarative config file format I can just
write)
https://bugzilla.gnome.org/show_bug.cgi?id=740697
Make _ostree_fetcher_request_uri_with_partial_async and
ostree_fetcher_stream_uri_async simple wrappers around the same
function; all the requests are created in the same place now.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Rename _ostree_fetcher_contents_membuf_sync to
ostree_fetcher_request_uri_to_membuf and drop unused argument
user_data.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
_ostree_fetcher_query_state_text() and _ostree_fetcher_get_n_requests()
have no callers, so remove them.
If they are needed again, they can easily be copied back from the git
history.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Use the pattern:
$PRETTY_NAME [$COMMIT_VERSION] (ostree[:$OSNAME][:$DEPLOYMENT_INDEX])
$OSNAME is only shown if there are multiple values.
$COMMIT_VERSION refers to the version tag in the commit's metadata.
$DEPLOYMENT_INDEX is only shown if no $COMMIT_VERSION is available.
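A possible reading of the pattern, sketched in Python. This assumes the
square brackets mark optional parts rather than literal characters, and
that the caller passes osname only when there are multiple values; the
function name boot_title is made up for the example.

```python
def boot_title(pretty_name, commit_version=None, osname=None,
               deployment_index=None):
    """Build a boot menu title following the pattern above (assumed reading)."""
    title = pretty_name
    if commit_version:
        title += " " + commit_version
    suffix = "ostree"
    if osname:
        # Caller is assumed to pass osname only when there are multiple values.
        suffix += ":" + osname
    if not commit_version and deployment_index is not None:
        # The index is only a fallback when no version is available.
        suffix += ":" + str(deployment_index)
    return "%s (%s)" % (title, suffix)


with_version = boot_title("Example OS 21", commit_version="21.5")
without_version = boot_title("Example OS 21", deployment_index=0)
multi_os = boot_title("Example OS 21", commit_version="21.5",
                      osname="example", deployment_index=1)
```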
https://bugzilla.gnome.org/show_bug.cgi?id=739416
src/libostree/ostree-repo-pull.c:1676:22: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We potentially need a lot of argument types for pull. Rather than
have a C function with tons of arguments, let's use a GVariant a{sv}
as a handy extensible (and immutable) bag of properties.
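The idea of an extensible, immutable property bag can be sketched in
Python with a read-only mapping. The option keys used here ("refs",
"flags", "depth") and the helper names are hypothetical; the point is
only that new options can be added later without changing the function
signature.

```python
from types import MappingProxyType


def make_options(**values):
    """Freeze a set of named options into a read-only mapping."""
    return MappingProxyType(dict(values))


def pull(remote, options):
    """Look up each option by name, falling back to a default when absent."""
    refs = options.get("refs", ())
    flags = options.get("flags", 0)
    depth = options.get("depth", 0)  # hypothetical history-traversal option
    return "pull %s refs=%s flags=%d depth=%d" % (
        remote, ",".join(refs) or "<all>", flags, depth)


opts = make_options(refs=("os/x86_64/stable",), depth=1)
summary = pull("origin", opts)

try:
    opts["flags"] = 1
    frozen = False
except TypeError:
    # The bag cannot be modified after construction.
    frozen = True
```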
This is preparatory work for adding an option to pull to traverse
history.
https://bugzilla.gnome.org/show_bug.cgi?id=737844
Fixes a core dump when using a command like:
$ ostree --repo=repo checkout -U --subpath=/usr/lib/passwd \
fedora-atomic/rawhide/x86_64/docker-host usrlib-new
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We need basic support for UEFI - many newer servers don't support
BIOS compatibility mode anymore.
However, this patch only implements non-atomic updates, because the UEFI
system partition is FAT and we can't use the previous OSTree design of
an atomic swap of /boot/loader.
The Fedora/RHEL UEFI layout has the kernels on a "real" /boot
partition, and /boot/efi/EFI/$vendor just holds the grub2 UEFI binary
and grub.cfg.
Following this, /boot/loader is still on the OS boot partition, and we
still atomically swap it. This potentially paves the way to atomic
upgrades in the future.
https://bugzilla.gnome.org/show_bug.cgi?id=724246
Some package systems need to be run as root, so the process linking to
libostree may also be root. However, it's reasonable to have the
target repository be owned by a uid other than root.
This patch makes it Just Work by chowning the file content to match.
Note this only operates on archive-z2 repositories, because you can't
usefully serve bare repositories via HTTP.
https://bugzilla.gnome.org/show_bug.cgi?id=738954
For Anaconda, we have an ugly bootstrapping problem where we need to
add the remote to the repository's config, then do a pull+deploy, then
remove and re-add the config, because /etc/ostree/remotes.d doesn't
exist yet in the target system.
https://bugzilla.gnome.org/show_bug.cgi?id=738698
While we did support disabling the uncompressed-objects-cache
per-repository:
1) We didn't actually respect that operation when doing
CHECKOUT_MODE_USER on archive-z2 repositories
2) It'd be better to automatically detect we can't write to the
repo and disable the uncompressed cache then.
In this approach, we drop a /etc/grub.d/15_ostree file which is a
hybrid of shell/C that picks up bits from the GRUB2 library (e.g. the
block device script generation), and then calls into libostree's
GRUB2 code which knows about the BLS entries.
This is admittedly ugly. There exists another approach for GRUB2 to
learn the BLS specification. However, the spec has a few issues:
https://www.redhat.com/archives/anaconda-devel-list/2014-July/msg00002.html
This approach also gives a bit more control to the admin via the
naming of the 15_ostree symlink; they can easily disable it:
Or reorder the ostree entries ahead of 10_linux:
Also, this approach doesn't require patches for grub2, which is an
issue with the pressure to backport (rpm-)OSTree to EL7.
src/libostree/ostree-repo.c:1759: Warning: OSTree:
ostree_repo_import_object_from: unknown parameter 'checksum' in
documentation comment, should be 'sha256'
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Some operating systems may come with external tools for subscription
management that drive access to the content. In that case, the origin
file may not be useful (for example, it could refer to an installer
ISO).
This patch will allow OS installers to inject that state, with a
useful error message, directing the system administrator to an
external tool.
See: https://github.com/projectatomic/rpm-ostree/issues/31
https://bugzilla.gnome.org/show_bug.cgi?id=737686
Now that we have a summary file, we can use it to allow a simple:
ostree pull --mirror
To download the latest commit on every branch. Also, for a case I'm
dealing with there's only one branch, but I don't want mirror users to
have to hardcode it.
https://bugzilla.gnome.org/show_bug.cgi?id=737807
And use it in pull-local. As one might expect, this is blazingly fast
if they're on the same filesystem.
I'll be using this to "promote" builds between different repositories.
Add command line arguments:
--import-proc-cmdline: import values from /proc/cmdline
--merge: import current values
--replace=ARG=VALUE: replace value
--append=ARG=VALUE: append a new argument
Extra command line arguments are treated like --append=, which
gives backwards compatibility.
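A minimal Python sketch of how --replace and --append could combine.
How duplicates of a replaced key are handled is an assumption here (all
of them collapse into one entry at the first position), and apply_kargs
is a made-up name, not the actual implementation.

```python
def key_of(arg):
    """Return the key part of a KEY=VALUE (or bare KEY) kernel argument."""
    return arg.split("=", 1)[0]


def apply_kargs(current, replace=(), append=()):
    """Apply replacements first, then appends, preserving argument order."""
    result = list(current)
    for item in replace:
        key = key_of(item)
        positions = [i for i, arg in enumerate(result) if key_of(arg) == key]
        if positions:
            first = positions[0]
            # Assumption: drop later duplicates of the key, keep the first slot.
            result = [arg for i, arg in enumerate(result)
                      if i == first or key_of(arg) != key]
            result[first] = item
        else:
            result.append(item)
    result.extend(append)
    return result


new_args = apply_kargs(["root=/dev/sda1", "quiet"],
                       replace=["root=/dev/sdb1"],
                       append=["console=ttyS0"])
```

Extra positional arguments would simply be passed in the append list.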
https://bugzilla.gnome.org/show_bug.cgi?id=731051
Previously, in the case where a parent directory of a modified config
file was removed, we would throw an exception. This happens when
switching from a tree that has some software (e.g. firewalld), to one
that does not.
While it's nice to have this warning that your config file probably no
longer applies, there's no need to make it so...fatal.
It's particularly problematic that the only easy workaround is to
remove the config files from your current tree - which breaks
rollback.
The solution then is for us to take ownership of the parent
directories too into the new /etc. Admins can clean up these files
afterwards at any time.
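In Python terms, the fix amounts to creating any missing parent
directories before copying the modified file. This is only a sketch;
carry_over_modified_file and the directory names are hypothetical.

```python
import os
import shutil
import tempfile


def carry_over_modified_file(old_etc, new_etc, relpath):
    """Copy a locally modified config file, recreating missing parents."""
    src = os.path.join(old_etc, relpath)
    dst = os.path.join(new_etc, relpath)
    # The new tree may no longer ship the parent directory at all.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    old_etc = os.path.join(tmp, "old-etc")
    new_etc = os.path.join(tmp, "new-etc")
    os.makedirs(os.path.join(old_etc, "firewalld"))
    os.makedirs(new_etc)  # the new tree has no firewalld directory
    with open(os.path.join(old_etc, "firewalld", "firewalld.conf"), "w") as f:
        f.write("DefaultZone=home\n")
    dst = carry_over_modified_file(old_etc, new_etc, "firewalld/firewalld.conf")
    with open(dst) as f:
        merged_content = f.read()
```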
https://bugzilla.gnome.org/show_bug.cgi?id=734293
This fixes a regression introduced with https://git.gnome.org/browse/ostree/commit/?id=7baa600e237b326899de2899a9bc54a6b863943c
The original code in "ostree admin upgrade" had a comment:
/* Here we perform cleanup of any leftover data from previous
* partial failures. This avoids having to call gs_shutil_rm_rf()
* at random points throughout the process. */
But since I deleted that initial cleanup call, we *do* need to do the
cleanup during the process run. It turns out there are only a few
places this is necessary.
https://bugzilla.gnome.org/show_bug.cgi?id=733030
While looking to fix a different bug here, I found the current
state of things, a mix of fd-relative and path-based API usage,
frustrating.
Change the code around to consistently use *at, and also add some more
tests.
For Fedora and potentially other distributions which use globally
distributed mirrors, metalink is a popular solution to redirect
clients to a dynamic set of mirrors.
In order to make metalink work though, it needs *one* file which can
be checksummed. (Well, potentially we could explode all refs into the
metalink.xml, but that would be a lot more invasive, and a bit weird
as we'd end up checksumming the checksum file).
This commit adds a new command:
$ ostree summary -u
To regenerate the summary file. Can only be run by one process at a
time.
After that's done, the metalink can be generated based on it, and the
client fetch code will parse and load it.
https://bugzilla.gnome.org/show_bug.cgi?id=729585
Otherwise, we're potentially holding up subsequent requests.
I was hitting this when testing the metalink code, where we want to
continue doing more fetches after hitting a 404.
https://bugzilla.gnome.org/show_bug.cgi?id=729585
Changes the pull API to allow pulling only a single directory instead
of the whole deployment. This option is utilized by the check-diff
option in rpm-ostree.
Add a new state directory to hold <checksum>.commitpartial files, so
we know that we've only downloaded partial state.
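The marker-file idea can be sketched as follows. The directory name
"state" and the helper names are assumptions for the example; only the
<checksum>.commitpartial naming comes from the text above.

```python
import os
import tempfile


def partial_marker(state_dir, checksum):
    """Path of the marker recording that a commit is only partially present."""
    return os.path.join(state_dir, checksum + ".commitpartial")


def mark_partial(state_dir, checksum):
    os.makedirs(state_dir, exist_ok=True)
    open(partial_marker(state_dir, checksum), "w").close()


def is_partial(state_dir, checksum):
    return os.path.exists(partial_marker(state_dir, checksum))


def mark_complete(state_dir, checksum):
    try:
        os.unlink(partial_marker(state_dir, checksum))
    except FileNotFoundError:
        pass  # nothing to clear


with tempfile.TemporaryDirectory() as repo:
    state_dir = os.path.join(repo, "state")  # hypothetical location
    checksum = "ab" * 32
    mark_partial(state_dir, checksum)
    during = is_partial(state_dir, checksum)
    mark_complete(state_dir, checksum)
    after = is_partial(state_dir, checksum)
```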
I noticed OSTree was a bit slower, did some investigation
and saw we were enumerating all objects for things like
$ ostree rev-parse blah
Since "blah" can never be an object (because of the 'l' and 'h'), just
return no matches.
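The check is cheap. A Python sketch, assuming checksums are lowercase
hexadecimal SHA-256 strings of at most 64 characters (the function name
is made up):

```python
_HEX_DIGITS = set("0123456789abcdef")


def could_be_checksum_prefix(text):
    """True only if `text` could be the start of a hex checksum."""
    return 0 < len(text) <= 64 and all(c in _HEX_DIGITS for c in text)


blah = could_be_checksum_prefix("blah")        # 'l' and 'h' are not hex digits
plausible = could_be_checksum_prefix("deadbeef")
```

Only when the check passes is there any reason to enumerate objects.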
The user might "ostree ls /usr/bin/bash/blah", which previously would
segfault.
A somewhat related future enhancement here would be for "ostree ls" to
follow symbolic links.
Reported-by: Dusty Mabe <dustymabe@gmail.com>
https://bugzilla.gnome.org/show_bug.cgi?id=733476
Prune has worked fine on bare repositories for some time, but now that
I finally try to delete data on the server side, I notice we weren't
actually enumerating content objects =/
That caused them to not be pruned.
https://bugzilla.gnome.org/show_bug.cgi?id=733458
The prune API duplicated logic to delete objects, and furthermore the
core API to delete an object didn't clean up detached metadata.
Fix the duplication by doing the obvious thing: prune should call
_delete.
https://bugzilla.gnome.org/show_bug.cgi?id=733452
This patch adds a function that will parse a partial checksum when
resolving a refspec. If the inputted refspec matches a truncated
existing checksum, it will return that checksum to be parsed. If
multiple truncated checksums match the partial refspec, it is not
unique and will return false. This addition is inspired by the same
functionality in Docker, which allows a user to reference a specific
commit without typing the entire checksum.
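The uniqueness rule can be sketched in a few lines of Python. This
version returns None for both "no match" and "ambiguous", which is a
simplification; the shortened checksums are invented for the example.

```python
def resolve_partial_checksum(prefix, known_commits):
    """Return the single commit checksum starting with `prefix`, else None."""
    matches = [c for c in known_commits if c.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    # Zero matches, or more than one: the prefix does not identify a commit.
    return None


commits = ["4f2a9c01", "4f2b7710", "c0ffee12"]  # shortened, made-up checksums
unique = resolve_partial_checksum("4f2a", commits)
ambiguous = resolve_partial_checksum("4f2", commits)
missing = resolve_partial_checksum("dead", commits)
```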
partial checksums: Add function to abstract comparison
This modifies the list_objects and list_objects_at functions
to take an additional argument for the string that a commit starts
with. If this string arg is not null, it will only list commit
objects beginning with that string. This allows for a new function
ostree_repo_list_commit_objects_starting_with to pass a partial string
and return a list of all matching commits. This improves on the
previous strategy of listing refs because it will list all commit objects,
even ones in past history. This update also includes bugfixes on
error handling and string comparison, and changes the output structure
of resolve_partial_checksum. The new structure will no longer return FALSE
without error. Also, the hashtable foreach now uses iter. Also
includes a modified test file.
Some organizations will want to use private Certificate Authorities to
serve content to their clients. While it's possible to add the CA
to the system-wide CA store, that has two drawbacks:
1) Compromise of that cert means it can be used for other web traffic
2) All of ca-certificates is trusted
This patch allows a much stronger scenario where *only* the CAs in
tls-ca-path are used for verification from the given repository.
https://bugzilla.gnome.org/show_bug.cgi?id=726256
We were using unsigned size when we should have been using signed;
this means we basically weren't checking for errors on write...ouch.
Luckily if we e.g. hit ENOSPC during a pull, the checksums wouldn't
match and we'd return an error anyways. However when writing an
object, we'd end up silently ignoring it =/
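The effect is easy to reproduce from Python with ctypes, which mimics
how C stores the return value. This only demonstrates the type problem;
it is not the code that was fixed.

```python
import ctypes

failed_write = -1  # what write(2) returns on error

# Stored in an unsigned size type, -1 wraps around to a huge positive value.
as_unsigned = ctypes.c_size_t(failed_write).value
# Stored in a signed size type, it stays negative.
as_signed = ctypes.c_ssize_t(failed_write).value

unsigned_check_fires = as_unsigned < 0  # the error check can never trigger
signed_check_fires = as_signed < 0      # the error check works
```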
https://bugzilla.gnome.org/show_bug.cgi?id=732020
The generic GKeyFile error isn't quite informative enough here.
I hit this with the new compose process where we don't automatically
inject a configured remote into the generated disk images; we expect
people to add them.
https://bugzilla.gnome.org/show_bug.cgi?id=731346
There are several use cases for calling into ostree itself to do
mirroring, instead of using bare rsync. For example, it's a bit more
efficient as it doesn't require syncing the objects/ directory.
https://bugzilla.gnome.org/show_bug.cgi?id=728351
In the case of running ostree as non-root on a regular filesystem (not
tmpfs which doesn't support immutable), we should just silently do
nothing if we encounter EPERM. Cache the result to avoid spam in
strace.
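A sketch of the "try once, remember EPERM" behaviour in Python. The
class ImmutableSetter and the injected set_flag callable are
hypothetical stand-ins for the real ioctl-based code.

```python
import errno


class ImmutableSetter:
    """Try to set the immutable flag; stop trying after the first EPERM."""

    def __init__(self, set_flag):
        self._set_flag = set_flag  # hypothetical callable doing the real work
        self._disabled = False

    def apply(self, path):
        if self._disabled:
            return False  # cached result: no further syscalls
        try:
            self._set_flag(path)
        except OSError as exc:
            if exc.errno != errno.EPERM:
                raise
            self._disabled = True
            return False
        return True


calls = []


def fake_set_flag(path):
    """Simulates running as non-root: every attempt fails with EPERM."""
    calls.append(path)
    raise PermissionError(errno.EPERM, "Operation not permitted")


setter = ImmutableSetter(fake_set_flag)
results = [setter.apply("/sysroot/a"), setter.apply("/sysroot/b")]
```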
https://bugzilla.gnome.org/show_bug.cgi?id=728006
We weren't installing the headers, but at the moment all symbols
starting with ostree_ were being exported. Fix that by prefixing
non-static symbols with '_'.
https://bugzilla.gnome.org/show_bug.cgi?id=731369
Finally, fsync to ensure all entries are on disk, unless disabled.
We support disabling this for cases like server-side buildroot
construction where we don't need to be robust against power loss.
This prevents people from creating new directories there and expecting
them to be persisted. The OSTree model requires all local state to be in
/etc and /var.
This introduces a compile-time dependency on libe2fsprogs.
We're only doing this for the root directory at the moment.
https://bugzilla.gnome.org/show_bug.cgi?id=728006
If fetching GPG-signed commits over plain HTTP, a MitM attacker can
fill up the drive of targets by simply returning an enormous stream
for the commit object.
Related to this, an attacker can also cause OSTree to perform large
memory allocations by returning enormous GVariants in the metadata.
This helps close that attack by limiting all metadata objects to 10
MiB, so the initial fetch will be truncated.
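A size cap of this kind can be sketched in Python as below. Note this
variant rejects oversized input with an error rather than truncating
it, and it assumes stream.read(n) returns up to n bytes in one call, as
io.BytesIO does; read_metadata is a made-up name.

```python
import io

MAX_METADATA_SIZE = 10 * 1024 * 1024  # 10 MiB


def read_metadata(stream, limit=MAX_METADATA_SIZE):
    """Read a metadata object, refusing anything larger than `limit` bytes."""
    # Ask for one byte more than allowed so oversized input is detectable
    # without reading an unbounded amount of data.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError("metadata object exceeds %d bytes" % limit)
    return data


small = read_metadata(io.BytesIO(b"commit"), limit=16)
try:
    read_metadata(io.BytesIO(b"x" * 17), limit=16)
    rejected = False
except ValueError:
    rejected = True
```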
But now the attack is only slightly more difficult as the attacker
will have to return a correctly formed commit object, then return a
large stream of < 10 MiB dirmeta/dirtree objects.
https://bugzilla.gnome.org/show_bug.cgi?id=725921
The current "transaction" symlink was introduced to fix issues with
interrupted pulls; normally we assume that if we have a metadata
object, we also have all objects to which it refers.
There used to be a "summary" which had all the available refs, but I
deleted it because it wasn't really used, and was still racy despite
the transaction bits.
We still want the pull process to use the transaction link, so don't
delete the APIs, just relax the restriction on object writing, and
introduce a new ostree_repo_set_ref_immediate().
They shouldn't be loaded for random test/personal repositories. Doing
so triggers another bug in that we return them from
ostree_repo_get_config() which then causes clients to write them out
permanently to disk with ostree_repo_write_config(). This caused test
suite failures.
For many OS install scenarios, one runs through an installer which may
come with embedded data, and then the OS is configured post-install to
receive updates.
In this model, it'd be nice to avoid the post-install having to rewrite
the /ostree/repo/config file.
Additionally, it feels weird for admins to interact with "/ostree" -
let's make the system feel more like Unix and have our important
configuration in /etc.
https://bugzilla.gnome.org/show_bug.cgi?id=729343
For the static deltas work, we're using the already-extant internal
API to perform a HTTP fetch for optional data - static deltas are
optional.
Except that we didn't correctly unset the error if we were doing an
optional fetch and the data wasn't found.
There are two halves to this: first, when we create a hierarchy, we
need to call fsync(). Second, we need to fsync again anytime after
we've modified a directory.
Let's be a bit more conservative here and actually fdatasync() the
configurations we're generating.
I'm seeing an issue at the moment where syslinux isn't finding the
config sometimes, and while I don't think this is the issue, let's try
it.
There was an attempted optimization to only write if changed, but this
is broken - we always write the bootloader config into a new
directory.
In theory we should only be writing if it changed, but let's not do a
broken optimization.
The previous commit here changed things so that we do mkdir(x, 0700),
then fchmod later only if we created the directory.
However the logic was incorrect; we still need to chmod even in
MODE_USER if we created the directory.
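The intended logic, sketched in Python: create the directory with a
restrictive mode, and adjust the mode only when this call actually
created it. The sketch uses os.chmod on the path instead of fchmod for
brevity, and ensure_dir is a hypothetical name.

```python
import os
import stat
import tempfile


def ensure_dir(path, final_mode):
    """Create `path` as 0700, then set `final_mode` only if we created it."""
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        return False  # pre-existing directory: leave its mode alone
    os.chmod(path, final_mode)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "checkout")
    created_first = ensure_dir(target, 0o755)
    mode = stat.S_IMODE(os.stat(target).st_mode)
    created_second = ensure_dir(target, 0o755)
```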
Otherwise this broke atomicity; we could fetch/store the ref, then
crash, and then not upgrade the next time we tried upgrading.
The correct model is: the tree has changed if the new ref is different
from the merge deployment.
Trying to implement "rpm-ostree rollback", in the case where we have 2
deployments with the same bootconfig that we're reordering, we need to
write bootconfig, not just swap the bootlinks.
It turns out people sometimes want to be able to change the kernel
arguments. Add a convenient API to do so for the current deployment.
This will be used by Anaconda.
We shouldn't g_print() from a library, particularly when the
expectation is that the client has an async progress set up.
This should fix the pull output extending the status line.
If a MITM attacker (or just network corruption) causes a temporary
downloaded object in tmp/ to be corrupted, we'll end up
continually trying to commit it, and fail.
Fix this by unlinking the temp file immediately after opening it. This
will ensure that if we exit due to an error (or crash), the kernel
will clean up the space for us.
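The open-then-unlink technique relies on POSIX semantics: the file
stays usable through its descriptor after the name is gone. A short
Python demonstration (POSIX only, file name invented):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "object.tmp")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    # Remove the name right away; the open descriptor keeps the data alive.
    os.unlink(path)
    name_gone = not os.path.exists(path)

    os.write(fd, b"partial download")
    os.lseek(fd, 0, os.SEEK_SET)
    readback = os.read(fd, 64)

    # Closing the descriptor (or the process dying) frees the space.
    os.close(fd)
```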
https://bugzilla.gnome.org/show_bug.cgi?id=725924
This is a bit more efficient in that we're not walking full paths, and
it helps avoid security/reliability issues if an attacker (or just a
misbehaving process) has the ability to mutate paths in the middle.
Mixing async and threads has proved to be too much for my little mind.
It has race conditions that I've tried repeatedly to fix, but failed.
The threading here was scanning metadata objects - and there are
two parts to that:
1) Physically loading them from disk
2) Parsing them
Now #1 has been partially addressed by avoiding a storm of lstat() if
we're starting from a known working state. If pull gets interrupted,
then we do need to rescan all objects. Also, we can address this with
local metadata packfiles.
The other potentially slow bit is that we recurse across the metadata,
blocking the main thread. We could ameliorate that in the future by
scheduling metadata parsing as idle "chunks".
Anyways, let's move the needle back to reliability, and re-add speed
more carefully.
https://bugzilla.gnome.org/show_bug.cgi?id=706456
We don't want to allow MITM attackers to intercept upgrade requests
and provide clients with older OS versions vulnerable to security
flaws.
Only "ostree admin upgrade" gets this behavior for now - whether we
want to do it for "ostree admin switch" is another question.
It's better if this is independent from the OstreeSysroot; for
example, a policy is active in a given deployment root at once, not
for a sysroot globally.
We can also collect SELinux-related API in one place.
Unfortunately at the moment there can be only one instance of this
class per process.
We're seeing some hangs while ostree is fetching updates.
I imagine the fact that SoupSessionAsync has no timeout by default
could be the cause of this.
Set timeout values to 60 seconds, which is the default for the new
SoupSession API which we may switch to later.
https://bugzilla.gnome.org/show_bug.cgi?id=724310
The instructions one finds on the internets are apparently wrong; we
really need to keep the default here, since gpgme uses it to actually
find the helper binary it runs.
This fixes the GPG tests for me on EL7 at least.
This has a very basic level of functionality (deltas can be generated,
and applied offline). There is only some stubbed out pull code to
fetch them via HTTP.
But, better to commit this now and improve it from a known starting
point, rather than have it languish in a branch.
First, /var needs to be labeled at least once. We should probably
rearrange things so that /var is only created (and labeled) on the
first deployment, but this patch adds a /var/.ostree-selabeled file
instead.
Second, when doing the /etc merge, we compare the xattrs of the old
/usr/etc versus the current /etc. The problem with that is that the
policy has different labels for /usr/etc on disk than the real /etc.
The correct fix for this is a bit invasive - we have to take the
physical content of the old /usr/etc, but compare the labels as if
they were really in /etc.
Instead for now, just ignore changes to xattrs. If the file
content/mode changes, then we take the new file (including any changed
xattrs).
Bottom line: just doing chcon -t blah_t /etc/foo.conf may be lost on
upgrade (for now).
This will be used by guestmount - it's WAY faster. We only take disks
as a unit, so it's safe. If the process fails halfway through, we
just start over from scratch the next time anyways.
The trees as shipped come with /usr/etc, which should just be labeled
as usr_t. When we do a deployment, we need to relabel the copies of
the files we're making in /etc.
SELinux support is compile and runtime optional.
The intent of this code, I'm fairly certain, was to use *.gpg from the
trusted.gpg.d directory. But right now, we're only using
"pubring.gpg" from that directory, which is odd.
Let's fix this to use all keys ending in .gpg, which will also
include pubring.gpg.
Send _IDLE messages if and only if we state transition the main
thread (from idle -> !idle or !idle -> idle). This ensures that we
don't send IDLE, then get it back, and process that when we're !idle.
This is a redesign (again) of the pull code. It is simpler and
survives 20 minutes of testing in a loop, whereas the old code would
only go from 30 seconds to 2 minutes.
The problem with the old code was that there was a race where we might
determine idle state even when there are content requests in flight
between the metadata thread and the main one.
This code majorly reworks things - there's now only one IDLE message,
sent in a circle from the main thread, through the metadata scanner,
and back to the main one.
Crucially it's only sent when the *main* thread is idle. Previously
we were looking at whether the metadata scanner is idle, but that
doesn't make a lot of sense. First let's make sure the main thread is
idle, then verify that the metadata one is.
This closes the loop because we'll have ensured we get any pending
requests.
https://bugzilla.gnome.org/show_bug.cgi?id=706456
The "ordered hash" code was really just for kernel arguments. And it
turns out it needs to be a multihash (for e.g. multiple console=
arguments).
So turn the OstreeOrderedHash into OstreeKernelArgs, and move the bits
to split key=value and such into there.
Now we're not making this public API yet - the public OstreeSysroot
just takes char **kargs. To facilitate code reuse between ostree/ and
libostree/, make it a noinst libtool library. It'll be duplicated in
the binary and library, but that's OK for now. We can investigate
making OstreeKernelArgs public later.
https://bugzilla.gnome.org/show_bug.cgi?id=721136
The official way to add bootloader arguments to the current deployment
is to redeploy with --karg. However, doing so tripped up an
optimization made inside the deployment code to just swap the
bootlinks if we're keeping the same "bootcsum".
Change this optimization to look at the pair of (bootcsum, options).
We can't use #ifdef in the headers, since then g-ir-scanner won't pick
up the functions (unless we included config.h). Let's instead always
have the symbols, but just set an error if we were built without
support for it, just like how pull works.
This large patch moves the core xattr logic down into libgsystem,
which allows the gs_shutil_cp_a() API to copy them. In turn, this
allows us to just use that API instead of rolling our own recursive
copy here.
As noted in the new comment though, one case that we are explicitly
regressing is where the new /etc removes a parent directory that's
needed by a modified file. This seems unlikely for most vendors now,
but let's do that as a separate bug.
https://bugzilla.gnome.org/show_bug.cgi?id=711058
Previously the progress meter would bump in large chunks after we
completed a download. Instead, poll in-progress files via fstat() for
their size, and add those to the running total.
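The calculation itself is simple. A Python sketch, with a made-up
function name and a temporary file standing in for a download that is
still being written:

```python
import os
import tempfile


def bytes_transferred(completed_bytes, in_progress_fds):
    """Completed total plus the current size of each partially written file."""
    return completed_bytes + sum(os.fstat(fd).st_size for fd in in_progress_fds)


with tempfile.TemporaryFile() as partial:
    partial.write(b"x" * 1500)
    partial.flush()  # make the written bytes visible to fstat()
    total = bytes_transferred(4096, [partial.fileno()])
```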
Several APIs in libostree were moved there from the commandline code,
and have hardcoded g_print() for progress and notifications. This
isn't useful for people who want to write PackageKit backends, custom
GUIs and the like.
From what I can tell, there isn't really a winning precedent in GLib
for progress notifications.
PackageKit has the model where the source has GObject properties that
change as async ops execute, which isn't bad...but I'd like something
a bit more general where say you can have multiple outstanding async
ops and sensibly track their state.
So, OstreeAsyncProgress is basically a threadsafe property bag with a
change notification signal.
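To show the shape of such an object, here is a Python sketch of a
thread-safe property bag with change callbacks. It is not the
OstreeAsyncProgress implementation: the class and method names are
illustrative, and callbacks run synchronously in the setting thread,
which is a simplification.

```python
import threading


class AsyncProgress:
    """Thread-safe bag of named counters with a change notification."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}
        self._listeners = []

    def connect_changed(self, callback):
        with self._lock:
            self._listeners.append(callback)

    def set_uint(self, key, value):
        with self._lock:
            if self._values.get(key) == value:
                return  # unchanged: no notification
            self._values[key] = value
            listeners = list(self._listeners)
        # Notify outside the lock so callbacks may read values safely.
        for callback in listeners:
            callback(self)

    def get_uint(self, key):
        with self._lock:
            return self._values.get(key, 0)


events = []
progress = AsyncProgress()
progress.connect_changed(lambda p: events.append(p.get_uint("fetched")))
progress.set_uint("fetched", 3)
progress.set_uint("fetched", 3)  # same value, no callback
progress.set_uint("fetched", 4)
```

A command-line client would connect a callback that redraws its status
line, while a GUI could update a progress bar from the same values.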
Use this new API to move the GSConsole usage (i.e. g_print()) out from
libostree/ and into ostree/.
Add a --generate-sizes option to commit to add size information to the
commit metadata. This will be used by higher level code which wants
to determine the total size necessary for downloading.