An OSTree user noticed that `ostree fsck` would produce `missing
object` errors in the case of interrupted pulls.
It's possible to do e.g. `ostree pull --subpath=/usr/share/rpm ...`,
which gets you just that portion of the commit. The use case for this
was being able to see what changes would appear in an update before
actually downloading all of it.
(I think this would be better covered by static deltas, but those
aren't final yet, and `--subpath` predates it)
Further, `.commitpartial` is used as a successor to the `transaction`
symlink for more precise knowledge in the case where a pull was
interrupted that we needed to resume scanning.
So it makes sense for `ostree fsck` to be aware of it.
First, this is just a general continuation of the `GFile -> openat`
transition.
Second, it's preparatory work for fsck to gain awareness of partial
commits.
If a system administrator happens to type `ostree admin upgrade`
multiple times, currently that will lead to a potentially corrupted
system.
I originally attempted to do locking *internally* in `libostree`, but
that didn't work out because currently a number of the commands
perform multi-step operations that all need to be serialized. All of
the current code in `ostree admin deploy` is an example.
Therefore, allow callers to perform locking, as most of the higher
level logic is presently implemented there.
At some point, we can revisit having internal locking, but it will be
difficult. A more likely approach would be similar to Java's approach
with concurrency on iterators - a "fail fast" method.
Always request detached metadata for commit objects, even if we already
have the commit object. This ensures we fetch any post facto detached
metadata updates such as new GPG signatures.
https://bugzilla.gnome.org/748220
These fsyncs were added for what turned out to be a fairly bogus
reason; I was hitting read errors from extlinux after upgrades and out
of conservatisim tried adding fsync calls, but the *actual* problem
was that extlinux didn't support 64 bit ext4. Now that at least for
Project Atomic hosts we're just targeting grub2, we can drop these
fsync calls and rely on `syncfs()` being both faster and catching any
errors.
For some sort of crazy reason, the `sync()` system call doesn't
actually return an error code, even though from what I can tell in the
kernel it wouldn't be terribly hard to add.
Regardless though, it is better for userspace apps to use `syncfs()`
to avoid flushing filesystems unrelated to what they want to sync. In
the case of OSTree, this does matter - for example you might have a
network mount point backing your database, and we don't want to block
upgrades on syncing it.
This change is safe because we're doing syncfs in *addition* to the
previous global `sync()` (a revision from an earlier patch).
Now because OSTree only touches the `/` mount point which covers the
repository, the deployment roots (including their copy of `/etc`), as
well as `/boot`, we should at some point later be able to drop the
`sync()` call. Note that on initial system installs we do relabel
`/var` but that shouldn't happen at ostree time - any new directories
are taken care of via `systemd-tmpfiles` on boot.
This way external programs like rpm-ostree can do fd-relative
operations on the deployment directories, like inspecting the RPM
database.
Closes: https://github.com/GNOME/ostree/pull/91
Rather than returning a new OstreeRepo instance in each call to
ostree_sysroot_get_repo(), cache one internally so the same instance
is returned each time.
Emitted during a pull operation upon GPG verification (if enabled).
Applications can connect to this signal to output the verification
results if desired.
do not write directly to the summary file but use a temporary file
first. It avoids to create an empty file if "ot_util_variant_save"
fails.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
I was looking at https://bugzilla.gnome.org/show_bug.cgi?id=738954
which wants us to ensure we chown() the refs. As part of that,
I did a generic conversion to use `*at()` (which naturally gives
us more low level control so we can call `fchown` etc.
This patch also sneaks in a change to respect the repo's
`disable_fsync` flag - if fsync is not set, then we never
`fdatasync()` (unlike the `g_file_replace_contents()` default. Also
unlike it, if fsync is enabled, we *always* sync even if the file
didn't exist.
This will be used by rpm-ostree to unset the immutable bit temporarily
in order to do package layering. We could add an API to deploy a tree
without the immutable bit, but this is simpler.
rpm-ostree currently uses ostree_repo_checkout_tree(), which as a side
effect will use the uncompressed objects cache by default. This is
rather annoying if you're using rpm-ostree on a server-side
repository, because if you then rsync the repo, you'll be syncing out
the uncompressed objects unless you exclude them.
We added the ability to disable the uncompressed cache in the
repository config to fix this, but it's better to allow application
control over this. The uncompressed cache will in some future version
become opt in as well.
This new API further:
- Drops the `GFile` usage in favor of `openat` APIs
- Improves ergonomics by avoiding callers having to query the source
`GFileInfo` (and carry around a copy of `OSTREE_GIO_FAST_QUERYINFO`)
- Has a more extensible options structure
Per the comment, I rather crudely have the `ostree checkout` builtin
call both APIs to ensure some testing coverage.
However, I'd like to in the future have easier-to-set-up testing code
that calls `libtest.sh` to set up dummy data.
It's valid for the remote server to say 200 OK and give us the entire
file instead of a 206 Partial Content, and in that case we should blow
away the previous cached data, rather than blindly appending to it and
thus creating multiple copies of the data inside the file.
This problem primarily occurs when we do have the complete file, and
we're interrupted, then try again, where the new process didn't record
the download was already complete. We do a range request for bytes
past the end, and some web servers (e.g. Akamai) will return 200 OK
with the whole content again, rather than a 416 Requested Range Not
Satisfiable.
Thus we could also fix this by saner caching strategy - since we know
the file is complete, rename it again to $checksum.done or something
before it's processed. (Or really, rework how we do caching more
intelligently in general).
This fixes the issue that interrupted pulls failed with such
webservers, although repeated attempts would eventually succeed
because we'd unlink files that failed to pull.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1207292
If the starting index is beyond the end of the list, it's a programming
error. Previously, the code was trying to raise a runtime error, but
actually causing a segfault.
This was detected by test code in test-mutable-tree.c, which is removed
in this commit because it should now not be possible to crash here.
https://bugzilla.gnome.org/747032
Uses OstreeGpgVerifyResult to catch duplicate signatures.
If the commit has already been signed with the given GPG key ID, fail
with a G_IO_ERROR_EXISTS error code.
Wrappers a referenced gpgme_verify_result_t so detailed verify results
can be examined independently of executing a verify operation.
_ostree_gpg_verifier_check_signature() now returns this object instead
of a single valid/invalid boolean, but the idea is for OstreeRepo to also
return this object for commit signature verification so it can be utilized
at the CLI layer (and possibly by other programs).
Similar to c2b01ad. For some reason I was thinking the commit data
still needed to be written to disk prior to verifying, but it's just
another artifact of spawning gpgv2 (predates using GPGME).
Makes for a nice cleanup in fetch_metadata_to_verify_delta_superblock()
as well.
In anticipation of API enhancements for GPG signature verification, which
would otherwise require a non-functional stub version were GPGME excluded.
GPGME is a pretty lightweight dependency, and the motivation to exclude
it is not clear.
The signature data is in memory to begin with, so there's no need to
write it to disk only to immediately read it back.
Also, because the GPGME multi-keyring workaround is somewhat expensive
to setup and teardown, concatenate all signatures into a single GBytes
so _ostree_gpg_verifier_check_signature() is only called once. We're
currently only looking for one valid signature anyway.
This sets the stage for more advanced signature management.
(Also, talking to GPG over pipes sucks.)
Previously we were spawning gpgv2 with a bunch of --keyring options
for /usr/share/ostree/trusted.gpg.d/ and whatever other keyring files
were explicitly added. GPGME has no public API for multiple keyrings,
so we work around the issue by setting up a temp directory to serve as
a fake "home" directory for the crypto engine and then concatenate all
the keyring files into a single public keyring (pubring.gpg).
Unfortunately at present we do this on every signature verification.
There's a desire to cache this concatenation, but the problem is the
user may be unprivileged. So it seems the cache would have to be per
user under $XDG_CACHE_HOME, which OSTree doesn't otherwise use. I'm
open to suggestions.
We do at least clean up the temp directory when finished, and I have
further API changes planned to OstreeGpgVerifier to help mitigate the
performance impact.
it will be used by the next patch that adds "--generate-static-delta"
to the commit command.
As part of the patch, update the list of supported "params" in the
documentation.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
g_variant_builder_add() does not replace identical keys in a VARDICT
variant, so signing a commit multiple times results in multiple copies
of "ostree.gpgsigs" in the metadata. And since g_variant_lookup_value()
stops on the first match, subsequent signatures have no net effect.
Instead of GVariantBuilder use GVariantDict, which behaves more like a
hash table.
Checkout was one of the first complex code paths I tried to convert to
*at(). I ended up keeping both, because I hit the "xattrs for a
symlink" problem. Later, Florian gave me a workaround, and we started
using it here, but the GFile * parameters weren't deleted. They're
not used, so do so now.
Starting down the path of not using libgsystem. The main win here
will be code sharing between ostree/rpm-ostree as well as going down
the path of not using GFile * for local files.
Convenience feature to avoid having to pass --repo options repeatedly.
Before falling back to the default system repository path, check for a
repository path defined by the OSTREE_REPO environment variable.
We already set all file mtimes to 0 so that they are constant
over all checkouts, and can be made constant with a known value from
the system where the ostree was created.
However, this was not happening for directories. Zero their mtimes too.
This is important for shipping a fontconfig cache in the ostree;
the fontconfig cache files embed a directory mtime.
The previous diff algorithm was file tree based, and only looked
at modified files that lived at the same path.
However, components like the Linux kernel have versioned
subdirectories, e.g. /usr/lib/modules/$kver/.../ext4.ko. We want to
be able to detect these "modified renames" so that we can compute
diffs (rollsum, bsdiff).
This does an rsync-style prepared delta basically. On my test data,
it shaves ~6MB of uncompressed data. Not a huge amount, but I expect
this to be more useful for things like binaries which embed data, etc.
There's still some silliness here, but there is now only one opcode
open-splice-and-close, that writes a single chunk from the payload.
This is really all we need for metadata, and small content objects are
also fine with this.
We get some deduplication between content objects by creating a
dictionary for (uid,gid,mode) tuples and xattrs.
This still keeps the operation/payload code in, so we could do
rollsums in a future update easily.
I was hitting a bug in libguestfs/guestmount/FUSE where it blew up
with EINVAL on directories containing lots of files (more than
32000?). We really want to use prefixed subdirs just like the real
objects/ directory does.
This allows us to share more code between the paths, is more
efficient, etc.
gnome-continuous uses the ostree_repo_scan_hardlinks() mode to
avoid re-checksumming everything. However, when I ported the commit
code to use openat() and friends, this optimization was lost.
Re add it. The difference is about 15s versus 5 minutes.
This follows up from the previous commit; now that pull knows how to
do the efficient link() or copy for local files, we can just have
pull-local call into ostree_repo_pull().
As part of this:
- pull() can also accept a file:/// URI instead
of a remote name (since pull local supports anonymous pulls)
- pull() knows an "override-remote-name" option, since pull-local
supported writing a ref out even if there wasn't a remote with
that name
It's always been suboptimal to have both pull and pull-local; as we go
beyond the raw object data into things like deltas and summary files,
the logic to perform e.g. mirroring should only be in one place.
This will be used by Pulp's OSTree content plugin at least to perform
promotions.
When doing a pull --mirror from an archive-z2 repository into another
archive-z2 repository, currently we gunzip/checksum/gzip each content
object. The re-gzip process in particular is fairly expensive.
This does assume that the upstream content is trusted and correct.
It'd be nice in the future to do at least a CRC check, if not the full
checksum. (Could we append CRC data to the end of filez objects?)
We could also choose to only do this optimization if fetching over
TLS.
before: 1626 metadata, 20320 content objects fetched; 299634 KiB transferred in 62 seconds
after : 1626 metadata, 20320 content objects fetched; 299634 KiB transferred in 11 seconds
For future delta work where we do more interesting things than just
"tar of new objects", this lays the groundwork for doing streaming
writes into content objects.
It's also more efficient, as we avoid many intermediate allocations
and virtual calls. Just a single `g_output_stream_write_all` for the
splice case.
Conflicts:
src/libostree/ostree-repo-private.h
src/libostree/ostree-repo-static-delta-processing.c
We could just make everything relative to this, but the objects/ and
tmp/ are accessed very often, so I think it's worth holding individual
fds.
This fd can cover everything else: refs, deltas, etc.
Do not write directly to objects/ but maintain pulled files under tmp/
with a "tmpobject-$CHECKSUM.$OBJTYPE" name until they are syncfs'ed to
disk.
Move them under objects/ at ostree_repo_commit_transaction cleanup
time.
Before (test done on a local network):
$ LANG=C sudo time ./ostree --repo=repo pull origin master
0 metadata, 3 content objects fetched; 83820 KiB; 4 delta parts
fetched, transferred in 417 seconds
16.42user 6.73system 6:57.19elapsed 5%CPU (0avgtext+0avgdata
248428maxresident)k
24inputs+794472outputs (0major+233968minor)pagefaults 0swaps
After:
$ LANG=C sudo time ./ostree --repo=repo pull origin master
0 metadata, 3 content objects fetched; 83820 KiB; 4 delta parts
fetched, transferred in 9 seconds
14.70user 2.87system 0:09.99elapsed 175%CPU (0avgtext+0avgdata
256168maxresident)k
0inputs+794472outputs (0major+164333minor)pagefaults 0swaps
https://bugzilla.gnome.org/show_bug.cgi?id=728065
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
subscription-manager has a daemon that runs in a confined domain,
and it doesn't have permission to write usr_t, which is the default
label of /ostree/deploy/$osname/deploy.
A better long term fix is probably to move the origin file into the
deployment root as /etc/ostree/origin.conf or so.
In the meantime, let's ensure the .origin files are labeled as
configuration.
If an object already existed and we somehow tried to pull it, the
caller would still expect a returned checksum.
This appears to happen with static deltas for some reason; we might be
including duplicate metadata objects. Regardless, this is a bug that
should be fixed.
We have a chain of checksums from the root up until here. While doing
checksums of the objects individually would be a good redundancy check
for test cases and the like, when doing a pull there's no good reason
to burn cycles on SHA256.
This caused deadlocks and/or EMFILE due to the interaction between
threads and fds. What we really want here is a better pull-based
model for parsing content objects.
Another idea would be to change static deltas so that content objects
have a special opcode that includes their metadata first, and then do
rollsums etc. only over actual content.
This will avoid too many open files at the same time that could cause
an EMFILE error.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit bc092b06f0e34e93f7d6102957bf55fd7ffd1b9e)
This is a noticeable cleanup, and fixes another big user of GFile* in
performance/security sensitive codepaths.
I'm specifically making this change because the static deltas code was
leaking temporary files, and cleaning that up nicely would be best if
we were fd relative.
This will be used for a later change to use openat() for the fetching
code. Note that we drop the code to use mmap() - it was an attempt to
avoid keeping a fd open, but we do correctly close anyways.
You create these with something like:
ostree static-delta generate --empty --to=master
These will be automatically used during pull if no previous revision
exists in the target repo.
These work very much like the normal static deltas except they
are named just by the "to" revision. I.e:
deltas/94/f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d
for a from-scratch to 94f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d delta.
https://bugzilla.gnome.org/show_bug.cgi?id=721799
The current layout uses a prefix of two bytes as the initial dir
and a second directory inside that with the superblock. This
updates the list code to handle that.
https://bugzilla.gnome.org/show_bug.cgi?id=721799
Regression from 86764dbf00
This function is kind of fiendish now that we have 3 cases, each of
which want to be optimized somewhat to only load what's necessary
(e.g. don't open the file if we don't have an output for stream
requested).
Clean things up so that BARE_USER and BARE are separate conditionals
that share as much as possible, and fix the bug that asserted we
were in BARE mode.
I tested this by running test-basic-user.sh by hand.
Some use cases for checkouts don't need to fsync during checkout.
Installer programs for example will just do a global fsync at the end.
In the future, the default "ostree admin" core could also be
rearchitected to only do a transaction commit right before reboot, and
do the fsync then.
https://bugzilla.gnome.org/show_bug.cgi?id=742482
This is just an efficiency optimization. We're getting fairly close
to all of the hot code paths using `*at()`.
Note that we end up maintaining a half-duplicate code path set here,
because we still need to support commits from an arbitrary GFile *,
which in a possible common case is an OSTree commit.
I think it's worth it though.
rpm-ostree had code to check for this, which didn't actually work.
I don't see a no backwards compatibility concern in changing this, as
it's unlikely a caller would try to sensibly disambiguate FAILED.
We were already using openat() for the contents, but not the xattrs.
Now that libgsystem 2014.3 has gs_fd_get_all_xattrs(), make use of it.
Clean things up a bit so we only open the fd once.