This follows up from the previous commit; now that pull knows how to
do the efficient link() or copy for local files, we can just have
pull-local call into ostree_repo_pull().
As part of this:
- pull() can also accept a file:/// URI instead
of a remote name (since pull local supports anonymous pulls)
- pull() knows an "override-remote-name" option, since pull-local
supported writing a ref out even if there wasn't a remote with
that name
It's always been suboptimal to have both pull and pull-local; as we go
beyond the raw object data into things like deltas and summary files,
the logic to perform e.g. mirroring should only be in one place.
This will be used by Pulp's OSTree content plugin at least to perform
promotions.
When doing a pull --mirror from an archive-z2 repository into another
archive-z2 repository, currently we gunzip/checksum/gzip each content
object. The re-gzip process in particular is fairly expensive.
This does assume that the upstream content is trusted and correct.
It'd be nice in the future to do at least a CRC check, if not the full
checksum. (Could we append CRC data to the end of filez objects?)
We could also choose to only do this optimization if fetching over
TLS.
before: 1626 metadata, 20320 content objects fetched; 299634 KiB transferred in 62 seconds
after : 1626 metadata, 20320 content objects fetched; 299634 KiB transferred in 11 seconds
This is a noticeable cleanup, and fixes another big user of GFile* in
performance/security sensitive codepaths.
I'm specifically making this change because the static deltas code was
leaking temporary files, and cleaning that up nicely would be best if
we were fd relative.
You create these with something like:
ostree static-delta generate --empty --to=master
These will be automatically used during pull if no previous revision
exists in the target repo.
These work very much like the normal static deltas except they
are named just by the "to" revision. I.e:
deltas/94/f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d
for a from-scratch to 94f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d delta.
https://bugzilla.gnome.org/show_bug.cgi?id=721799
ostree_repo_pull_with_options() needs this, and I'd rather keep the
OstreeRemote struct definition tucked away in ostree-repo.c with its
own internal API.
Rename _ostree_fetcher_contents_membuf_sync to
ostree_fetcher_request_uri_to_membuf and drop unused argument
user_data.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
src/libostree/ostree-repo-pull.c:1676:22: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We potentially need a lot of argument types for pull. Rather than
have a C function with tons of arguments, let's use a GVariant a{sv}
as a handy extensible (and immutable) bag of properties.
This is prepratory work for adding an option to pull to traverse
history.
https://bugzilla.gnome.org/show_bug.cgi?id=737844
Now that we have a summary file, we can use it to allow a simple:
ostree pull --mirror
To download the latest commit on every branch. Also, for a case I'm
dealing with there's only one branch, but I don't want mirror users to
have to hardcode it.
https://bugzilla.gnome.org/show_bug.cgi?id=737807
For Fedora and potentially other distributions which use globally
distributed mirrors, metalink is a popular solution to redirect
clients to a dynamic set of mirrors.
In order to make metalink work though, it needs *one* file which can
be checksummed. (Well, potentially we could explode all refs into the
metalink.xml, but that would be a lot more invasive, and a bit weird
as we'd end up checksumming the checksum file).
This commit adds a new command:
$ ostree summary -u
To regenerate the summary file. Can only be run by one process at a
time.
After that's done, the metalink can be generated based on it, and the
client fetch code will parse and load it.
https://bugzilla.gnome.org/show_bug.cgi?id=729585
Changes the pull API to allow pulling only a single directory instead
of the whole deployment. This option is utilized by the check-diff
option in rpm-ostree.
Add a new state directory to hold <checksum>.commitpartial files, so
we know that we've only downloaded partial state.
Some organizations will want to use private Certificate Authorities to
serve content to their clients. While it's possible to add the CA
to the system-wide CA store, that has two drawbacks:
1) Compromise of that cert means it can be used for other web traffic
2) All of ca-certificates is trusted
This patch allows a much stronger scenario where *only* the CAs in
tls-ca-path are used for verification from the given repository.
https://bugzilla.gnome.org/show_bug.cgi?id=726256
The generic GKeyFile error isn't quite informative enough here.
I hit this with the new compose process where we don't automatically
inject a configured remote into the generated disk images; we expect
people to add them.
https://bugzilla.gnome.org/show_bug.cgi?id=731346
There's several use cases for calling into ostree itself to do
mirroring, instead of using bare rsync. For example, it's a bit more
efficient as it doesn't require syncing the objects/ directory.
https://bugzilla.gnome.org/show_bug.cgi?id=728351
If fetching GPG-signed commits over plain HTTP, a MitM attacker can
fill up the drive of targets by simply returning an enormous stream
for the commit object.
Related to this, an attacker can also cause OSTree to perform large
memory allocations by returning enormous GVariants in the metadata.
This helps close that attack by limiting all metadata objects to 10
MiB, so the initial fetch will be truncated.
But now the attack is only slightly more difficult as the attacker
will have to return a correctly formed commit object, then return a
large stream of < 10 MiB dirmeta/dirtree objects.
https://bugzilla.gnome.org/show_bug.cgi?id=725921
For the static deltas work, we're using the already-extant internal
API to perform a HTTP fetch for optional data - static deltas are
optional.
Except that we didn't correctly unset the error if we were doing an
optional fetch and the data wasn't found.
We shouldn't g_print() from a library, particularly when the
expectation is that the client has an async progress set up.
This should fix the pull output extending the status line.
If a MITM attacker (or just network corruption) causes a temporary
downloaded object in tmp/ to be corrupted, we'll end up
continually trying to commit it, and fail.
Fix this unlinking the temp file immediately after opening it. This
will ensure that if we exit due to an error (or crash), the kernel
will clean up the space for us.
https://bugzilla.gnome.org/show_bug.cgi?id=725924
Mixing async and threads has proved to be too much for my little mind.
It has race conditions that I've tried repeatedly to fix, but failed.
The threading here was scanning metadata objects - and there are
two parts to that:
1) Physically loading them from disk
2) Parsing them
Now #1 has been partially addressed by avoiding a storm of lstat() if
we're starting from a known working state. If pull gets interrupted,
then we do need to rescan all objects. Also, we can address this with
local metadata packfiles.
The other potentially slow bit is that we recurse across the metadata,
blocking the main thread. We could ameliorate that in the future by
scheduling metadata parsing as idle "chunks".
Anyways, let's move the needle back to reliability, and readd speed
more carefully.
https://bugzilla.gnome.org/show_bug.cgi?id=706456
This has a very basic level of functionality (deltas can be generated,
and applied offline). There is only some stubbed out pull code to
fetch them via HTTP.
But, better to commit this now and improve it from a known starting
point, rather than have it languish in a branch.
Only send _IDLE messages if and only if we state transition the main
thread (from idle -> !idle or !idle -> idle). This ensures that we
don't send IDLE, then get it back, and process that when we're !idle.
This is a redesign (again) of the pull code. It is simpler and
survives 20 minutes of testing in a loop, whereas the old code would
only go from 30 seconds to 2 minutes.
The problem with the old code was that there was a race where we might
determine idle state even when there are content requests in flight
between the metadata thread and the main one.
This code majorly reworks things - there's now only one IDLE message,
sent in a circle from the main thread, through the metadata scanner,
and back to the main one.
Crucially it's only sent when the *main* thread is idle. Previously
we were looking at whether the metadata scanner is idle, but that
doesn't make a lot of sense. First let's make sure the main thread is
idle, then verify that the metadata one is.
This closes the loop because we'll have ensured we get any pending
requests.
https://bugzilla.gnome.org/show_bug.cgi?id=706456
Several APIs in libostree were moved there from the commandline code,
and have hardcoded g_print() for progress and notifications. This
isn't useful for people who want to write PackageKit backends, custom
GUIs and the like.
From what I can tell, there isn't really a winning precedent in GLib
for progress notifications.
PackageKit has the model where the source has GObject properties that
change as async ops execute, which isn't bad...but I'd like something
a bit more general where say you can have multiple outstanding async
ops and sensibly track their state.
So, OstreeAsyncProgress is basically a threadsafe property bag with a
change notification signal.
Use this new API to move the GSConsole usage (i.e. g_print()) out from
libostree/ and into ostree/.
I plan to rename all of these APIs to use the term 'loose', so that it
makes more sense after pack files are introduced. External users
should not use them; instead use _load_variant() or _read_commit().
This uses gpgv for verification against DATADIR/ostree/pubring.gpg by
default. The keyring can be overridden by specifying OSTREE_GPG_HOME.
Add a unit test for commit signing with gpg key and verifying on pull;
to implement this we ship a test GPG key generated with no password
for Ostree Tester <test@test.com>.
Change all of the existing tests to disable GPG verification.
...get_thread_default returns NULL when the thread default is also the global
default, so this only shows up when running in a thread (eg g_task_run_in_thread)
Rather than having separate write_ref calls, make clients start a
transaction, add some refs, and then commit it. While this doesn't
make it 100% atomic, it makes it easier for us to use an atomic
model, and it means we don't do as much I/O updating the summary
file and such.
https://bugzilla.gnome.org/show_bug.cgi?id=707644
An earlier version of this API acted like git in that some objects
would be staged in a temporary directory which would be then committed
in one go by moving files around. The API doesn't match most users
expectations though, as while the stage is nice as a high-level API
it isn't really suited for low-level APIs.
While the stage was removed, the APIs were never renamed. Rename
them now so that they match expectations.
https://bugzilla.gnome.org/show_bug.cgi?id=707644
There's not a good reason to write small things such as repo/config to
the filesystem, only to read them back in again. Change the
non-partial API to just return a stream, then read it into a memory
buffer.
https://bugzilla.gnome.org/show_bug.cgi?id=707157
Fold in fetch_uri to fetch_uri_utf8(), and rename the latter to
include _sync as a suffix, since it's synchronous.
Improve the status line to show when we're fetching a synchronous URI;
previously we just showed "Scanning metadata".
https://bugzilla.gnome.org/show_bug.cgi?id=707023
Use a consistent temporary filename to download uri's.
Check for downloaded files before fetching from uri.
Download to hash.part file, then copy/move to hash.done when complete.
Add argument support to setup_fake_remote_repo1 function.
Add test for pull resume.
To implement this, pass --force-range-requests into the trivial-httpd,
which will only serve half of the objects to clients at a time.
https://bugzilla.gnome.org/show_bug.cgi?id=706344
We removed support for writing "related objects" from ostree commits
in ostree git c9b61cbfee because it just
didn't work out as an idea. This also removes the API and code from
"ostree pull".
Note there was no test suite coverage.
https://bugzilla.gnome.org/show_bug.cgi?id=706342
If the admin encounters corruption and does:
$ ostree admin fsck --delete
We want them to be able to recover the objects easily from the
network; with this patch, they do:
$ ln -s dummyvalue /ostree/repo/transaction
$ ostree refs --delete remotename:branchname
$ ostree pull remotename
This patch avoids the need for the refs --delete; we might as well
force scan the commit, and with this patch we still print that it
changed.