It's valid for the remote server to say 200 OK and give us the entire
file instead of a 206 Partial Content, and in that case we should blow
away the previous cached data, rather than blindly appending to it and
thus creating multiple copies of the data inside the file.
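For illustration, a minimal sketch of that decision using stock GIO/libsoup calls; the helper name and structure here are hypothetical, and the real fetcher code differs:

    #include <gio/gio.h>
    #include <libsoup/soup.h>

    /* Sketch: decide whether previously cached partial data must be
     * discarded.  If we asked for a byte range but the server answered
     * 200 OK (full body), truncate the .part file instead of appending
     * to it. */
    static gboolean
    prepare_partial_output (SoupMessage    *msg,
                            GFile          *part_file,
                            GOutputStream **out_stream,
                            GCancellable   *cancellable,
                            GError        **error)
    {
      gboolean requested_range =
        soup_message_headers_get_one (msg->request_headers, "Range") != NULL;
      GFileOutputStream *s;

      if (requested_range && msg->status_code == SOUP_STATUS_PARTIAL_CONTENT)
        {
          /* Server honored the range: append to what we already have. */
          s = g_file_append_to (part_file, G_FILE_CREATE_NONE,
                                cancellable, error);
        }
      else
        {
          /* 200 OK (or anything else): the body is the whole file, so blow
           * away the cached data rather than duplicating content. */
          s = g_file_replace (part_file, NULL, FALSE,
                              G_FILE_CREATE_REPLACE_DESTINATION,
                              cancellable, error);
        }

      if (s == NULL)
        return FALSE;
      *out_stream = G_OUTPUT_STREAM (s);
      return TRUE;
    }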
This problem primarily occurs when we already have the complete file but
were interrupted, and the retrying process doesn't know the download was
already complete. We then do a range request for bytes past the end of
the file, and some web servers (e.g. Akamai) will return 200 OK with the
whole content again, rather than a 416 Requested Range Not Satisfiable.
We could also fix this with a saner caching strategy - since we know
the file is complete, rename it to $checksum.done or something similar
before it's processed. (Or really, rework how we do caching more
intelligently in general.)
This fixes interrupted pulls failing against such web servers; previously,
repeated attempts would eventually succeed only because we unlink files
that fail to pull.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1207292
This is a noticeable cleanup, and fixes another big user of GFile* in
performance/security-sensitive codepaths.
I'm specifically making this change because the static deltas code was
leaking temporary files, and cleaning that up nicely would be easiest if
we were fd-relative.
Make _ostree_fetcher_request_uri_with_partial_async and
ostree_fetcher_stream_uri_async simple wrappers around the same
function, so all requests are now created in the same place.
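Roughly (the internal helper name and the exact signatures below are hypothetical, not the actual ostree API):

    #include <gio/gio.h>
    #include <libsoup/soup.h>

    typedef struct OstreeFetcher OstreeFetcher;  /* opaque, defined elsewhere */

    /* Hypothetical shared helper; both public entry points forward to it,
     * so every request is created in one place. */
    static void
    ostree_fetcher_request_uri_internal (OstreeFetcher       *self,
                                         SoupURI             *uri,
                                         gboolean             is_stream,
                                         GCancellable        *cancellable,
                                         GAsyncReadyCallback  callback,
                                         gpointer             user_data);

    void
    _ostree_fetcher_request_uri_with_partial_async (OstreeFetcher       *self,
                                                    SoupURI             *uri,
                                                    GCancellable        *cancellable,
                                                    GAsyncReadyCallback  callback,
                                                    gpointer             user_data)
    {
      ostree_fetcher_request_uri_internal (self, uri, FALSE,
                                           cancellable, callback, user_data);
    }

    void
    ostree_fetcher_stream_uri_async (OstreeFetcher       *self,
                                     SoupURI             *uri,
                                     GCancellable        *cancellable,
                                     GAsyncReadyCallback  callback,
                                     gpointer             user_data)
    {
      ostree_fetcher_request_uri_internal (self, uri, TRUE,
                                           cancellable, callback, user_data);
    }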
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Rename _ostree_fetcher_contents_membuf_sync to
ostree_fetcher_request_uri_to_membuf and drop the unused user_data
argument.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
_ostree_fetcher_query_state_text() and _ostree_fetcher_get_n_requests()
have no callers, so remove them.
If they are ever needed again, they can easily be copied back from the
git history.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Otherwise, we're potentially holding up subsequent requests.
I was hitting this when testing the metalink code, where we want to
continue doing more fetches after hitting a 404.
https://bugzilla.gnome.org/show_bug.cgi?id=729585
Some organizations will want to use private Certificate Authorities to
serve content to their clients. While it's possible to add the CA
to the system-wide CA store, that has two drawbacks:
1) Compromise of that CA means it can be used to intercept any other web traffic
2) The entire ca-certificates bundle is trusted, not just the private CA
This patch allows a much stronger scenario where *only* the CAs in
tls-ca-path are used for verification from the given repository.
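For illustration, a sketch of how a fetcher can honor such an option with libsoup 2.x. The "tls-database" and "ssl-use-system-ca-file" properties are stock libsoup/GIO API; the helper itself is hypothetical and assumes tls-ca-path points at a file of trusted anchors:

    #include <gio/gio.h>
    #include <libsoup/soup.h>

    /* Sketch: make @session trust only the anchors in @tls_ca_path,
     * instead of the system-wide ca-certificates store. */
    static gboolean
    fetcher_set_tls_ca_path (SoupSession  *session,
                             const char   *tls_ca_path,
                             GError      **error)
    {
      GTlsDatabase *db = g_tls_file_database_new (tls_ca_path, error);
      if (db == NULL)
        return FALSE;

      g_object_set (session,
                    "ssl-use-system-ca-file", FALSE, /* ignore the system store */
                    "tls-database", db,              /* trust only these CAs */
                    "ssl-strict", TRUE,
                    NULL);
      g_object_unref (db);
      return TRUE;
    }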
https://bugzilla.gnome.org/show_bug.cgi?id=726256
When fetching GPG-signed commits over plain HTTP, a MitM attacker can
fill up the target's drive by simply returning an enormous stream for
the commit object.
Related to this, an attacker can also cause OSTree to perform large
memory allocations by returning enormous GVariants in the metadata.
This helps close that attack by limiting all metadata objects to 10
MiB, so the initial fetch will be truncated.
But now the attack is only slightly more difficult as the attacker
will have to return a correctly formed commit object, then return a
large stream of < 10 MiB dirmeta/dirtree objects.
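A sketch of the kind of check involved, with a hypothetical constant and helper (the real enforcement lives in the fetcher/pull code):

    #include <gio/gio.h>

    /* Hypothetical cap matching the 10 MiB limit described above. */
    #define MAX_METADATA_SIZE (10 * 1024 * 1024)

    /* Sketch: count bytes as they stream in and abort once the limit is
     * exceeded, so a hostile server can't fill the disk or force a huge
     * GVariant allocation. */
    static gboolean
    write_metadata_chunk (GOutputStream *out,
                          const guint8  *buf,
                          gsize          len,
                          guint64       *running_total,
                          GCancellable  *cancellable,
                          GError       **error)
    {
      *running_total += len;
      if (*running_total > MAX_METADATA_SIZE)
        {
          g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED,
                       "Metadata object exceeds maximum size of %u bytes",
                       (guint) MAX_METADATA_SIZE);
          return FALSE;
        }
      return g_output_stream_write_all (out, buf, len, NULL,
                                        cancellable, error);
    }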
https://bugzilla.gnome.org/show_bug.cgi?id=725921
We're seeing some hangs while ostree is fetching updates.
I imagine the fact that SoupSessionAsync has no timeout by default
could be the cause of this.
Set the timeout values to 60 seconds, which is the default for the new
SoupSession API that we may switch to later.
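Roughly, at session construction time (the property names are stock libsoup; the wrapper function is just a sketch):

    #include <libsoup/soup.h>

    /* Sketch: give the async session explicit timeouts so a stalled
     * connection eventually errors out instead of hanging the pull. */
    static SoupSession *
    create_fetcher_session (void)
    {
      return soup_session_async_new_with_options (
               SOUP_SESSION_TIMEOUT, 60,       /* per-I/O timeout, in seconds */
               SOUP_SESSION_IDLE_TIMEOUT, 60,  /* drop idle connections */
               NULL);
    }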
https://bugzilla.gnome.org/show_bug.cgi?id=724310
Previously the progress meter would jump in large chunks after we
completed a download. Instead, poll in-progress files via fstat() for
their size and add those sizes to the running total.
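A sketch of the polling idea with hypothetical types; the real code tracks pending requests differently:

    #include <sys/stat.h>
    #include <glib.h>

    /* Hypothetical view of an in-flight request: just the fd of its
     * temporary output file. */
    typedef struct {
      int out_fd;
    } PendingRequest;

    /* Sketch: add the current size of every partially downloaded file to
     * the completed-bytes total, so the progress meter moves smoothly
     * instead of jumping when a download finishes. */
    static guint64
    compute_bytes_transferred (guint64 completed_bytes, GPtrArray *pending)
    {
      guint64 total = completed_bytes;

      for (guint i = 0; i < pending->len; i++)
        {
          PendingRequest *req = pending->pdata[i];
          struct stat stbuf;

          if (fstat (req->out_fd, &stbuf) == 0)
            total += (guint64) stbuf.st_size;
        }

      return total;
    }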
I was getting hangs in the test suite, and looking at the previous
commit, we were calling the async completion functions out of the
finalizer for the URI, which is weird. I didn't analyze what's going
wrong, but what we really should be doing is processing our internal
queue after we've downloaded a file, and the request is about to be
finalized.
I suspect doing queue management from the finalizer created something
like a circular reference.
This patch deduplicates the queue processing bits too.
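A sketch of the deduplicated helper with hypothetical names; the key point is that it runs from the request's completion path, not from a finalizer:

    #include <glib.h>

    #define MAX_OUTSTANDING 10   /* hypothetical concurrency limit */

    typedef struct {
      GQueue *pending_queue;     /* requests waiting to be handed to libsoup */
      guint   outstanding;       /* requests currently active in libsoup */
    } FetcherQueue;

    static void start_request (FetcherQueue *self, gpointer request);

    /* Sketch: single place that feeds queued requests into libsoup,
     * staying below the concurrency limit. */
    static void
    fetcher_process_queue (FetcherQueue *self)
    {
      while (self->outstanding < MAX_OUTSTANDING &&
             !g_queue_is_empty (self->pending_queue))
        {
          gpointer request = g_queue_pop_head (self->pending_queue);
          self->outstanding++;
          start_request (self, request);
        }
    }

    /* Sketch: called from the request's completion callback, before the
     * request object is destroyed. */
    static void
    fetcher_request_completed (FetcherQueue *self)
    {
      g_assert (self->outstanding > 0);
      self->outstanding--;
      fetcher_process_queue (self);
    }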
https://bugzilla.gnome.org/show_bug.cgi?id=708126
On a large ostree repository, pulling over HTTP slows to a crawl. Pulling
from localhost results in:
5944 metadata, 63734 content objects fetched; 850509 KiB transferred in
1106 seconds
In other words, about 800 KiB/s. Some profiling shows that essentially
all of the CPU time goes into libsoup's request bookkeeping rather than
into the actual downloading.
Adding a simple queue to limit the number of active requests sent into
libsoup makes for a dramatic improvement:
5944 metadata, 63734 content objects fetched; 850509 KiB transferred
in 89 seconds
So around 9450 KiB/s.
https://bugzilla.gnome.org/show_bug.cgi?id=708126
There's no good reason to write small things such as repo/config to
the filesystem, only to read them back in again. Change the
non-partial API to just return a stream, then read it into a memory
buffer.
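Roughly, using stock GIO calls (the wrapper itself is a sketch, not the actual ostree function):

    #include <gio/gio.h>

    /* Sketch: splice an HTTP response stream into a resizable memory
     * buffer instead of round-tripping small files through the
     * filesystem. */
    static GBytes *
    read_stream_to_membuf (GInputStream  *response,
                           GCancellable  *cancellable,
                           GError       **error)
    {
      GOutputStream *membuf = g_memory_output_stream_new_resizable ();
      GBytes *result;

      if (g_output_stream_splice (membuf, response,
                                  G_OUTPUT_STREAM_SPLICE_CLOSE_SOURCE |
                                  G_OUTPUT_STREAM_SPLICE_CLOSE_TARGET,
                                  cancellable, error) < 0)
        {
          g_object_unref (membuf);
          return NULL;
        }

      result = g_memory_output_stream_steal_as_bytes (G_MEMORY_OUTPUT_STREAM (membuf));
      g_object_unref (membuf);
      return result;
    }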
https://bugzilla.gnome.org/show_bug.cgi?id=707157
I think originally we had the .part/.done separation because we were
trying to support partial downloads of files like repo/config and
repo/refs.
But now that the http server configuration won't give us partial
results, we don't need to support caching those files between runs.
And thus, there's no reason to keep the .part/.done distinction and do
the dance of renaming them.
When fetching objects/ and other things that use _with_async, we
continue to use _append_to(), and if the returned range tells us we
have all the bytes, then we hand the full file over to the caller.
Don't attempt to shortcut in the case where the last run told us we
already have the object; the object fetcher code will not make a
request.
While we're here, also clean up GError usage and consistently use the
cancellable from the pending request.
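A sketch of the "returned range tells us we have all the bytes" check, using real libsoup header helpers around a hypothetical function:

    #include <libsoup/soup.h>

    /* Sketch: after appending the body of a response to the .part file,
     * decide whether we now have the whole object and can hand it to the
     * caller. */
    static gboolean
    response_completes_file (SoupMessage *msg, goffset bytes_now_on_disk)
    {
      goffset start, end, total_length;

      if (msg->status_code == SOUP_STATUS_OK)
        return TRUE;   /* the full body was returned */

      if (msg->status_code == SOUP_STATUS_PARTIAL_CONTENT &&
          soup_message_headers_get_content_range (msg->response_headers,
                                                  &start, &end, &total_length))
        return total_length > 0 && bytes_now_on_disk >= total_length;

      return FALSE;
    }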
https://bugzilla.gnome.org/show_bug.cgi?id=707157
Use a consistent temporary filename when downloading URIs.
Check for already-downloaded files before fetching from a URI.
Download to hash.part file, then copy/move to hash.done when complete.
Add argument support to setup_fake_remote_repo1 function.
Add test for pull resume.
To implement this, pass --force-range-requests into the trivial-httpd,
which will only serve half of the objects to clients at a time.
https://bugzilla.gnome.org/show_bug.cgi?id=706344