Commit Graph

265 Commits

Author SHA1 Message Date
Colin Walters dc2a7de217 lib/commit: Try checksum+hardlink for untrusted local same-uid repos
This mainly helps flatpak for enabling a hardlink-able local pull
during deploy in the --system case.  We assume the files are immutable
when owned by the same uid.

See https://github.com/ostreedev/ostree/issues/1723
and https://github.com/flatpak/flatpak/pull/2342

Closes: #1776
Approved by: uajain
2018-12-04 20:38:41 +00:00
Jonathan Lebon 244d9a7ec1 lib/commit: Copy user.ostreemeta only for bare-user
When falling back to copying objects when importing them into a
bare-user repo, we only actually need to transfer over the
`user.ostreemeta` xattr.

This allows the destination repo to be on a separate filesystem that
might not even support `security.selinux`. (I hit this while importing
over virtio-9p).

Closes: #1771
Approved by: cgwalters
2018-11-13 15:15:13 +00:00
Jonathan Lebon 8eac5be030 lib/commit: Add devino_cache_hits to txn stats
I found this useful while hacking on rpm-ostree but I think it might be
useful enough to upstream. This stat is really helpful for validating
that a pipeline is hitting the devino cache sweet spot.

Closes: #1772
Approved by: cgwalters
2018-11-05 14:08:54 +00:00
Dan Nicholson 43d9cac4fc lib/commit: Don't chown objects to repo target owner
The idea is that if the process is running as root, it can change
ownership of newly written files to match the owner of the repo.
Unfortunately, it currently applies in the other direction, too - a
non-root user writing to a root owned repository. If the repo is
writable by the user but owned by root, it can still create files and
directories there, but it can't change ownership of them.

This feature comes from
https://bugzilla.gnome.org/show_bug.cgi?id=738954. As it turns out, this
feature was never completed. It only works on content objects and not
metadata objects, refs, deltas, summaries, etc. Rather than try to fix
all of those, remove the feature until someone has interest in
completing it.

Closes: #1754
Approved by: cgwalters
2018-10-12 12:34:57 +00:00
Colin Walters 2c55bc6997 Only verify OSTREE_MAX_METADATA_SIZE for HTTP fetches
There are use cases for libostree as a local content store
for content derived or delivered via other mechanisms (e.g. OCI
images, RPMs, etc.).  rpm-ostree today imports RPMs into OSTree
branches, and puts the RPM header value as commit metadata.
Some of these can be quite large because the header includes
permissions for each file.  Similarly, some OCI metadata is large.

Since there's no security issues with this, support committing
such content.

We still by default limit the size of metadata fetches, although
for good measure we make this configurable too via a new
`max-metadata-size` value.

Closes: https://github.com/ostreedev/ostree/issues/1721

Closes: #1744
Approved by: jlebon
2018-10-01 13:23:50 +00:00
Colin Walters c141fe610b lib/commit: Don't copy xattrs for metadata objects
Copying the xattrs on metadata objects is wrong in general, we
don't "own" them.  Notably this would fail in the situation of
doing a pull from e.g. a `bare-user` source to a destination
that was on a different mount point (so we couldn't hardlink),
and the source had e.g. a `security.selinux` attribute.

Closes: #1734

Closes: #1736
Approved by: jlebon
2018-09-25 14:49:22 +00:00
Umang Jain a0937b6cf0 lib/repo: Separate min-free-space-* calculation from transaction codepath
Earlier, the actual reserved space (in blocks) were calculated inside the
transaction codepath ostree_repo_prepare_transaction(). However, while
reworking on ostree_repo_get_min_free_space_bytes() API, it was realized that
this calculation can be done independently from the transaction's codepaths, hence
enabling the usage for ostree_repo_get_min_free_space_bytes() API irrespective
of whether there is an ongoing transaction or not.

https://github.com/ostreedev/ostree/issues/1720

Closes: #1722
Approved by: pwithnall
2018-09-21 15:09:12 +00:00
Umang Jain 3814d075cb lib/repo: Ensure min-free-space* config value doesn't overflow
when converted to bytes

In a subsequent commit, we add a public API to read the value of
min-free-space-* value in bytes. The value for free space check
is enforced in terms of block size instead of bytes. Therefore,
for consistency we check while preparing the transaction that the
value doesn't overflow when converted to bytes.

https://phabricator.endlessm.com/T23694

Closes: #1715
Approved by: cgwalters
2018-09-04 21:31:33 +00:00
Jonathan Lebon 521e0ec3ac lib/commit: Only auto-update summary if refs were written
Closes: #1693
Approved by: mwleeds
2018-08-01 19:59:07 +00:00
Jonathan Lebon 786ee6bdec lib/config: Rename change-update-summary to auto-...
Mildly bikeshed, though I find the name `auto-update-summary` to be
easier to grok than `change-update-summary`. I think it's because it can
be read as "verb-verb-noun" rather than "noun-verb-noun".

Closes: #1693
Approved by: mwleeds
2018-08-01 19:59:07 +00:00
Matthew Leeds 6869bada49 config: Add a core/change-update-summary option
This commits adds and implements a boolean repo config option called
"change-update-summary" which updates the summary file every time a ref
changes (additions, updates, and deletions).

The main impetus for this feature is that the `ostree create-usb` and
`flatpak create-usb` commands depend on the repo summary being up to
date. On the command line you can work around this by asking the user to
run `ostree summary --update` but in the case of GNOME Software calling
out to `flatpak create-usb` this wouldn't work because it's running as a
user and the repo is owned by root. That strategy also means flatpak
can't update the repo metadata refs for fear of invalidating the
summary.

Another use case for this relates to LAN updates. Specifically, the
component of eos-updater that generates DNS-SD records advertising ostree
refs depends on the repo summary being up to date.

Since ostree_repo_regenerate_summary() now takes an exclusive lock, this
should be safe to enable. However it's not enabled by default because of
the performance cost, and because it's more useful on clients than
servers (which likely have another mechanism for updating the summary).

Fixes https://github.com/ostreedev/ostree/issues/1664

Closes: #1681
Approved by: jlebon
2018-07-30 17:19:12 +00:00
Jonathan Lebon 9482922e5e lib: Check for NULL pointers in some more places
In `write_metadata_object()`, make sure when creating tombstone commits
that we're actually passed an expected checksum to use.

In `write_dir_entry_to_mtree_internal()`, sanity check that `dfd_iter`
is indeed not `NULL` before trying to dereference it.

Discovered by Coverity.

Closes: #1692
Approved by: cgwalters
2018-07-26 21:01:19 +00:00
Matthew Leeds be07c04e63 lib/repo-commit: Fix min-free-space error message
Since min_free_space_size_mb is considered before min_free_space_percent
in min_free_space_calculate_reserved_blocks(), it has to be considered
first when generating the error message in order for it to be accurate.

Closes: #1691
Approved by: jlebon
2018-07-25 13:16:18 +00:00
William Manley c7b12a8730 ostree repo commit: Speed up composing trees with `--tree=ref`
Running `ostree commit --tree=ref=a --tree=dir=b` involves reading the
whole of a into an `OstreeMutableTree` before composing `b` on top.  This
is inefficient if a is a complete rootfs and b is just touching one file.
We process O(size of a + size of b) directories rather than
O(number of touched dirs).

This commit makes `ostree commit` more efficient at composing multiple
directories together.  With `ostree_mutable_tree_fill_empty_from_dirtree`
we create a lazy `OstreeMutableTree` which only reads the underlying
information from disk when needed.  We don't need to read all the
subdirectories just to get the checksum of a tree already checked into the
repo.

This provides great speedups when composing a rootfs out of multiple other
rootfs as we do in our build system.  We compose multiple containers
together with:

    ostree commit --tree=ref=base-rootfs --tree=ref=container1 --tree=ref=container2

and it is much faster now.

As a test I ran

    time ostree --repo=... commit --orphan --tree=ref=big-rootfs --tree=dir=modified_etc

Where modified_etc contained a modified sudoers file under /etc.  I used
`strace` to count syscalls and I seperatly took timing measurements.  To
test with a cold cache I ran

    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

Results:

|                      | Before | After |
| -------------------- | ------ | ----- |
| Time (cold cache)    |   8.1s | 0.12s |
| Time (warm cache)    |   3.7s | 0.08s |
| `openat` calls       |  53589 |   246 |
| `fgetxattr` calls    |  78916 |     0 |

I'm not sure if this will have some negative interaction with the
`_ostree_repo_commit_modifier_apply` which is short-circuited here.  I
think it was disabled for `--tree=ref=` anyway, but I'm not certain.  All
the tests pass anyway.

I originally implemented this in terms of the `OstreeRepoFile` APIs, but
it was *way* less efficient, opening and reading many files unnecessarily.

Closes: #1643
Approved by: cgwalters
2018-07-09 13:10:51 +00:00
Umang Jain 4c023a9585 lib/repo-commit: Factor out min-free-space-size error reporting
Improves code readability.

Closes: #1671
Approved by: jlebon
2018-07-06 19:59:10 +00:00
Umang Jain 0c8b86ea09 lib/repo: Minor fixes around min-free-space
Summary:
* Remove a useless if condition in prepare_transaction()
* Fix glnx_throw error propagation
* Integer overflow check while parsing min-free-space-size config
* Documentation fixes

Closes: #1663
Approved by: jlebon
2018-07-03 12:59:26 +00:00
Philip Withnall abff8b8cfa lib/repo-commit: Abort a transaction if preparing it fails
If ostree_repo_prepare_transaction() fails, we should reset the
repository’s state so that the failed call was essentially idempotent.
Do that by calling ostree_repo_abort_transaction() on the failure path.

Typically, the way for preparing a transaction to fail is for its
GCancellable to be triggered, rather than because any of the operations
involved in preparing a transaction are particularly failure prone.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Closes: #1647
Approved by: cgwalters
2018-06-29 19:32:44 +00:00
Umang Jain d686056254 lib/repo: Cleanup current boot's staging dir min-free-space-* checks are hit
min-free-space-* act as a gating condition whether to we want hold onto caches in
repo/tmp. If it is found that the free-disk space is going below this threshold,
we flag it as an error and cleanup current boot's staging directory.

Closes: #1602
Approved by: jlebon
2018-06-27 19:02:02 +00:00
Umang Jain 1074668ede lib/repo: cleanup_tmpdir should be executed after releasing lock file
Here's a subtle bug in abort_transaction():
One of the policies of cleaning up is to skip the current boot's staging
directory. The responsible function for this is cleanup_tmpdir() which tries
to lock each of the tmpdir before deleting it. When it comes to the current
boot's staging dir, it tries to lock the directory(again!) but fails as there
is already a lockfile present. Just because the current boot's staging dir was
meant to be skipped, the bug never surfaced up and wasn't catastrohpic.

if (!_ostree_repo_try_lock_tmpdir (dfd, path, &lockfile, &did_lock, error))
  return FALSE;
if (!did_lock)
  return TRUE; /* Note early return */
...
if (g_str_has_prefix (path, self->stagedir_prefix))
  return TRUE; /* Note early return */

The actual check for skipping staging dir for current boot was never reached
because the function returned at did_lock failure.

Therefore, execute cleanup_tmpdir() after releasing the lockfile in
abort_transaction() so that cleanup_tmpdir gets a chance to lock current boot's
staging directory and succeed.

Closes: #1602
Approved by: jlebon
2018-06-27 19:02:02 +00:00
Umang Jain 095376efa2 lib/repo: Enforce min-free-space-* size check for regfiles in deltas
During the pull, there is an explicit check for free space on disk
vs. the size of uncompressed delta; But while writing the new content
objects that are generated, they have to honor min-free-space-* checks
too. We enforce this check in _bare_content_commit as that is where
we can know the final size of the new content object.

Closes: #1614
Approved by: jlebon
2018-06-22 21:01:56 +00:00
Colin Walters 1174d9f5ba lib/repo: Fix 32 bit format string error 2018-06-21 11:33:23 -04:00
Colin Walters 5e9d382811 lib/repo: Do free space math under lock in error path
We were referencing the txn bits outside of the lock in the error
path. Generally shouldn't matter, but e.g. Rust wouldn't let us do this, and
race detector tooling will warn about it.

Closes: #1632
Approved by: jlebon
2018-06-19 18:29:31 +00:00
Colin Walters acab2c1ac6 lib/repo: Rename free_space_size variable to free_space_mb
I generally like having variables include their units where applicable;
timer variables having `_secs` or `_ms`, etc.

Closes: #1632
Approved by: jlebon
2018-06-19 18:29:31 +00:00
Philip Withnall 2d2f218669 lib/repo-commit: Delay propagation of errors from abort_transaction()
If there’s a problem while aborting a transaction, store the error but
don’t report it until the end of the function — do a best effort at
clearing the rest of the transaction state first (since most of it
cannot fail).

If cleanup_tmpdir() fails (which, arguably, should not be a
showstopper), this allows a caller to recover and start a new
transaction in future.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Closes: #1626
Approved by: jlebon
2018-06-14 17:13:43 +00:00
Umang Jain 31809d32f2 lib/repo: Add min-free-space-size option
Similar to min-free-space-percent but it supports specific sizes
(in MB, GB or TB). Also, making min-free-space-percent and -size
mutually exclusive.

min-free-space-percent does not give a fine tuning of the free disk
space that a user might decide to keep. It can translate to very large
size (e.g. 1% = ~10GB on 1TB HDD) or very small (e.g. 1% = ~330MB on 32GB
system like Endless devices). Hence, it makes sense to introduce a config
option to honor specific size as per the user.

Closes: #1616
Approved by: jlebon
2018-06-13 18:57:37 +00:00
Matthew Leeds 8fbf19c9f5 Make P2P API public (no longer experimental)
Currently the API that allows P2P operations (e.g. pulling an ostree ref
from a LAN or USB source) is hidden behind the configure flag
--enable-experimental-api. This commit makes the API public and makes
that flag essentially a no-op (leaving it in place in case we want to
use it again in the future). The P2P API has been tested over the last
several months and proven to work.

This means that since we're no longer using the "experimental" feature
flag, P2P builds of Flatpak will fail when using versions of OSTree from
this commit onwards, until Flatpak is patched in the near future. If you
want to build Flatpak < 0.11.8 with P2P enabled and link against OSTree
2018.6, you'll have to patch Flatpak.  However, since Flatpak won't yet
have a hard dependency on OSTree 2018.6, it needs a new way to determine
if the P2P API in OSTree is available, so this commit adds a "p2p"
feature flag. This way the feature set is more semantically correct than
if we had continued to use the "experimental" feature flag.

In addition to making the P2P API public, this commit makes the P2P unit
tests run by default, removes the f27-experimental CI instance that's no
longer needed, changes a few man pages to reflect the changes, and
updates the bash completion script to accept the new commands and
options.

Closes: #1596
Approved by: cgwalters
2018-06-04 19:20:10 +00:00
Jonathan Lebon 589e97dc60 lib/commit: Fix function name typo in docstring
Closes: #1575
Approved by: cgwalters
2018-05-04 14:51:07 +00:00
Colin Walters 9f8e2b8862 lib: Use `Locking:` term in docs
This is easier to `git grep` etc. versus ad-hoc English.  Although
we still have some English for the prepare_transaction/commit which
acquire/release in separate phases.

Closes: #1572
Approved by: jlebon
2018-05-02 17:28:29 +00:00
Colin Walters 8c1542134c lib/repo: Enable locking by default, but drop external API
The code has been sitting around for a while but since I disabled
it by default, I doubt anyone is really using it or relying on it.

This patch and turns on locking by default, and also drops the
API which was only public in the experimental API builds.
Conceptually these are two distinct things, and we
may actually want to split up the patches.

I don't think this will break anyone, but it's hard to say for sure.
It's also going to be hard to find out until we actually release
I suspect...

But anyone who is broken should be able to add `locking=false` into
their repo config.  On the flip side Endless has been shipping with
this enabled and it is reported to help.

The reason to drop the APIs: I'm a bit concerned about the interactions over time
between libostree's use of the API and any apps that start using it.
For example, if an app specifies a SHARED lock in their code, then
later internally we decide to temporarily grab an `EXCLUSIVE`, but the
app had a second thread/process that was `EXCLUSIVE` already, and
that process was waiting on the first bit of code, then we could
deadlock. I can't think of a real world situation where this would happen
yet though.

We are likely to in the future have say `fsck` take an external lock,
`checkout` grab a shared one, etc.

Closes: #1555
Approved by: jlebon
2018-04-30 17:24:51 +00:00
Giuseppe Scrivano cdaf7cd838 commit, payload-reflink: do not write to the parent repo
reintroduce the feature that was reverted with commit:

28c7bc6d0e

Differently than the original implementation, now we don't attempt any
test for reflinks support on the parent repository, since the test
requires write access to the repository.

Additionally, also check that the two repositories are on the same
device before attempting any reflink.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #1525
Approved by: cgwalters
2018-04-13 21:52:53 +00:00
Owen W. Taylor 9199237b01 Don't scan uncompressed_objects_dir if it doesn't exist
A newly created archive-mode repository won't have a uncompressed-objects-cache
directory, and uncompressed_objects_dir is -1 to flag that. The special meaning of
-1 meaning "cwd" for libglnx means that the current directory was scanned as
if it was an objects directory, producing unexpected results, especially if there
were any two-letter files/subdirs in the current directory.

Closes: #1537
Approved by: jlebon
2018-04-12 13:53:15 +00:00
Matthew Leeds 005d25cc75 lib: Fix a few comments
Closes: #1526
Approved by: cgwalters
2018-03-29 22:01:51 +00:00
Alexander Larsson 28c7bc6d0e Don't write to parent repo
In _try_clone_from_payload_link, don't try to do the clone in the
parent repo, because we don't want to modify that. parent repos are
typically used when you want a shared, immutable base.

For example in flatpak, the parent repo is the system repo which you
don't have write access to, so any modification to it will fail with
EACCES, making it impossible to install via the system helper.

Closes: #1524
Approved by: cgwalters
2018-03-29 14:11:38 +00:00
Matthew Leeds 2be4631738 lib/commit: Fix a memory leak of OtChecksum
Closes: #1521
Approved by: jlebon
2018-03-29 13:45:26 +00:00
Joaquim Rocha 591f8a68b1 pull: Ignore the cancellable when aborting a transaction
In ostree_repo_abort_transaction, if we pass a cancellable and it gets
canceled, then the function may fail to fully clean up the transaction
state. This was happening e.g. when the ostree_repo_pull_with_options
call got cancelled.

To fix this, as suggested by Colin Walters, we set the passed
cancellable as NULL, in order for it to be ignored.

https://github.com/ostreedev/ostree/issues/1491

Closes: #1492
Approved by: jlebon
2018-03-12 19:18:57 +00:00
Giuseppe Scrivano 127d8bb846 commit: add logic for .payload-link
When a new object is added to the repository, create a
$PAYLOAD-SHA256.payload-link symlink file as well.  The target of the
symlink is the checksum of the object that was added the repository.

Whenever we add a new object file, in addition to lookup if the file is
already present with the same checksum we also check if an object with
the same payload is in the repository.

If a file with the same payload is already present in the repository, we
copy it with `glnx_regfile_copy_bytes` that internally attempts to
create a reflink (ioctl (..., FICLONE, ..)) to the target file if the
file system supports it.  This enables to have objects that share the
payload but have a different inode and xattrs.

By default the payload-link-threshold value is G_MAXUINT64 that disables
the feature.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #1443
Approved by: cgwalters
2018-03-07 18:28:59 +00:00
Marcus Folkesson 6bf4b3e1d8 Add SPDX-License-Identifier to source files
SPDX License List is a list of (common) open source
licenses that can be referred to by a “short identifier”.
It has several advantages compared to the common "license header texts"
usually found in source files.

Some of the advantages:
* It is precise; there is no ambiguity due to variations in license header
  text
* It is language neutral
* It is easy to machine process
* It is concise
* It is simple and can be used without much cost in interpreted
  environments like java Script, etc.
* An SPDX license identifier is immutable.
* It provides simple guidance for developers who want to make sure the
  license for their code is respected

See http://spdx.org for further reading.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

Closes: #1439
Approved by: cgwalters
2018-01-30 20:03:42 +00:00
Colin Walters 8e6e64a5ad lib: Validate metadata structure more consistently during pull
Previously we were doing e.g. `ot_util_filename_validate()` specifically inline
in dirtree objects, but only *after* writing them into the staging directory (by
default). In (non-default) cases such as not using a transaction, such an object
could be written directly into the repo.

A notable gap here is that `pull-local --untrusted` was *not* doing
this verification, just checksums.  We harden that (and also the
static delta writing path, really *everything* that calls
`ostree_repo_write_metadata()` to also do "structure" validation
which includes path traversal checks.  Basically, let's try hard
to avoid having badly structured objects even in the repo.

One thing that sucks in this patch is that we need to allocate a "bounce buffer"
for metadata in the static delta path, because GVariant imposes alignment
requirements, which I screwed up and didn't fulfill when designing deltas. It
actually didn't matter before because we weren't parsing them, but now we are.
In theory we could check alignment but ...eh, not worth it, at least not until
we change the delta compiler to emit aligned metadata which actually may be
quite tricky.  (Big picture I doubt this really matters much right now
but I'm not going to pull out a profiler yet for this)

The pull test was extended to check we didn't even write a dirtree
with path traversal into the staging directory.

There's a bit of code motion in extracting
`_ostree_validate_structureof_metadata()` from `fsck_metadata_object()`.

Then `_ostree_verify_metadata_object()` builds on that to do checksum
verification too.

Closes: #1412
Approved by: jlebon
2018-01-12 19:38:34 +00:00
Colin Walters 7935b881bf lib/repo: Add an API to mark a commit as partial
For the [rpm-ostree jigdo ♲📦](https://github.com/projectatomic/rpm-ostree/issues/1081) work.
We're basically doing "pull" via a non-libostree mechanism, and this
should be fully supported.  As I mentioned earlier we should try to
have `ostree-repo-pull.c` only use public APIs; this gets us closer
to that.

Closes: #1376
Approved by: jlebon
2017-12-14 15:51:07 +00:00
Colin Walters f81e3c6f03 lib/commit: Use more direct path for regfile commits
In the non-`CONSUME` path for regfiles (which happens currently for
`bare-user`), we go to a lot of contortions to make an "object stream",
only to immediately parse it again.

Fixing this will also enable the `G_IS_FILE_DESCRIPTOR_BASED()` fast path in
commit, since the input stream will actually reference the file descriptor and
not be an `_OstreeChainInputStream`.

There's a slight concern here in that we're no longer checksumming *literally*
the object stream passed in for the stream case, but I mention in the comment,
the data should be the same, and if it's not somehow we're not adding risk,
since the checksum is still covering the data we actually care about.

Prep for further changes to break up the `write_content_object()` path into
separate paths for archive, as well as regfile vs symlink in non-archive.

Closes: #1371
Approved by: jlebon
2017-12-12 14:17:20 +00:00
Colin Walters 6d8aaf629c lib/commit: Fix memleak in bare-user devino hit path
I noticed this while chasing an entirely different issue:
https://github.com/projectatomic/rpm-ostree/pull/1139

Closes: #1370
Approved by: jlebon
2017-12-12 14:03:18 +00:00
Colin Walters 9bb59511ae lib/commit: Refactor file commits to separate subdir from content
One major thing we can do to speed up local commits is multithreading. In
preparation for that, split up the recursion function so that the subdirectory
case is separate from the content (regfile/symlink) case. Then for non-subdirs,
we can easily peel off worker threads and gather the final checksums and update
the mtree from the main thread.

The diff here looks large but it's pretty straightforward; amazingly this change
compiled the very first time I tried it!

Closes: #1365
Approved by: jlebon
2017-12-07 19:49:23 +00:00
Dan Nicholson 6d978893f1 lib/commit: Add repository locking during transactions
Take a shared repo lock during a transaction to ensure that another
process doesn't delete objects.

Closes: #1343
Approved by: cgwalters
2017-12-05 02:32:47 +00:00
Colin Walters e48262c659 lib/repo: Add some error prefixing in commit, repo create
I was getting a bare `error: Creating temp file: No such file or directory` when
debugging `test-concurrency.py`; with this I get
`error: Writing content object: Creating temp file: No such file or directory`
which helps me pin it down.

Closes: #1343
Approved by: cgwalters
2017-12-05 02:32:47 +00:00
Colin Walters 89a57bb6d8 lib/repo: Add MT support for transaction_set_ref(), clarify MT rules
For rpm-ostree I'd like to do importing in parallel with threads; the code is
*almost* ready for that except today it calls
`ostree_repo_transaction_set_ref()`.

Looking at the code, there's really a "transaction" struct here,
not just stats.  Let's lift that struct out, and move the refs
into it under the existing lock.

Clarify the documentation around multithreading for various functions.

Closes: #1358
Approved by: jlebon
2017-12-04 19:16:21 +00:00
Colin Walters 7c8ea25306 lib/repo: Add a DEVINO_CANONICAL commit modifier flag
I was seeing the `Writing OSTree commit...` phase of rpm-ostree
being very slow lately.  This turns out to be more fallout from
https://github.com/ostreedev/ostree/pull/1170
AKA commit: 8fe4536

Loading the xattrs is slow on my system (F27AW, XFS+LVM, NVMe). I haven't fully
traced through why, but AIUI at least on XFS the xattrs are often stored outside
of the inode so it's a little bit like doing an `open()+read()`. Plus there's
the LSM overhead, etc.

The thing is that for rpm-ostree's package layering use case, we
basically always want to treat the on-disk state as canonical.  (There's
a subtle case here if one does overrides for something that contains
policy but we'll fix that).

Anyways, so we're in a state now where we do the slow but correct thing by
default, which seems sane. But let's allow the app to opt-in to telling us
"really trust devino". The difference between a `stat()` + hash table lookup
versus the full xattr load on my test case of `rpm-ostree install
./tree-1.7.0-10.fc27.x86_64.rpm` is absolutely dramatic; consistently on the
order of 10s without this support, and <1s with (800ms).

Closes: #1357
Approved by: jlebon
2017-12-04 14:42:37 +00:00
Colin Walters 4eae6529ed lib/commit: Move txn stagedir deletion/unlock into one place
Previously we'd delete the tmpdir in `rename_pending_loose_objects()`
but do the unlock inside `ostree_repo_commit_transaction()`.  Move
them into the same place in the latter function for consistency.

Doesn't fix anything, just a cleanup while reading the code and
working on `test-concurrency.py`.

Closes: #1352
Approved by: dbnicholson
2017-12-01 19:00:18 +00:00
Colin Walters 870b614f37 lib/commit: Minor refactoring of tmpdir cleanup code
Prep for future work here; let's cleanly separate the path for cleaning up the
txn staging directories from the code that cleans up "other stuff". Currently
only the former case uses the `GLnxLockFile` etc.

Closes: #1352
Approved by: dbnicholson
2017-12-01 19:00:18 +00:00
Colin Walters 72304a272c lib/commit: Reuse txn dir for tmpfiles
This closes a race condition I was seeing with `test-concurrency.py`. If we
don't have `O_TMPFILE` (or for symlinks) we'll create temporary files;
previously these would be subject to the date-based pruning because we set the
timestamp to 0 for objects.

Having our temporary files also in the txn staging dir ensures that they're
covered by the locking we do for that directory, and it's also generally cleaner
since the lifecycle of all the temporary data for a txn is in one place.

Closes: #1352
Approved by: dbnicholson
2017-12-01 19:00:18 +00:00
Colin Walters 17308e2149 lib/repo: Add a new private API for bare content writes
This lowers into the commit core what the static delta code
was doing, and improves the API.

The bigger picture issue is that for writing large files, our current "pull" API
where the caller provides a `GInputStream` is very awkward in some scenarios.
For example, we have a whole "libarchive input stream" that is a ~200 line
GObject that boils down to wrapping `archive_read_data()`.

This came more to a head when I was working on rpm-ostree jigdo since I had to
copy that object.

One step we can take after this is to further split `write_content_object()`
into a "write symlink or archive object" versus "write bare content object"
(it already has a mess of conditionals) and teach the latter case to call
this.

The eventual goal here is to make this API public.

Closes: #1355
Approved by: jlebon
2017-11-30 16:39:52 +00:00