I'm aiming to do some more work on the Rust side around `fsck`
like functionality, and this is a useful primitive. There isn't
a great Rust crate for xattrs, and I think it's better to share this
code.
Previously, the reference count was left uninitialized as a result of
bypassing the constructor, and the intended abort-on-error usually
wouldn't have happened.
Fixes: 8a9737a "repo/private: move OstreeRepoAutoTransaction to a boxed type"
Resolves: https://github.com/ostreedev/ostree/issues/2592
Signed-off-by: Simon McVittie <smcv@collabora.com>
This will allow the direct allocation in
ostree_repo_prepare_transaction() to be replaced with a call to this
function, avoiding breaking encapsulation.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Quite a while ago we added staged deployments, which solved
a bunch of issues around the `/etc` merge. However...a persistent
problem since then is that any failures in that process that
happened in the *previous* boot are not very visible.
We ship custom code in `rpm-ostree status` to query the previous
journal. But that has a few problems - one is that on systems
that have been up a while, that failure message may even get
rotated out. And second, some systems may not even have a persistent
journal at all.
A general thing we do in e.g. Fedora CoreOS testing is to check
for systemd unit failures. We do that both in our automated tests,
and we even ship code that displays them on ssh logins. And beyond
that obviously a lot of other projects do the same; it's easy via
`systemctl --failed`.
So to make failures more visible, change our `ostree-finalize-staged.service`
to have an internal wrapper around the process that "catches" any
errors, and copies the error message into a file in `/boot/ostree`.
Then, a new `ostree-boot-complete.service` looks for this file on
startup and re-emits the error message, and fails.
It also deletes the file. The rationale is to avoid *continually*
warning. For example we need to handle the case when an upgrade
process creates a new staged deployment. Now, we could change the
ostree core code to delete the warning file when that happens instead,
but this is trying to be a conservative change.
This should make failures here much more visible as is.
The `archive_entry_symlink()` API can definitely return `NULL`,
reading through the libarchive sources.
I hit this in the wild when using old ostree-ext to try to unpack
a chunked archive.
I didn't try to characterize this more, and sorry no unit test right
now.
Whenever the user has SELinux enabled and has any local
modules/modifications installed, it is necessary to rebuild the policy
in the final deployment, otherwise ostree will leave the binary policy
files unchanged from last deployment as it detects difference against
the base content (in rpm-ostree case this is the RPM content).
To avoid the situation where the policy binaries go stale once any local
customization of the policy is made, try to rebuild the policy as part
of sysroot_finalize_deployment(). Use the special
--rebuild-if-modules-changed switch, which detects if the input module
files have changed relative to last time the policy was built and skips
the most time-consuming part of the rebuild process if modules are
unchanged (thus making this a relatively cheap operation if the user
hasn't made any modifications to the shipped policy).
As suggested by Jonathan Lebon, this uses bubblewrap (via
g_spawn_sync()) to perform the rebuild inside the deployment's
filesystem tree, which also means that ostree will have a runtime
dependency on bubblewrap.
Partially addresses: https://github.com/coreos/fedora-coreos-tracker/issues/701
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Like every other error return path in this function, jump to the `out`
label on error here. Returning directly will cause leaks.
Spotted by reading the code, not actually necessarily encountered in the
wild.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
On OSs that do not consistently merge /usr/bin with /bin, the path to
bash has traditionally been /bin/bash.
Signed-off-by: Simon McVittie <smcv@debian.org>
An indented `#!` is technically meaningless, although many shells will
run text files with the shell if asked to execute them.
Signed-off-by: Simon McVittie <smcv@debian.org>
This prevents writing content into 'bare-split-xattrs` repository,
while carving some space for experimenting via a temporary
`OSTREE_EXP_WRITE_BARE_SPLIT_XATTRS` environment flag.
This adds two new object types for storing xattrs separately from
content objects.
`.file-xattrs` are regular files storing xattrs content, encoded as
GVariant. Each object is keyed by the checksum of its content, allowing
for multiple references.
`.file-xattrs-link` are hardlinks which are associated to file objects.
Each object is keyed by the same checksum of the corresponding file
object. The target of the hardlink is an existing file-xattrs object.
In case of reaching the limit of too many links, this object could be
a plain file too.
Recently we have noticed exceedingly long execution times
for multiple invocations of ostree prune. This is a result of
calculating full reachability on each invocation.
The --commit-only flag provides an alternative strategy. It will only
traverse and delete commit objects to avoid the more expensive
reachability calculations. This allows us to chain multiple --commit-only
commands cheaply, and then follow with a more expensive ostree prune
invocation at the end to clean up orphaned meta and content objects.
This patch makes it so that we mark the .commit file from a static delta
as partial before writing the commit to the staging directory. This
exactly mirrors what we do in meta_fetch_on_complete() when writing the
commit on that codepath, which should lend some credibility to the
correctness of this patch.
I have checked that this fixes an issue Flatpak users have been
encountering (https://github.com/flatpak/flatpak/issues/3479) which
results in error messages like "error: Failed to install
org.freedesktop.Sdk.Extension.texlive: Failed to read commit
c7958d966cfa8b80a42877d1d6124831d7807f93c89461a2a586956aa28d438a: No
such metadata object
8bdaa943b957f3cf14d19301c59c7eec076e57389e0fbb3ef5d30082e47a178f.dirtree"
Here's the sequence of events that lead to the error:
1. An install operation is started that fetches static deltas.
2. The fetch is interrupted for some reason such as network connectivity
dropping.
3. The .commit and .commitmeta files for the commit being pulled are
left in the staging dir, e.g.
"~/.local/share/flatpak/repo/tmp/staging-dfe862b2-13fc-49a2-ac92-5a59cc0d8e18-RURckd"
4. There is no `.commitpartial` file for the commit in
"~/.local/share/flatpak/repo/state/"
5. The next time the user attempts the install, libostree reuses the
existing staging dir, pulls the commit and commitmeta objects into
the repo from the staging dir on the assumption that it's a complete
commit.
6. Flatpak then tries to deploy the commit but fails in
ostree_repo_read_commit() in flatpak_dir_deploy(), leading to the
error message "Failed to read commit ..."
7. This happens again any subsequent time the user attempts the install,
until the incomplete commit is removed with "flatpak repair --user".
I will try to also add a workaround in Flatpak so this is fixed even
when Flatpak links against affected versions of libostree.
I'm working on enhancing the ostree-rs-ext test suite and I hit
a bug where walking a mtree and creating a parent would fail to
load lazy intermediate directories, e.g.:
/ -> usr -> bin
If we walked we'd load `/` but keep `usr` lazy, and then invalidation
would crash because it wasn't loaded.
If we're going to mutate a subdir, we need to have all the parents
loaded.
I know this is missing tests, but...it's a bit tedious to do with
the existing C tests. Eventually soon we'll execute on merging
all 3 repos, and better share test suites.
This allows to use Secure Execution with LUKS encrypted boot disk,
key and cryptab are stored only in 'sd-boot' encrypted image.
Signed-off-by: Nikita Dubrovskii <nikita@linux.ibm.com>
If system contains ibm-z-hostkey (fetched during ignition), than
ostree generates 'sd-boot' image and reboots into Secure Execution
Signed-off-by: Nikita Dubrovskii <nikita@linux.ibm.com>
I think I'm hitting issues due to this while using the Rust bindings:
https://github.com/coreos/rpm-ostree/pull/3406#issuecomment-1033084956
The bindings for those APIs use `from_glib_full` which says:
> Because ownership can only be transferred if something is already
> referenced, this is unsuitable for floating references.
The commit metadata `version` key is well established but getting it for
a remote commit is cumbersome since the commit object needs to be
fetched and loaded. Including it in the summary additional metadata
allows a much more convenient view of what each of the remote refs
represents.
This aligns all the assertion in the module. In particular, it gets
rid of all `g_return_val_if_fail` instances which may fail without
properly setting GError to the caller.
When we clean up from an error, for example copy_file_range() failing
while we generate a static delta (perhaps caused by
https://gitlab.gnome.org/GNOME/libglnx/-/issues/3 or by a
genuine write error), we might free a variant builder that has a
non-null parent. Previously, this caused infinite recursion and a stack
overflow, repeatedly freeing the same object, but Luca Bruno suggested
that the intention here appears to have been to free the parent object.
Partially resolves https://github.com/ostreedev/ostree/issues/2525
(the other bug reported in that issue needs to be resolved by updating
libglnx to a version where libglnx#3 has been fixed).
Signed-off-by: Simon McVittie <smcv@collabora.com>
This tightens up the logic for opening a file while inspecting its
xattrs. The only codepath fetching xattrs from a FD is the one
handling 'bare' mode.
It also rearranges the else-assert flow, mostly for future-proofing.
Support for that file was added previously, but the testing lived in
rpm-ostree only. Let's add it here too.
In the process add a hidden `--lock-finalization` to `ostree admin
deploy` to make testing easier (though it could also be useful to update
managers driving OSTree via the CLI).
Otherwise, any future staged deployment will also automatically be
locked even if not requested. Likely we should fold the locking into the
primary `staged-deployment` serialized GVariant instead.
This reworks `ostree ls` top-level logic so that cancellation
tokens and error details are plumbed through all codepaths.
It also gets rid of all previous goto jumps.
We do implicitly have this data because we log timings via structured
metadata in a later journal entry, but it's quite common to lose
the structured metadata because a lot of tooling just grabs the default
syslog-compatible text from `journalctl`.
Let's be louder when we hit this case as a general rule too; I think
most people shipping ostree systems want to see if it's happening.
https://bugzilla.redhat.com/show_bug.cgi?id=2003532
Basically there's a systemd bug where it's losing the `_netdev`
aspect of Ceph filesystem mounts. This means the network is taken
down before Ceph is unmounted. In turn, our invocation of `sync()`
blocks on Ceph, which won't succeed.
And this in turn manifests as a failure to transition to the new
deployment.
I initially did this patch to just rip out the global `sync()`. I
am pretty sure we don't need it anymore. We've been doing individual
`syncfs()` on `/sysroot` and `/boot` for a while now, and those
are the only filesystems we should be touching. But *proving* that
is a whole other thing of course.
To be conservative, let's instead just add a timeout of 5s on
our invocation of `sync()`. It doesn't return any information on
success/error anyways.
To allow testing without the `sync()` invocation, we also support
a new `OSTREE_SYSROOT_OPT_SKIP_SYNC=1` environment variable. For
staged deployments, this needs to be injected via e.g. systemd unit
overrides into `ostree-finalize-staged.service`.
Implementing this is a bit hairy - we need to spawn a thread. I
debated blocking in arecursive mainloop, but I think `g_cond_wait_until()`
is also fine here.