I'm aiming to do some more work on the Rust side around `fsck`
like functionality, and this is a useful primitive. There isn't
a great Rust crate for xattrs, and I think it's better to share this
code.
Previously, the reference count was left uninitialized as a result of
bypassing the constructor, and the intended abort-on-error usually
wouldn't have happened.
Fixes: 8a9737a "repo/private: move OstreeRepoAutoTransaction to a boxed type"
Resolves: https://github.com/ostreedev/ostree/issues/2592
Signed-off-by: Simon McVittie <smcv@collabora.com>
This will allow the direct allocation in
ostree_repo_prepare_transaction() to be replaced with a call to this
function, avoiding breaking encapsulation.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Quite a while ago we added staged deployments, which solved
a bunch of issues around the `/etc` merge. However...a persistent
problem since then is that any failures in that process that
happened in the *previous* boot are not very visible.
We ship custom code in `rpm-ostree status` to query the previous
journal. But that has a few problems - one is that on systems
that have been up a while, that failure message may even get
rotated out. And second, some systems may not even have a persistent
journal at all.
A general thing we do in e.g. Fedora CoreOS testing is to check
for systemd unit failures. We do that both in our automated tests,
and we even ship code that displays them on ssh logins. And beyond
that obviously a lot of other projects do the same; it's easy via
`systemctl --failed`.
So to make failures more visible, change our `ostree-finalize-staged.service`
to have an internal wrapper around the process that "catches" any
errors, and copies the error message into a file in `/boot/ostree`.
Then, a new `ostree-boot-complete.service` looks for this file on
startup and re-emits the error message, and fails.
It also deletes the file. The rationale is to avoid *continually*
warning. For example we need to handle the case when an upgrade
process creates a new staged deployment. Now, we could change the
ostree core code to delete the warning file when that happens instead,
but this is trying to be a conservative change.
This should make failures here much more visible as is.
The `archive_entry_symlink()` API can definitely return `NULL`,
reading through the libarchive sources.
I hit this in the wild when using old ostree-ext to try to unpack
a chunked archive.
I didn't try to characterize this more, and sorry no unit test right
now.
Whenever the user has SELinux enabled and has any local
modules/modifications installed, it is necessary to rebuild the policy
in the final deployment, otherwise ostree will leave the binary policy
files unchanged from last deployment as it detects difference against
the base content (in rpm-ostree case this is the RPM content).
To avoid the situation where the policy binaries go stale once any local
customization of the policy is made, try to rebuild the policy as part
of sysroot_finalize_deployment(). Use the special
--rebuild-if-modules-changed switch, which detects if the input module
files have changed relative to last time the policy was built and skips
the most time-consuming part of the rebuild process if modules are
unchanged (thus making this a relatively cheap operation if the user
hasn't made any modifications to the shipped policy).
As suggested by Jonathan Lebon, this uses bubblewrap (via
g_spawn_sync()) to perform the rebuild inside the deployment's
filesystem tree, which also means that ostree will have a runtime
dependency on bubblewrap.
Partially addresses: https://github.com/coreos/fedora-coreos-tracker/issues/701
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Like every other error return path in this function, jump to the `out`
label on error here. Returning directly will cause leaks.
Spotted by reading the code, not actually necessarily encountered in the
wild.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
On OSs that do not consistently merge /usr/bin with /bin, the path to
bash has traditionally been /bin/bash.
Signed-off-by: Simon McVittie <smcv@debian.org>
An indented `#!` is technically meaningless, although many shells will
run text files with the shell if asked to execute them.
Signed-off-by: Simon McVittie <smcv@debian.org>
This prevents writing content into 'bare-split-xattrs` repository,
while carving some space for experimenting via a temporary
`OSTREE_EXP_WRITE_BARE_SPLIT_XATTRS` environment flag.
This adds two new object types for storing xattrs separately from
content objects.
`.file-xattrs` are regular files storing xattrs content, encoded as
GVariant. Each object is keyed by the checksum of its content, allowing
for multiple references.
`.file-xattrs-link` are hardlinks which are associated to file objects.
Each object is keyed by the same checksum of the corresponding file
object. The target of the hardlink is an existing file-xattrs object.
In case of reaching the limit of too many links, this object could be
a plain file too.
Recently we have noticed exceedingly long execution times
for multiple invocations of ostree prune. This is a result of
calculating full reachability on each invocation.
The --commit-only flag provides an alternative strategy. It will only
traverse and delete commit objects to avoid the more expensive
reachability calculations. This allows us to chain multiple --commit-only
commands cheaply, and then follow with a more expensive ostree prune
invocation at the end to clean up orphaned meta and content objects.
This patch makes it so that we mark the .commit file from a static delta
as partial before writing the commit to the staging directory. This
exactly mirrors what we do in meta_fetch_on_complete() when writing the
commit on that codepath, which should lend some credibility to the
correctness of this patch.
I have checked that this fixes an issue Flatpak users have been
encountering (https://github.com/flatpak/flatpak/issues/3479) which
results in error messages like "error: Failed to install
org.freedesktop.Sdk.Extension.texlive: Failed to read commit
c7958d966cfa8b80a42877d1d6124831d7807f93c89461a2a586956aa28d438a: No
such metadata object
8bdaa943b957f3cf14d19301c59c7eec076e57389e0fbb3ef5d30082e47a178f.dirtree"
Here's the sequence of events that lead to the error:
1. An install operation is started that fetches static deltas.
2. The fetch is interrupted for some reason such as network connectivity
dropping.
3. The .commit and .commitmeta files for the commit being pulled are
left in the staging dir, e.g.
"~/.local/share/flatpak/repo/tmp/staging-dfe862b2-13fc-49a2-ac92-5a59cc0d8e18-RURckd"
4. There is no `.commitpartial` file for the commit in
"~/.local/share/flatpak/repo/state/"
5. The next time the user attempts the install, libostree reuses the
existing staging dir, pulls the commit and commitmeta objects into
the repo from the staging dir on the assumption that it's a complete
commit.
6. Flatpak then tries to deploy the commit but fails in
ostree_repo_read_commit() in flatpak_dir_deploy(), leading to the
error message "Failed to read commit ..."
7. This happens again any subsequent time the user attempts the install,
until the incomplete commit is removed with "flatpak repair --user".
I will try to also add a workaround in Flatpak so this is fixed even
when Flatpak links against affected versions of libostree.
I'm working on enhancing the ostree-rs-ext test suite and I hit
a bug where walking a mtree and creating a parent would fail to
load lazy intermediate directories, e.g.:
/ -> usr -> bin
If we walked we'd load `/` but keep `usr` lazy, and then invalidation
would crash because it wasn't loaded.
If we're going to mutate a subdir, we need to have all the parents
loaded.
I know this is missing tests, but...it's a bit tedious to do with
the existing C tests. Eventually soon we'll execute on merging
all 3 repos, and better share test suites.
This allows to use Secure Execution with LUKS encrypted boot disk,
key and cryptab are stored only in 'sd-boot' encrypted image.
Signed-off-by: Nikita Dubrovskii <nikita@linux.ibm.com>
If system contains ibm-z-hostkey (fetched during ignition), than
ostree generates 'sd-boot' image and reboots into Secure Execution
Signed-off-by: Nikita Dubrovskii <nikita@linux.ibm.com>
I think I'm hitting issues due to this while using the Rust bindings:
https://github.com/coreos/rpm-ostree/pull/3406#issuecomment-1033084956
The bindings for those APIs use `from_glib_full` which says:
> Because ownership can only be transferred if something is already
> referenced, this is unsuitable for floating references.
The commit metadata `version` key is well established but getting it for
a remote commit is cumbersome since the commit object needs to be
fetched and loaded. Including it in the summary additional metadata
allows a much more convenient view of what each of the remote refs
represents.
This aligns all the assertion in the module. In particular, it gets
rid of all `g_return_val_if_fail` instances which may fail without
properly setting GError to the caller.
This tightens up the logic for opening a file while inspecting its
xattrs. The only codepath fetching xattrs from a FD is the one
handling 'bare' mode.
It also rearranges the else-assert flow, mostly for future-proofing.
Otherwise, any future staged deployment will also automatically be
locked even if not requested. Likely we should fold the locking into the
primary `staged-deployment` serialized GVariant instead.
We do implicitly have this data because we log timings via structured
metadata in a later journal entry, but it's quite common to lose
the structured metadata because a lot of tooling just grabs the default
syslog-compatible text from `journalctl`.
Let's be louder when we hit this case as a general rule too; I think
most people shipping ostree systems want to see if it's happening.
https://bugzilla.redhat.com/show_bug.cgi?id=2003532
Basically there's a systemd bug where it's losing the `_netdev`
aspect of Ceph filesystem mounts. This means the network is taken
down before Ceph is unmounted. In turn, our invocation of `sync()`
blocks on Ceph, which won't succeed.
And this in turn manifests as a failure to transition to the new
deployment.
I initially did this patch to just rip out the global `sync()`. I
am pretty sure we don't need it anymore. We've been doing individual
`syncfs()` on `/sysroot` and `/boot` for a while now, and those
are the only filesystems we should be touching. But *proving* that
is a whole other thing of course.
To be conservative, let's instead just add a timeout of 5s on
our invocation of `sync()`. It doesn't return any information on
success/error anyways.
To allow testing without the `sync()` invocation, we also support
a new `OSTREE_SYSROOT_OPT_SKIP_SYNC=1` environment variable. For
staged deployments, this needs to be injected via e.g. systemd unit
overrides into `ostree-finalize-staged.service`.
Implementing this is a bit hairy - we need to spawn a thread. I
debated blocking in arecursive mainloop, but I think `g_cond_wait_until()`
is also fine here.
In fixing https://github.com/coreos/rpm-ostree/pull/3323
I felt that it was a bit ugly we're installing `/usr/bin/ostree-container`.
It's kind of an implementation detail. We want users to use
`ostree container`.
Let's support values outside of $PATH too.
For example, this also ensures that TAB completion for `ost` expands
to `ostree ` with a space.
This removes a 'g_setenv()' call, which could potentially be unsafe
in a multi-thread context.
The current libselinux codebase does not seem to check for
`LIBSELINUX_DISABLE_PCRE_PRECOMPILED`, so I think this has no effects
nowadays.
Additionally, I could not find any reference to it in libselinux
git history, so I'm not sure if it ever played any role at all.
My current understanding is that this is coming from version
incompatibilities between an older libselinux in the build environment
and a newer policy (with precompiled regexs) in the target.
But from the ML discussion I found, I think it eventually got
solved in a different way, possibly by avoiding the policy binary
caches.
Refs:
* https://www.spinics.net/lists/selinux/msg14822.html
* https://github.com/ostreedev/ostree/pull/2513#discussion_r781042884
Fixes `Argument with 'nonnull' attribute passed null` by making
the code not exist at all anymore.
In upstream libsoup this code is gone too; it uses `GUri` from glib
which we probably could now too, but one thing at a time.
This is trying to address:
https://pagure.io/fedora-iot/issue/48
Basically we changed rpm-ostree to start doing a shared lock during
commit by default, but this broke because pungi is starting a process
doing a commit for each architecture, and then trying to regenerate
the summary after each one.
This patch is deleting a big comment with a rationale for why
summary regeneration should be exclusive. Point by point:
> This makes sure the commits and deltas don't get
> deleted while generating the summary.
But prune operations require an exclusive lock, which means that
data still can't be deleted when the summary grabs a shared lock.
> It also means we can be sure refs
> won't be created/updated/deleted during the operation, without having to
> add exclusive locks to those operations which would prevent concurrent
> commits from working.
First: The status quo *has* prevented concurrent commits from working!
There is no real locking solution to this problem. What we really
need to do here is regenerate the summary after each commit *or*
when the caller decides to do it and e.g. include deltas at the same
time.
It's OK if multiple threads race to regenerate the summary;
last-one-wins behavior here is totally fine.
In general, we're probably going to need to change most of our
`g_return_if_fail` to `g_assert`. The analyzer flags that
the function can return `NULL`, but the caller isn't prepared for
this.
In practice, let's abort.
In general, we're probably going to need to change most of our
`g_return_if_fail` to `g_assert`. The analyzer flags that
the function can return `NULL`, but the caller isn't prepared for
this.
In practice, let's abort.
In general, we're probably going to need to change most of our
`g_return_if_fail` to `g_assert`. The analyzer flags that
the function can return `NULL`, but the caller isn't prepared for
this.
In practice, let's abort.
In general, we're probably going to need to change most of our
`g_return_if_fail` to `g_assert`. The analyzer flags that
the function can return `NULL`, but the caller isn't prepared for
this.
In practice, let's abort.
This defines `OstreeRepoAutoTransaction` as a boxed type, in order
to support auto-generating bindings for it.
That first requires adding internal reference-counting to it, to
allow freely copying/freeing references to a single transaction guard.
This enhances the auto-transaction logic, augmenting the scope of a
transaction guard.
It allows committing or aborting a transaction through its guard.
It also supports tracking the completion status of a transaction
guard, avoiding double commits/aborts, while retaining the auto-cleanup
logic.
https://bugzilla.redhat.com/show_bug.cgi?id=1945274 is an issue where a privileged
kubernetes daemonset is writing a socket into `/etc`. This makes ostree upgrades barf.
Now, they should clearly move it to `/run`. However, one option is for us to
just ignore it instead of erroring out. Some brief investigation shows that
e.g. `git add somesocket` is a silent no-op, which is an argument in favor of ignoring it.
Closes: https://github.com/ostreedev/ostree/issues/2446
This is nicer than having the caller parse the commit
object, or indirect via the `OstreeRepoFile*` object of the root.
Will be used in ostree-rs-ext around tar parsing.
This is part of `OstreeCommitModifier`, but I'm not using
that in some of the ostree-ext Rust code.
It just makes more sense as a direct policy API, where it should
have been in the first place. There's already support for
setting a policy object on a commit modifier, so that's all the
old API needs to do now.
This will be helpful for the "ostree native container" work in
https://github.com/ostreedev/ostree-rs-ext/
Basically in order to reuse GPG/signapi verification, we need
to support adding a remote, even though it can't be used via
`ostree pull`. (At least, not until we merge ostree-rs-ext into ostree, but
even then I think the principle stands)
for deltafiles the legacy_transaction_resuming flag is not used,
which will mark the commit as done, even if files are missing.
using already existing commitstate_is_partial function as fix
We're waaay overdue for this, it's been the default
in rpm-ostree for years, and solves several important bugs
around not capturing `/etc` while things are running.
Also, `ostree admin upgrade --stage` (should) become idempotent.
Closes: https://github.com/ostreedev/ostree/issues/2389
We have a bunch of APIs to do GPG verification of a commit,
but that doesn't generalize to signapi. Further, they
require the caller to check the signature status explicitly
which seems like a trap.
This much higher level API works with both GPG and signapi.
The intention is to use this in things that are doing "external
pulls" like the ostree-ext tar import support. There we will
get the commitmeta from the tarball and we want to verify it
at the same time we import the commit.
Followup to PRs related to https://github.com/ostreedev/ostree/issues/2410
Since the test suite now covers this the test was failing on
a Fedora SELinux enabled host where we see `security.selinux`
even if not in the commit.
I was seeing an `EPERM` here which was confusing.
It turned out the real error was `EEXIST`.
Since we're referring to the original error, but we do a
lot of computation in the middle, we need to save errno.
This fixes some aspects of OstreeRepoAutoTransaction and re-aligns
it with the logic in flatpak. Specifically:
* link to the underlying repo through refcounting
* bridge internal errors to warning messages
* verify the input pointer type
This is a preparation step before exposing this logic as a public API.