Quite a while ago we added staged deployments, which solved
a bunch of issues around the `/etc` merge. However...a persistent
problem since then is that any failures in that process that
happened in the *previous* boot are not very visible.
We ship custom code in `rpm-ostree status` to query the previous
journal. But that has a few problems - one is that on systems
that have been up a while, that failure message may even get
rotated out. And second, some systems may not even have a persistent
journal at all.
A general thing we do in e.g. Fedora CoreOS testing is to check
for systemd unit failures. We do that both in our automated tests,
and we even ship code that displays them on ssh logins. And beyond
that obviously a lot of other projects do the same; it's easy via
`systemctl --failed`.
So to make failures more visible, change our `ostree-finalize-staged.service`
to have an internal wrapper around the process that "catches" any
errors, and copies the error message into a file in `/boot/ostree`.
Then, a new `ostree-boot-complete.service` looks for this file on
startup and re-emits the error message, and fails.
It also deletes the file. The rationale is to avoid *continually*
warning. For example we need to handle the case when an upgrade
process creates a new staged deployment. Now, we could change the
ostree core code to delete the warning file when that happens instead,
but this is trying to be a conservative change.
This should make failures here much more visible as is.
https://bugzilla.redhat.com/show_bug.cgi?id=2003532
Basically there's a systemd bug where it's losing the `_netdev`
aspect of Ceph filesystem mounts. This means the network is taken
down before Ceph is unmounted. In turn, our invocation of `sync()`
blocks on Ceph, which won't succeed.
And this in turn manifests as a failure to transition to the new
deployment.
I initially did this patch to just rip out the global `sync()`. I
am pretty sure we don't need it anymore. We've been doing individual
`syncfs()` on `/sysroot` and `/boot` for a while now, and those
are the only filesystems we should be touching. But *proving* that
is a whole other thing of course.
To be conservative, let's instead just add a timeout of 5s on
our invocation of `sync()`. It doesn't return any information on
success/error anyways.
To allow testing without the `sync()` invocation, we also support
a new `OSTREE_SYSROOT_OPT_SKIP_SYNC=1` environment variable. For
staged deployments, this needs to be injected via e.g. systemd unit
overrides into `ostree-finalize-staged.service`.
Implementing this is a bit hairy - we need to spawn a thread. I
debated blocking in arecursive mainloop, but I think `g_cond_wait_until()`
is also fine here.
In general, we're probably going to need to change most of our
`g_return_if_fail` to `g_assert`. The analyzer flags that
the function can return `NULL`, but the caller isn't prepared for
this.
In practice, let's abort.
We struggled for a long time with enablement of our "internal units",
trying to follow the philosophy that units should only be enabled
by explicit preset.
See https://bugzilla.redhat.com/show_bug.cgi?id=1451458
and https://github.com/coreos/rpm-ostree/pull/1482
etc.
And I just saw chat (RH internal on a proprietary system sadly) where
someone hit `ostree-remount.service` not being enabled in CentOS8.
Thinking about this more, I realized we've shipped a systemd generator
for a long time and while its only role until now was to generate `var.mount`,
but by using it to force on our internal units, we don't require
people to deal with presets anymore.
Basically we're inverting things so that "if ostree= is on the kernel
cmdline, then enable our units" and not "enable our units, but have
them use ConditionKernelCmdline=ostree to skip".
Drop the weird gyrations we were doing around `ostree-finalize-staged.path`
too; forking `systemctl start` is just asking for bugs.
So after this, hopefully we won't ever again have to think about
distribution presets and our units.
The recent change in https://github.com/coreos/fedora-coreos-config/pull/659
broke some of our tests that do `mount -o remount,rw /sysroot` but
leave `/boot` read-only.
We had code for having `/boot` read-only before `/sysroot` but
in practice we had a file descriptor for `/sysroot` that we opened
before the remount that would happen later on.
Clean things up here so that in the library, we also remount
`/boot` at the same time we remount `/sysroot` if either are readonly.
Delete the legacy code for remounting `/boot` rw if we're not in
a mount namespace. I am fairly confident most users are either
using the `ostree` CLI, or they're using the mount namespace.
Just like we hold a fd for `/sysroot`, also do so for `/boot`
instead of opening and closing it in a few places.
This is a preparatory cleanup for further work.
...with the `sysroot.bootloader` configuration option. This can be useful
when converting a system to use `ostree` which doesn't currently have a
bootloader configuration that `ostree` can automatically detect, and is
also useful in combination with the `--sysroot` option when provisioning a
rootfs for systems other than the one you're running `ostree admin deploy`
on.
It's easier to extend and it centralises the config parsing. In other
places we will no longer need to use `g_str_equal` to match these values,
a `switch` statement will be sufficient.
In FCOS and RHCOS, the need to configure software in the initramfs has
come up multiple times. Sometimes, using kernel arguments suffices.
Other times, it really must be a configuration file. Rebuilding the
initramfs on the client-side however is a costly operation. Not only
does it add complexity to the update workflow, it also erodes a lot of
the value obtained from using the baked "blessed" initramfs from the
tree itself.
One elegant way to address this is to allow specifying multiple
initramfses. This is supported by most bootloaders (notably GRUB) and
results in each initrd being overlayed on top of each other.
This patch allows libostree clients to leverage this so that they can
avoid regenerating the initramfs entirely. libostree itself is agnostic
as to what kind and how much data overlay initrds contain. It's up to
the clients to enforce such boundaries.
To implement this, we add a new ostree_sysroot_stage_overlay_initrd
which takes a file descriptor and returns a checksum. Then users can
pass these checksums when calling the deploy APIs via the new array
option `overlay_initrds`. We copy these files into `/boot` and add them
to the BLS as another `initrd` entry.
When support for devicetree was added, it created a problem
because old and new ostree versions would compute different
checksums for the "boot data". The scenario here is:
- Have system with ostree < 2020.4
- Reboot into system with ostree 2020.5
- Try to perform an operation that would retain
that previous booted deployment (common)
Currently ostree iterates over all the deployments
that will be retained and calls `install_deployment_kernel()`,
even for the booted one (which is a bit silly), but
just to verify that all boot data for the targeted
deployments are installed.
This then re-computes the checksum and we'd trip this
assertion.
In practice though, we don't strictly require them to match;
the only thing that will happen if they don't is that we'll
end up with another copy of the kernel/initramfs - and
that only temporarily until the previous deployment
gets GC'd.
Longer term, I think what we really want to do anyways
is probably closer to like a little ostree repo for `/boot`
so that we can e.g. still hardlink kernels there even if
the initramfs changes, or hardlink both kernel/initramfs
if just the devicetree changes, etc.
Closes: https://github.com/ostreedev/ostree/issues/2154
I was thinking a bit more recently about the "live" changes
stuff https://github.com/coreos/rpm-ostree/issues/639
(particularly since https://github.com/coreos/rpm-ostree/pull/2060 )
and I realized reading the last debates in that issue that
there's really a much simpler solution; do exactly the same
thing we do for `ostree admin unlock`, except mount it read-only
by default.
Then, anything that wants to modify it does the same thing
libostree does for `/sysroot` and `/boot` as of recently; create
a new mount namespace and do the modifications there.
The advantages of this are numerous. First, we already have
all of the code, it's basically just plumbing through a new
entry in the state enumeration and passing `MS_RDONLY` into
the `mount()` system call.
"live" changes here also naturally don't persist, unlike what
we are currently doing in rpm-ostree.
Noticed this while writing tests for a core `ostree_sysroot_load()`
entrypoint. And decided to do the same for `ostree_repo_open()`,
and while there also noted we had a duplicate error prefixing
for the open (more recently `glnx_opendirat()` automatically
prefixes with the path).
These had been added assuming 2019.7 would be the next version, but now
it's 2020 and there's been a release. In the case of
`OstreeCommitSizesEntry`, I'd forgotten to move it forward from 2019.5
to 2019.7 in the time between when I started working on the feature and
it landed.
We want to support extending the read-only state to cover `/sysroot`
and `/boot`, since conceptually all of the data there should only
be written via libostree. Or at least for `/boot` should *mostly*
just be written by ostree.
This change needs to be opt-in though to avoid breaking anyone.
Add a `sysroot/readonly` key to the repository config which instructs
`ostree-remount.service` to ensure `/sysroot` is read-only. This
requires a bit of a dance because `/sysroot` is actually the same
filesystem as `/`; so we make `/etc` a writable bind mount in this case.
We also need to handle `/var` in the "OSTree default" case of a bind
mount; the systemd generator now looks at the writability state of
`/sysroot` and uses that to determine whether it should have the
`var.mount` unit happen before or after `ostree-remount.service.`
Also add an API to instruct the libostree shared library
that the caller has created a new mount namespace. This way
we can freely remount read-write.
This approach extends upon in a much better way previous work
we did to support remounting `/boot` read-write.
Closes: https://github.com/ostreedev/ostree/issues/1265
More "scan-build doesn't understand GError and our out-param conventions"
AKA "these errors would be impossible with Rust's sum type Result<> approach".
I've seen people confused by this error in the case where
`/boot` isn't mounted or the BLS fragments were deleted, etc.
If you understand ostree deeply it's clear but, let's do
better here and a direct error message for the case where
we can't find `/boot/loader` which is the majority of these.
The other case could happen if e.g. just the BLS fragment
for the booted deployment was deleted; let's reword that
one a bit too.
Closes: #1905
Approved by: rfairley
This change makes public the current kargs API in src/libostree/ostree-kernel-args.c
and adds documentations.
Upstreams the new kargs API from rpm-ostree/src/libpriv/rpmostree-kargs-process.c
Merges libostree_kernel_args_la_SOURCES to libostree_1_la_SOURCES in Makefile-libostree.am
Upstreams tests/check/test-kargs.c from rpm-ostree.
Closes: #1833Closes: #1869
Approved by: jlebon
Otherwise, we'll be subject to whatever `umask` is currently. Normally,
processes should respect `umask` when creating files and directories,
but specifically for `ostree admin unlock` (or `rpm-ostree usroverlay`),
this poses a problem since e.g. a `/usr` with mode 0700 will break any
daemon that doesn't run as root and needs to read files under `/usr`,
such as polkitd.
This patch just does a `chmod()` after the `mkdir()`. An alternative
would be to do `umask(0000)` after forking into the child process
that'll call `mount()`, but that'd require also moving the `mkdir()`
calls into there, making for a more intrusive patch.
Closes: #1843
Approved by: cgwalters
Rather than manually starting the `ostree-finalize-staged.service` unit,
we can leverage systemd's path units for this. It fits quite nicely too,
given that we already have a path we drop iif we have a staged
deployment.
To give some time for the preset to make it to systems, we don't yet
drop the explicit call to `systemctl start`. Though we do make it
conditional based on a DEBUG env var so that we can actually test it in
CI for now. Once we're sure this has propagated, we can drop the
`systemctl start` path and the env var together.
Closes: #1740
Approved by: cgwalters
This was pointed out in a previous PR review; we don't have
a need for the separate variables. Prep for adding an API for
this.
Closes: #1568
Approved by: jlebon
Followup to: https://github.com/ostreedev/ostree/pull/1503
After starting some more work on on this in rpm-ostree, it is
actually simpler if the staged deployment just shows up in the list.
It's effectively opt-in today; down the line we may make it the default,
but I worry about breaking things that e.g. assume they can mutate
the deployment before rebooting and have `/etc` already merged.
There's not that many things in libostree that iterate over the deployment
list. The biggest change here is around the
`ostree_sysroot_write_deployments_with_options` API. I initially
tried hard to support a use case like "push a rollback" while retaining
the staged deployment, but everything gets very messy because that
function truly is operating on the bootloader list.
For now what I settled on is to just discard the staged deployment;
down the line we can enhance things.
Where we then have some new gymnastics is around implementing
the finalization; we need to go to some effort to pull the staged
deployment out of the list and mark it as unstaged, and then pass
it down to `write_deployments()`.
Closes: #1539
Approved by: jlebon
In prep for staging work, where we'll need to load the origin
for the staged deployment too.
The function was previously trying to avoid operating on an
instantiated deployment, but the data we need is in the deployment
object at that point.
Closes: #1538
Approved by: jlebon
Add API to write a deployment state to `/run/ostree/staged-deployment`,
along with a systemd service which runs at shutdown time.
This is a big change to the ostree model for hosts,
but it closes a longstanding set of bugs; many, many people have
hit the "losing changes in /etc" problem. It also avoids
the other problem of racing with programs that modify `/etc`
such as LVM backups:
https://bugzilla.redhat.com/show_bug.cgi?id=1365297
We need this in particular to go to a full-on model for
automatically updated host systems where (like a dual-partition model)
everything is fully prepared and the reboot can be taken
asynchronously.
Closes: https://github.com/ostreedev/ostree/issues/545Closes: #1503
Approved by: jlebon
I was looking at this code in prep for "staging" deployments,
and there are several cleanups to be made here. The first
thing I noticed is that we look for the `ostree=` kernel argument,
but the presence of that should be exactly equivalent to having
`/run/ostree-booted` exist. We just added a member variable for
that, so let's make use of it.
Related to this, we were erroring out if we had the karg but
didn't find a deployment. But this can happen if e.g. one is
using `ostree admin --sysroot` from an ostree-booted system! It's
actually a bit surprising no one has reported this so far; I guess
in the end people are either using non-ostree systems or running
from containers.
Let's add a member variable `root_is_sysroot` that we can use
to determine if we're looking at `/`. Then, our more precise
"should find a booted deployment" state is when both `ostree_booted`
and `root_is_sysroot` are TRUE.
Next, rather than walking all of the deployments after parsing,
we can inline the `fstatat()` while parsing. The mild ugly
thing about this is assigning to the sysroot member variable while
parsing, but I will likely clean that up later, just wanted to avoid
rewriting everything in one go.
Closes: #1497
Approved by: jlebon
If we're booted into a deployment, then any queries for the pending
merge deployment of a non-booted OS will fail due all of them being
considered rollback.
Fix this by filtering by `osname` *before* determining if we've crossed
the booted deployment yet.
Closes: #1472
Approved by: cgwalters