This can be a large performance win in certain circumstances:
* Cold buffer cache (we don't block the whole process)
* Requiring a copy instead of hardlink
The previous fix to just ignore symbolic links for hard linking isn't
really good enough, since it can happen for empty files too.
Since this is an optimization, when we get EMLINK, let's instead just
fall back to copying. This also applies to EXDEV.
Rather than always doing:
1) make temporary link
2) unlink() target
3) rename()
Just try making the link, and only do the second two if the file
already exists. This reduces system call traffic a lot.
By default, when doing a commit, scan all of our loose objects and
build up a (device,inode) -> checksum hash. Then when we're doing a
commit, if we see a file with that (device,inode) pair, we can avoid
checksumming it.
This will allow us to use hard links again for user-mode checkouts,
rather than the hackish link cache. It was pretty silly anyways to
have file objects be stored with just a small metadata header
prepended, but uncompressed.
Either they should be hardlinkable, or compressed (in pack files).
Rather than passing xattr/file_info for all objects, change the API to
assume we're passing the defined object stream for each type. Namely,
for OSTREE_OBJECT_TYPE_FILE, we're now giving the "archive file" data.
This significantly cleans up the code for committing to archive mode
repositories, at the cost of having to (at present) create an
intermediate temporary file when committing to raw repositories.
This will be useful for ostbuild; a user can create their own archive
mode repository which transparently inherits objects from the
root-owned one in /ostree.
This is a convenient way to have a lookaside directory of hard links,
which can greatly speed up checkouts. In the future we probably want
to push this down into the repository.
Having the archived vs not distinction in the object system wasn't
useful in light of pack files. In fact, we should probably move
towards generating a pack file per commit by default.
Don't expose GChecksum in APIs. Add a new stream class which allows
us to pass an input stream somewhere, but gather a checksum as it's
read.
Move some bits of the internals towards binary csums.