Having everything accessible to everything encourages fast and loose
includes from places one shouldn't be using, and makes it easy for
those cases to hide in plain sight as well. There were reasons for
the top-level include back in 2007 but our codebase is a rather
different beast these days. Limiting access through per-target
include directories on everything nicely highlights the exceptions
and makes the whole more controllable and manageable.
This change looks huge, but it's just due to stripping no-longer-valid
prefixes from all the gazillion internal includes. No rpm-side
functionality is affected; this is just a source-level hygiene operation.
How does the number of writes change?
I compared the number of writes with and without minimal writing
for upgrades from RHEL 7.0 to RHEL 7.1, from RHEL 7.1 to RHEL 7.2, ...
More precisely, I compared the amount of data written for regular files.
In the following table, the second column gives the percentage of
installed files that can merely be touched (metadata update only), and
the third column gives the corresponding percentage of their total size.
                     | Touched /      | Touched /
                     | updated files  | updated bytes
---------------------+----------------+----------------
RHEL 7.0 -> RHEL 7.1 | 63 %           | 66 %
RHEL 7.1 -> RHEL 7.2 | 53 %           | 43 %
RHEL 7.2 -> RHEL 7.3 | 60 %           | 42 %
---------------------+----------------+----------------
F24 -> F25           | 63 %           | 40 %
F25 -> Fraw          | 49 %           | 28 %
Does the speed change?
The update speed on classic hard disks or SSDs barely changes.
How does it work?
If a file in the new package has the same digest as the corresponding
file in the old package and as the file on disk (so the contents are
expected to be equal), rpm does not install the whole file but only
updates the file's metadata. Otherwise it installs the whole file.
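The decision described above can be sketched in C as follows. This is a minimal sketch, not rpm's actual code: choose_action(), digest_eq() and the enum are hypothetical names, and digests are assumed to be NUL-terminated hex strings, with NULL meaning no digest is available (for instance, the file is missing on disk).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcome of the per-file decision. */
enum file_action { ACTION_INSTALL_WHOLE, ACTION_UPDATE_METADATA };

/* Two digests match only if both exist and are identical. */
static bool digest_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Only when new, old and on-disk digests all agree are the contents
 * expected to be identical, so only the metadata needs updating. */
static enum file_action choose_action(const char *new_digest,
                                      const char *old_digest,
                                      const char *disk_digest)
{
    if (digest_eq(new_digest, old_digest) &&
        digest_eq(old_digest, disk_digest))
        return ACTION_UPDATE_METADATA;
    return ACTION_INSTALL_WHOLE;
}
```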
- All the public rpmfiFoo() accessors have an indexed rpmfilesFoo()
counterpart; make the rpmfiles versions public too.
- The noteworthy exceptions are rpmfiDecideFate() and rpmfiConfigConflict()
which shouldn't have been public in the first place, and are to be
removed from the API in the next API break. So we're not adding
new rpmfiles-counterparts for functions that are to be removed
from the (public) API. Actually document the issue by deprecating
both rpmfi-functions.
- The iterator types need to be in rpmfiles.h as the iterator
constructor is there (otherwise there'd be a cyclic include
between rpmfiles.h and rpmfi.h, which won't do...)
* Write headers while iterating over the files
* Handle hard links within the Iterator
* Remove rpmfiAttachArchive()
* Remove rpmfiArchiveWriteHeader() from the API
- There's no benefit to using char here, and int is what is generally
used for such (flag-type) arguments.
- As it happens char doesn't even work here as the nodigest argument
passed from fsm is not a simple 0/1 value but result of
(rpmtsFlags(ts) & RPMTRANS_FLAG_NOFILEDIGEST), which when crammed
into a char actually turns into a zero... so this unbreaks
--nofiledigest
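A minimal C sketch of the truncation. The flag value below is hypothetical (any bit at position 8 or above behaves the same), and unsigned char is used instead of plain char so the narrowing conversion is well-defined (reduction modulo 256); with a signed char the result is implementation-defined, although GCC, for one, also yields zero.

```c
/* Hypothetical flag value, for illustration only. */
#define DEMO_FLAG_NOFILEDIGEST (1u << 16)

/* Buggy: 0x10000 does not fit in unsigned char and is reduced
 * modulo 256, i.e. to 0, so the "skip digests" request is lost. */
static int uses_digest_char(unsigned char nodigest)
{
    return nodigest ? 0 : 1;
}

/* Fixed: an int parameter keeps the masked value intact, so any
 * nonzero bit means "don't use digests". */
static int uses_digest_int(int nodigest)
{
    return nodigest ? 0 : 1;
}
```

Normalizing the argument with !!(flags & FLAG) would also survive a char parameter, but as noted above int is the conventional type for flag arguments anyway.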
- Get rid of the crazy tag-tango around rpmfi in genCpioListAndHeader():
pkg->cpioList is an rpmfi with the actual in-package paths, and
on-disk package paths are passed around as a separate array. This
simplifies and sanitizes things a lot... and also finally gets rid
of fi->apath entirely.
- Dependency generation wants on-disk paths, but it can generate those
by prepending buildroot, which actually makes it more obvious what's
going on.
- This also kills %{_noPayloadPrefix} ancient-history compatibility flag.
We could honor it in rpmfiArchiveWriteHeader() if we cared but we're
talking about rpm < 3.0.5 compatibility here... so we don't.
- rpmfiOFN() and its underlying buffer especially are more than a bit
dubious, but that can't really be helped, as people are going to expect
it to behave identically to rpmfiFN(). And the two can't share the
same buffer, as somebody is going to be tempted to do e.g.
    if (strcmp(rpmfiFN(fi), rpmfiOFN(fi)))
        /* file has been relocated */
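Why the two accessors need separate buffers, as a minimal sketch: the struct and functions are hypothetical, and static buffers stand in for whatever per-iterator storage the real accessors use. With a shared buffer both calls return the same pointer, so the comparison can never detect a relocation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical file record: current (possibly relocated) and original path. */
struct demo_file {
    const char *dir, *base;     /* where the file will be installed */
    const char *odir, *obase;   /* original in-package location */
};

/* Broken: both accessors format into one shared buffer. */
static char shared_buf[256];

static const char *fn_shared(const struct demo_file *f)
{
    snprintf(shared_buf, sizeof(shared_buf), "%s%s", f->dir, f->base);
    return shared_buf;
}

static const char *ofn_shared(const struct demo_file *f)
{
    snprintf(shared_buf, sizeof(shared_buf), "%s%s", f->odir, f->obase);
    return shared_buf;
}

/* Working: each accessor has a buffer of its own. */
static char fn_buf[256], ofn_buf[256];

static const char *fn_own(const struct demo_file *f)
{
    snprintf(fn_buf, sizeof(fn_buf), "%s%s", f->dir, f->base);
    return fn_buf;
}

static const char *ofn_own(const struct demo_file *f)
{
    snprintf(ofn_buf, sizeof(ofn_buf), "%s%s", f->odir, f->obase);
    return ofn_buf;
}

/* The tempting check: both arguments point at shared_buf, so this is always 0. */
static int is_relocated_shared(const struct demo_file *f)
{
    return strcmp(fn_shared(f), ofn_shared(f)) != 0;
}

/* Same check with separate buffers: compares two distinct strings. */
static int is_relocated_own(const struct demo_file *f)
{
    return strcmp(fn_own(f), ofn_own(f)) != 0;
}
```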
- Based on ffesti's original work on the subject
- Forward (the default) and (a new) backwards iteration modes for now
- The internal iterator function just returns the next iteration index;
bounds checking and the actual advancing happen in rpmfiNext() to avoid
having to duplicate that code in every single iterator function.
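The split between a mode-specific function that only produces the next index and a common next() that does the bounds checking and advancing might look like this. The types and names are hypothetical simplifications, not rpm's actual rpmfi internals.

```c
/* Hypothetical, simplified file iterator. */
struct demo_iter;
typedef int (*demo_next_fn)(const struct demo_iter *it);

struct demo_iter {
    int count;          /* number of files in the set */
    int i;              /* current index; -1 before the first step */
    demo_next_fn next;  /* mode-specific: only computes the next index */
};

/* Forward mode: 0, 1, ..., count - 1 (from -1, i + 1 yields 0). */
static int next_forward(const struct demo_iter *it)
{
    return it->i + 1;
}

/* Backward mode: count - 1, ..., 0 */
static int next_backward(const struct demo_iter *it)
{
    return it->i < 0 ? it->count - 1 : it->i - 1;
}

static void demo_iter_init(struct demo_iter *it, int count, int backwards)
{
    it->count = count;
    it->i = -1;
    it->next = backwards ? next_backward : next_forward;
}

/* Common step: bounds checking and advancing live here, once. */
static int demo_iter_next(struct demo_iter *it)
{
    int next = it->next(it);
    if (next < 0 || next >= it->count)
        return -1;      /* exhausted; position is left unchanged */
    it->i = next;
    return next;
}
```

Adding another mode then only means writing one more small index function.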
- These all operate on rpmfiles, not rpmfi, now, so make the point
clearer. It's all internal stuff, so we're free to mess around.
- No functional changes, only a straightforward perl-assisted rename...
- The self-iterator in rpmfi prevents all sorts of sane uses of
file set iteration. Split the actual data into a separate data
type, changing the internal random-access functions to use the
new rpmfiles datatype directly and update internal callers minimally.
This should be entirely transparent to public API consumers who still
only see the braindamaged self-iterating version.
- Internal consumers don't directly benefit (yet); this is just an
early step on the road to rpmfi sanity. Much of the API and variables
will need renaming etc., but we avoid that here to keep the changes
to a minimum; this is a rather huge commit as it is.
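Separating the data from the position is what allows, for example, two independent iterations over the same file set at once. A minimal sketch with hypothetical types standing in for rpmfiles (the data) and rpmfi (the cursor); the nested loop counts pairs of identical paths, something a single self-iterator cannot express without losing its position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rpmfiles: the file data, with no position. */
struct demo_files {
    int count;
    const char **paths;
};

/* Random access by index, no hidden state. */
static const char *demo_files_path(const struct demo_files *files, int ix)
{
    return (ix >= 0 && ix < files->count) ? files->paths[ix] : NULL;
}

/* Hypothetical stand-in for rpmfi: only a position over shared data. */
struct demo_cursor {
    const struct demo_files *files;
    int i;
};

static int demo_cursor_next(struct demo_cursor *c)
{
    if (c->i + 1 >= c->files->count)
        return -1;
    return ++c->i;
}

/* Count pairs of entries with identical paths using two cursors at once. */
static int count_duplicate_pairs(const struct demo_files *files)
{
    int dups = 0;
    struct demo_cursor outer = { files, -1 };

    while (demo_cursor_next(&outer) >= 0) {
        /* Independent inner cursor, starting just after the outer one. */
        struct demo_cursor inner = { files, outer.i };
        while (demo_cursor_next(&inner) >= 0) {
            if (strcmp(demo_files_path(files, outer.i),
                       demo_files_path(files, inner.i)) == 0)
                dups++;
        }
    }
    return dups;
}
```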
- Similar to commit 541234b02e for rpmds,
but speed is less of an issue with the complex rpmfi's than with single
rpmds'es, where private pool construct+teardown can be very expensive.
With rpmfi's the bigger gain from shared pools is memory savings;
permit taking advantage of this outside librpm internals.
- In the package/transaction related things the strpool is more of
an internal implementation detail than an end-goal in itself, move
string pool related interfaces of rpmts, rpmfi and rpmds to
internal-only APIs for now. The kind of interfaces we'll want to eventually
export a) don't exist yet and b) are likely to be something very different.
- The string pool itself remains exported however; it's a handy data
structure for all sorts of things and both librpm and librpmbuild
heavily use it already.
- rpmfi cannot know anything about the storage, so rpmfiFpxIndex()
cannot be... so change it to rpmfiFps(), which only returns the pointer
we got from fpLookupList().
- Change fpCacheGetByFp() to assume it gets passed an array of fps,
and take an additional index argument. Return the fingerprint
pointer on success, NULL on not found to allow further operations
on the fp without knowing its internals.
- Always push base and dir names into the file info set's string pool,
whether private or shared. For basenames this can save significant
space even in a private pool; for dirnames a private pool is moot
as the names are already unique, but a shared pool is quite another story.
- Adjust fpLookupList() to take a pool and id's as arguments.
- This introduces a fair amount of overhead, so things will be somewhat
slower until the transition to pool id's is (more) complete. Sometimes
things have to get worse before they get better... Other than that,
this should be entirely invisible to callers.
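A minimal sketch of the string-pool idea: each distinct string is stored once and identified by a small integer id, so a basename repeated across many files, such as README, costs an id per occurrence rather than a full copy. The API here is hypothetical and deliberately simplistic (linear lookup); it is not rpm's rpmstrPool, and a real pool would use a hash table for lookups.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned int demo_sid;   /* 0 means "no string" */

struct demo_pool {
    char **strs;
    size_t n, cap;
};

/* Return the id of s, interning a private copy on first sight (0 on OOM). */
static demo_sid demo_pool_id(struct demo_pool *pool, const char *s)
{
    for (size_t i = 0; i < pool->n; i++) {
        if (strcmp(pool->strs[i], s) == 0)
            return (demo_sid)(i + 1);       /* already present: reuse */
    }
    if (pool->n == pool->cap) {
        size_t ncap = pool->cap ? pool->cap * 2 : 16;
        char **nstrs = realloc(pool->strs, ncap * sizeof(*nstrs));
        if (nstrs == NULL)
            return 0;
        pool->strs = nstrs;
        pool->cap = ncap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, s, len);
    pool->strs[pool->n++] = copy;
    return (demo_sid)pool->n;
}

/* Map an id back to its string, or NULL for an invalid id. */
static const char *demo_pool_str(const struct demo_pool *pool, demo_sid id)
{
    return (id >= 1 && id <= pool->n) ? pool->strs[id - 1] : NULL;
}

static void demo_pool_free(struct demo_pool *pool)
{
    for (size_t i = 0; i < pool->n; i++)
        free(pool->strs[i]);
    free(pool->strs);
    pool->strs = NULL;
    pool->n = pool->cap = 0;
}
```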
- Removes the last use of our former simple, stupid and slow caches
- For now, use a per-fi pool for this just like the previous caching
did. Memory use is slightly increased but it's faster than before;
to reap the full benefits (memory and otherwise) we'll want a
per-transaction pool for these, to be added later.
- With the string pool we don't have to worry about overflowing the
indexes, so we can lump all this relatively static data into one pool.
Because rpmsid's are larger than the previous cache indexes, we'll
lose some of the memory savings, but then the pool is faster on
insertion, and we'll only need one of them, so...
- The misc. pool is never freed, flushed or frozen, so it'll "waste" memory
throughout the lifetime of a process (similarly to the previous caches),
but it's not huge, so... ignoring that for now.
- Very few packages have RPMTAG_FILECAPS at all, and the memory saving
for those that do is so marginal it hardly matters at all. At least
for now, don't bother.
- Further preliminaries to handle file conflicts within a package.
- These are internal-only interfaces so we can just change without
bothering with compat wrappers.
- Preliminaries for handling file conflicts within a package:
Using rpmfi's self-iterator limits access to the file info to
one caller at a time; in order to handle self-file conflicts we'll need
to be able to access the same rpmfi at different indexes simultaneously.
- As these are public API's, add compat wrappers for the self-iterator
use (although AFAIK nothing except rpm itself uses these)
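The compat-wrapper pattern in a nutshell: the indexed accessor does the work, and the old self-iterating accessor just forwards the iterator's current position. This is a hypothetical, simplified sketch; returning 0 for an out-of-range index is an assumption of this example, not documented rpm behavior.

```c
/* Hypothetical, simplified file info set with a self-iterator. */
struct demo_info {
    int count;
    int i;                    /* self-iterator position */
    const unsigned *sizes;    /* per-file sizes */
};

/* Indexed accessor: any number of callers can use different indexes. */
static unsigned demo_info_size_index(const struct demo_info *fi, int ix)
{
    return (ix >= 0 && ix < fi->count) ? fi->sizes[ix] : 0;
}

/* Compat wrapper keeping the old self-iterator API: forward the current index. */
static unsigned demo_info_size(const struct demo_info *fi)
{
    return demo_info_size_index(fi, fi->i);
}
```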
- Previously this would return a pointer to an internal per-rpmfi buffer
whose contents get silently overwritten on each call to rpmfiFNIndex(),
making it unsafe for random access by more than one
active caller (such code does not currently exist in rpm though).
- Make rpmfiFNIndex() always return freshly allocated memory, and adjust
the rpmfiFN() iteration wrapper to free and realloc the internal
"buffer" on each call. It's a wee bit slower than before but it's
not called *that* much, and if needed there are ways to optimize it.
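A sketch of the new contract, under stated assumptions: the struct and function names are hypothetical, and paths are assumed to be stored as directory and basename pairs, with the directory including its trailing slash. The indexed function always returns memory the caller owns, while the iteration wrapper keeps the old "valid until the next call" behavior by freeing its previous result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified file info set. */
struct demo_fi {
    int count, i;              /* number of files, self-iterator position */
    const char **dirs, **bases;
    char *fn;                  /* owned "buffer" backing demo_fn() */
};

/* Always returns freshly allocated memory (caller frees), or NULL. */
static char *demo_fn_index(const struct demo_fi *fi, int ix)
{
    if (ix < 0 || ix >= fi->count)
        return NULL;
    size_t dl = strlen(fi->dirs[ix]), bl = strlen(fi->bases[ix]);
    char *fn = malloc(dl + bl + 1);
    if (fn == NULL)
        return NULL;
    memcpy(fn, fi->dirs[ix], dl);
    memcpy(fn + dl, fi->bases[ix], bl + 1);   /* includes the NUL */
    return fn;
}

/* Iteration wrapper: frees the previous result, then reallocates. */
static const char *demo_fn(struct demo_fi *fi)
{
    free(fi->fn);
    fi->fn = demo_fn_index(fi, fi->i);
    return fi->fn;
}
```

Each demo_fn() call invalidates the previous result, which is why random-access callers working at several indexes should use the indexed variant and free its results themselves.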
- Similar in spirit to the PSM blackbox treatment in
commit df9cdb1321, except that
technically fsm guts are still wide-open in fsm.h due to cpio
"needing" them (yuck).
- Allows getting rid of dumb a**-backwards things like rpmfiFSM()
which is just not needed, fsm is a relatively short-lived entity
inside psm and build, nobody else needs to bother with it except
for the returned results.
- Figure out the cpio map flags in fsmSetup() where it logically belongs,
we have all the necessary info available there.
- Get rid of newFSM() and freeFSM(), we can just as well place the
fsm on stack, merge the necessary cleanup bits from freeFSM()
into fsmTeardown()
- Supposedly no functional changes, knock wood.
- rpmfi itself doesn't need it for anything; it's only really used
for progress reporting during install. Grab the size into psm
total directly, this is already passed down to fsm.
- Removes one of the last remaining rpmfi opacity violations, just
fi->apath to go...
- Practically all the data in rpmfi needs to be treated as const; these
are just a funky special case that point to header memory for the
couple of cases where KEEPHEADER is still used.