- As a special case, two strings (ids) from the same pool can be tested for
equality in constant time (integer comparison). If the pools differ,
a regular string comparison is needed.
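A minimal sketch of that comparison rule follows. It assumes a hypothetical `strpool_t` whose ids index an offsets array into one shared, NUL-separated data area, and a hypothetical `pool_streq()`. These are illustrative names, not rpm's string pool API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal pool: ids index an offsets array that
 * points into one shared, NUL-separated data area. */
typedef uint32_t strid_t;

typedef struct {
    const char *data;      /* all strings, each NUL-terminated */
    const size_t *offs;    /* offs[id] = start of string id in data */
} strpool_t;

static const char *pool_str(const strpool_t *pool, strid_t id)
{
    return pool->data + pool->offs[id];
}

/* Same pool: each distinct string is stored once, so equal strings
 * have equal ids and an integer comparison suffices.
 * Different pools: ids are unrelated, so fall back to strcmp(). */
static int pool_streq(const strpool_t *pa, strid_t a,
                      const strpool_t *pb, strid_t b)
{
    if (pa == pb)
        return a == b;
    return strcmp(pool_str(pa, a), pool_str(pb, b)) == 0;
}
```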
- realloc() might not need to actually move the data, and when it
doesn't, we don't need to do the very expensive rehash either.
Unsurprisingly, this makes things a whole lot faster.
- Pool id -> string always works with a frozen pool, but in some cases
we'll need to go the other way, so allow the caller to specify whether
string -> id lookups should be possible on a frozen pool.
- On glibc, realloc() to smaller size doesn't move the data but on
other platforms (including valgrind) it can and does move, which
would require a full rehash. For now, just leave all the data
alone unless we're also freeing the hash; the memory savings
aren't much for a global pool (which is where this matters).
- String pool offset resize was off by one, oops
- String pool data-area resize requires rehashing all the strings,
as the key pointers change. Ouch. Should be avoidable by extending
rpmhash to allow passing the pool itself around in comparisons as "self"
and using offsets as keys, but for now, working correctly matters more than speed.
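The offsets-as-keys idea can be sketched as below. This is a toy, assuming a hypothetical `pool_t` with a fixed-size open-addressing table of ids (not rpmhash). Because each bucket stores an id and comparisons resolve it through the pool itself, a realloc() that moves `pool->data` leaves the table valid, and no rehash is needed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64          /* fixed for brevity; a real pool would grow */

typedef struct {
    char *data;              /* NUL-separated strings, may move on realloc */
    size_t dlen, dalloc;
    size_t offs[NBUCKETS];   /* offs[id - 1] = offset of string id */
    uint32_t nids;
    uint32_t buckets[NBUCKETS]; /* 0 = empty, otherwise an id */
} pool_t;

static size_t hash_str(const char *s)
{
    size_t h = 5381;         /* djb2 */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Keys are ids, resolved through the pool ("self") at comparison time,
 * so moving pool->data never invalidates the table. */
static uint32_t pool_find(const pool_t *pool, const char *s, size_t *slot)
{
    size_t i = hash_str(s) % NBUCKETS;
    while (pool->buckets[i]) {
        uint32_t id = pool->buckets[i];
        if (strcmp(pool->data + pool->offs[id - 1], s) == 0)
            return id;
        i = (i + 1) % NBUCKETS;   /* linear probing */
    }
    *slot = i;
    return 0;
}

/* Look up s, storing it first if absent. Returns its id, or 0 on failure. */
static uint32_t pool_id(pool_t *pool, const char *s)
{
    size_t slot;
    uint32_t id = pool_find(pool, s, &slot);
    if (id)
        return id;
    if (pool->nids + 1 >= NBUCKETS)  /* keep at least one empty bucket */
        return 0;
    size_t len = strlen(s) + 1;
    if (pool->dlen + len > pool->dalloc) {
        size_t n = (pool->dlen + len) * 2;
        char *d = realloc(pool->data, n); /* may move; no rehash needed */
        if (d == NULL)
            return 0;
        pool->data = d;
        pool->dalloc = n;
    }
    memcpy(pool->data + pool->dlen, s, len);
    pool->offs[pool->nids] = pool->dlen;
    pool->dlen += len;
    id = ++pool->nids;
    pool->buckets[slot] = id;
    return id;
}
```

The cost of this design is an extra indirection on every comparison. In exchange, growing the data area becomes a plain realloc().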
- The unfreeze-sizehint calculation could be negative. Turn the initial
size into a constant and use that as a minimum; otherwise rehashing
uses (more or less arbitrary) heuristics to come up with some number.
Lots of fine-tuning ahead...
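Here is a sketch of the clamping described above. The formula (expected size minus entries in use) and the names `STRPOOL_MIN_SIZE` and `unfreeze_sizehint()` are hypothetical and do not come from rpm. With unsigned `size_t` arithmetic, a "negative" result silently wraps to a huge value. The subtraction is therefore checked first, and the constant initial size serves as the floor.

```c
#include <stddef.h>

#define STRPOOL_MIN_SIZE 1024  /* hypothetical constant initial/minimum size */

/* Hypothetical size hint for a pool being unfrozen. expected - used
 * would wrap around if used > expected, so test before subtracting
 * and never return less than the minimum. */
static size_t unfreeze_sizehint(size_t expected, size_t used)
{
    if (expected <= used || expected - used < STRPOOL_MIN_SIZE)
        return STRPOOL_MIN_SIZE;
    return expected - used;
}
```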
- The pool stores "arbitrary" number of strings in a space-efficient
manner, with near constant (hashed) string -> id lookup/store and
constant time id -> string and id -> string length lookups.
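One way to get constant-time id -> string and id -> length lookups is sketched below. It assumes a hypothetical layout in which strings are packed back to back, each followed by a NUL, and the offsets array carries one extra entry marking the end of the data. A string's length is then the distance to the next string's start, minus the terminator.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t strid_t;

/* Hypothetical frozen pool layout: offs[] has nids + 1 entries,
 * the last one being the end of the data area. */
typedef struct {
    const char *data;
    const size_t *offs;
    strid_t nids;
} strpool_t;

/* O(1): one array lookup plus pointer arithmetic. */
static const char *pool_str(const strpool_t *p, strid_t id)
{
    return id < p->nids ? p->data + p->offs[id] : NULL;
}

/* O(1): distance to the next string's start, minus its NUL. */
static size_t pool_strlen(const strpool_t *p, strid_t id)
{
    return id < p->nids ? p->offs[id + 1] - p->offs[id] - 1 : 0;
}
```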
- Credits for the idea go to the Suse developers working on libsolv;
the basic concept is directly lifted from there, but details
differ due to using rpm's own hash table implementation etc.
Another minor difference is using size_t for offsets to permit over
4GB of total data on 64-bit systems; the total number of ids in
the pool is, however, limited to uint32 max (as in libsolv).
- Any (re)implementation bugs are by yours truly; this is almost certainly
going to need further tuning and tweaking, API and otherwise.
- Check for lowercase letters before uppercase. A very minor difference
as such, but our file digests use lowercase hex and this gets
called a lot from rpmfiNew().
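A sketch of that ordering in a hex decoder follows. The functions `hex_nibble()` and `hex_decode()` are hypothetical stand-ins, not rpm's actual helpers. Digits and lowercase letters are tried before uppercase, so decoding lowercase digest strings never reaches the last branch.

```c
#include <stddef.h>

/* Hypothetical hex digit decoder: lowercase is checked before
 * uppercase because digests are stored as lowercase hex.
 * Returns -1 for a non-hex character. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode 2*n hex characters into n bytes; returns 0 on success.
 * Each nibble is checked before reading the next character, so a
 * short string stops at its terminator instead of overrunning it. */
static int hex_decode(const char *hex, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int hi = hex_nibble(hex[2 * i]);
        if (hi < 0)
            return -1;
        int lo = hex_nibble(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```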
- This reverts commit 4c1f7e335de1724661ce63c53186d161ab71a63f:
various things inside and outside of rpm actually do still depend
on the old behavior, and leak file descriptors otherwise.
As an easy, backportable band-aid, revert to the previous
behavior, to which various callers are tuned, to fix the regression
introduced in rpm 4.10.0. The real fix would be something more like
"eliminate fdFree() and make Fclose() honor refcounts".
- Commit 4cb02aa928 asked to see
what breaks when mmap() is used; now we know: large package support
broke when enabling it. This could of course be fixed, e.g. by adding
a size cap to the fsm part as well, but it just doesn't seem worth it:
I fail to measure any meaningful performance improvement from mmap
usage in either case, and added complexity for close to
zero benefit just doesn't make sense. Various sources in fact
note rpm's usage pattern (reading through the entire file sequentially) as one
of the cases where mmap() is NOT beneficial, due to mmap()'s high
setup and teardown cost and page fault speed (or lack thereof).
- Lump glob.h and glob.c into rpmglob.c in all their g(l)ory libc
decorations and make everything static to stop overriding system
library symbols with our own glob().
- Teach %prep and %uncompress how to handle 7zip tarballs. With
the mingw toolchain landing in Fedora, this may be useful when
cross-building Windows sources compressed using 7zip (CxImage is
one such project).
- This isn't strictly needed, as we're terminating the buffers "just in
case" all over the place, but handling this centrally might someday
allow eliminating the other fluff...
- Oops, remember to reserve space for the trailing \0 when appending.
mb->nb holds the number of actual characters left in the buffer and
does not account for the terminator. Fixes a regression introduced in the
rpm 4.9.x dynamic macro reallocation work (RhBug:431009 reprise).
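The following sketch shows the off-by-one. It assumes a hypothetical `strbuf_t` and is not rpm's macro buffer code. The capacity check includes `+ 1` so the trailing `'\0'` always fits. Without it, appending exactly the remaining capacity would write the terminator one byte past the allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. */
typedef struct {
    char *buf;
    size_t len;      /* characters stored, excluding the '\0' */
    size_t alloced;  /* bytes allocated for buf */
} strbuf_t;

/* Append n characters from s, always reserving room for '\0'. */
static int strbuf_append(strbuf_t *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->alloced) {
        size_t newsize = (sb->len + n + 1) * 2;
        char *p = realloc(sb->buf, newsize);
        if (p == NULL)
            return -1;
        sb->buf = p;
        sb->alloced = newsize;
    }
    memcpy(sb->buf + sb->len, s, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
    return 0;
}
```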
- Up to now, if the fd had remaining references fdFree() would return
the supposedly freed fd back to us, which is unlike anything else
in rpm. Make this consistent with the rest of rpm finally as the
last remaining caller requiring the old semantics is gone from
the codebase (somewhere between 4.9 and 4.10): always return NULL,
as the referenced instance is now gone as far as the caller is concerned.
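The convention can be sketched with a hypothetical refcounted `fd_t` (not rpm's FD_t internals). The free function drops one reference and returns NULL unconditionally, so the `x = xFree(x)` idiom always clears the caller's pointer, even when other references keep the object alive.

```c
#include <stdlib.h>

/* Hypothetical refcounted handle. */
typedef struct {
    int nrefs;
    int fdno;
} fd_t;

static fd_t *fd_link(fd_t *fd)
{
    if (fd)
        fd->nrefs++;
    return fd;
}

/* Drop one reference and always return NULL: the caller's reference
 * is gone whether or not the object itself survives. */
static fd_t *fd_free(fd_t *fd)
{
    if (fd && --fd->nrefs == 0)
        free(fd);
    return NULL;
}
```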
- Fix regression from commit 807b402d95:
the array gets passed as a pointer (how else would it work at all),
so despite having a seemingly correct type, sizeof(keyid) depends
on the pointer size. This happens to be 8 on x86_64 and friends
but breaks pgp fingerprint calculation on e.g. i386.
- Also return the explicit size from pgpExtractPubkeyFingerprint().
This has been "broken" for much longer, but then all that callers should
really care about is -1 for error.
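This C pitfall is easy to reproduce. The sketch below uses a hypothetical 8-byte `keyid_t` typedef in place of rpm's actual type. A parameter declared with array type is adjusted to a pointer, so `sizeof` on it yields the pointer size. Compilers typically warn about this (e.g. `-Wsizeof-array-argument`).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t keyid_t[8];   /* hypothetical 8-byte key id type */

/* Despite the array syntax, keyid is really a const uint8_t *, so
 * this returns 8 on x86_64 but 4 on i386. */
static size_t wrong_size(const keyid_t keyid)
{
    return sizeof(keyid);
}

/* Take the size from the type (or a constant) instead. */
static size_t right_size(const keyid_t keyid)
{
    (void)keyid;
    return sizeof(keyid_t);
}
```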
- This is stupid... only librpm and librpmio actually need the bump due
to ABI breakage; librpmbuild and librpmsign are unchanged and could
use just a revision bump. But just incrementing the revision (or age)
would set us on a collision course with maintenance updates to 4.9.x.
Then again, it's not like you can actually use librpmbuild or librpmsign
without also linking to librpm(io), so everything needs rebuilding
anyway. All this also pretty much makes the whole libtool library
versioning a bit moot. Bah.
- Commit 70f063cb77 accidentally
changed Lua's base64 encode/decode interface too, ugh. Dangers of
search-and-replace... Only the function name string exported to
Lua matters, but rename the internal functions back as well
for naming consistency.
- Files can be (much) larger than INT32_MAX, change the return
type to off_t and fix + simplify the calculations. Fixes the other
half of RhBug:790396 and makes ufdCopy() usable for other purposes too.
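A sketch of the idea follows, using a hypothetical `copy_stream()` over stdio streams rather than rpm's ufdCopy(). The running total is kept in an `off_t`, so counts past INT32_MAX are reported correctly wherever `off_t` is 64 bits (on 32-bit systems that typically requires large file support).

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical copy loop returning the number of bytes copied,
 * or -1 on a read or write error. */
static off_t copy_stream(FILE *in, FILE *out)
{
    char buf[BUFSIZ];
    off_t total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (off_t)n;
    }
    return ferror(in) ? -1 : total;
}
```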
- Base64 is present in headers and all, so it's only reasonable that
our API users have access to this functionality without having
to link to other libraries. Even if we didn't want to carry the
implementation in our codebase forever, we should provide a wrapper
for it (much like the other crypto stuff) for the reason stated above.
- A bigger issue is that our dirty little (badly hidden) secret was using
non-namespaced function names, clashing with at least beecrypt. And we
couldn't have made these internal-only symbols even on platforms that
support it, because they are used all over the place outside rpmio.
So... rename the b64 functions to rpmLikeNamingStyle and make 'em public.
No functional changes, just trivial renaming despite touching numerous
places.
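To show what such functionality involves, here is a standalone standard (RFC 4648) base64 encoder without line wrapping. The name `b64_encode()` is hypothetical, and the code does not reproduce rpm's renamed public functions.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal RFC 4648 base64 encoder, no line wrapping.
 * Returns a malloc'd NUL-terminated string, or NULL on allocation failure. */
static char *b64_encode(const void *data, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char *s = data;
    char *out = malloc((len + 2) / 3 * 4 + 1);
    char *o = out;

    if (out == NULL)
        return NULL;
    while (len >= 3) {            /* 3 input bytes -> 4 output chars */
        *o++ = tbl[s[0] >> 2];
        *o++ = tbl[((s[0] & 0x03) << 4) | (s[1] >> 4)];
        *o++ = tbl[((s[1] & 0x0f) << 2) | (s[2] >> 6)];
        *o++ = tbl[s[2] & 0x3f];
        s += 3;
        len -= 3;
    }
    if (len > 0) {                /* 1 or 2 leftover bytes, pad with '=' */
        *o++ = tbl[s[0] >> 2];
        if (len == 1) {
            *o++ = tbl[(s[0] & 0x03) << 4];
            *o++ = '=';
        } else {
            *o++ = tbl[((s[0] & 0x03) << 4) | (s[1] >> 4)];
            *o++ = tbl[(s[1] & 0x0f) << 2];
        }
        *o++ = '=';
    }
    *o = '\0';
    return out;
}
```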
- At least within rpm itself, callers aren't particularly interested
in the actual key that matches a given signature, they just want
simple good/bad/nokey answers. This makes life simple for them
and avoids exposing further rpmPubkey internals through APIs.
- Document the broken rpmKeyringLookup() behavior / side-effect,
the new helper uses the values from our stored pgp parameters though.
- Shouldn't make any difference functionality-wise, but we'll need
the helper function shortly.
- Yet more pre-requisites for separating key and signature management.
In addition this gains us more thorough initial sanity checking and
will allow reusing the parameters instead of having to parse
the same packets over and over again on every single verification
against this key. Unfortunately rpmKeyringLookup() is so braindead that
it prevents us from doing this right now; we'll need a better
interface to take advantage of the stored pgp key parameters.
- pgpPrtParams() returns a pointer to an allocated pgpDigParams
on success, eliminating the need for callers to worry about
freeing "target buffer" on failure and bypassing the now rather
useless pgpDig middleman. Also allows specifying the expected
packet type so if we expect a key we'll error out if we get a signature
instead.
- pgpPrtPkts() is basically just a wrapper to pgpPrtParams()
- Further pre-requisites for separating key and signature management.
- Yes, pgpPrtParams() is a stupid name for this. However all the saner
ones are already taken for other purposes (for which the names are
just as bad/misleading, sigh).
- This way we can parse the whole thing into private storage first
and return anything through the pgpDig only if it's actually successful.
Previously we would return partial garbage on failure
and/or on consecutive calls unless manually "cleaned", as we were
parsing directly into the pgpDig.
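The parse-then-commit pattern, together with the expected packet type check, can be sketched as follows. Everything here is hypothetical and heavily simplified: the "packet" is just a tag byte, a version byte and an 8-byte key id, whereas real OpenPGP packets (RFC 4880) are far more involved. Only a fully successful parse is copied to heap memory and handed to the caller.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* RFC 4880 packet tags for signature and public-key packets. */
enum { TAG_SIGNATURE = 2, TAG_PUBKEY = 6 };

/* Hypothetical, simplified parameters. */
typedef struct {
    uint8_t tag;
    uint8_t version;
    uint8_t keyid[8];
} params_t;

/* Parse into a local struct first; on any failure *ret stays NULL,
 * so callers never see partial results. expect == 0 accepts any type. */
static int parse_params(const uint8_t *pkt, size_t len, uint8_t expect,
                        params_t **ret)
{
    params_t tmp;

    *ret = NULL;
    if (len < 10)
        return -1;
    tmp.tag = pkt[0];
    tmp.version = pkt[1];
    if (expect && tmp.tag != expect)
        return -1;            /* e.g. got a signature, wanted a key */
    if (tmp.version != 4)
        return -1;
    memcpy(tmp.keyid, pkt + 2, sizeof(tmp.keyid));

    params_t *p = malloc(sizeof(*p));
    if (p == NULL)
        return -1;
    *p = tmp;
    *ret = p;
    return 0;
}
```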
- Dynamic allocation is a prerequisite for separating management of
keys and signatures: while they walk hand in hand much of the time,
they come from different sources, have different lifetimes, and
should be managed separately.
- Dynamic allocation of these is also a prerequisite for handling
more than one public key, i.e. mainly subkeys.
- Besides eliminating a couple of direct struct accesses,
pgpDigParamsCmp() does a much more thorough job of comparing
the parameters than we ever did here (i.e. less chance of returning
ok for a wrong key, although because the interface is as
braindead as it is, it doesn't make a whole lot of difference).
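To illustrate why comparing more than the key id helps, here is a sketch with a hypothetical `keyparams_t`. Which fields pgpDigParamsCmp() actually compares is not reproduced here. A key-id-only check accepts parameters whose algorithm differs, while a field-by-field comparison rejects them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key parameters; real pgp parameters carry more fields. */
typedef struct {
    uint8_t pubkey_algo;   /* e.g. 1 = RSA, 17 = DSA in OpenPGP */
    uint8_t keyid[8];
} keyparams_t;

/* Key-id-only check: blind to every other field. */
static int keyid_matches(const keyparams_t *a, const keyparams_t *b)
{
    return memcmp(a->keyid, b->keyid, sizeof(a->keyid)) == 0;
}

/* Field-by-field comparison: returns 0 only if all compared fields agree. */
static int keyparams_cmp(const keyparams_t *a, const keyparams_t *b)
{
    if (a->pubkey_algo != b->pubkey_algo)
        return 1;
    return keyid_matches(a, b) ? 0 : 1;
}
```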