- As pointed out by Michael Schroeder in
http://lists.rpm.org/pipermail/rpm-maint/2013-September/003605.html,
the dummy entries used for optimizing rpmstrPoolStrlen() are
problematic in a number of ways:
- Walking the id's in a pool is unreliable, and rehashing can cause
bogus empty strings to be added to a pool where they otherwise
do not exist
- rpmstrPoolNumStr() is not accurate when more than one chunk is in use
- Unfortunately this means giving up the rpmstrPoolStrlen() optimization,
for now at least.
- Just a simple s/sidpool/pool/ to match the implementation; "sidpool"
is a leftover from an early draft version that somehow made its way
into the master tree.
- As a special case, two strings (ids) from the same pool can be tested for
equality in constant time (integer comparison). If the pools differ,
a regular string comparison is needed.
- Pool id -> string always works with a frozen pool, but in some cases
we'll need to go the other way. Allow the caller to specify whether
string -> id lookups should be possible on a frozen pool.
- On glibc, realloc() to a smaller size doesn't move the data, but on
other platforms (including valgrind) it can and does move, which
would require a full rehash. For now, just leave all the data
alone unless we're also freeing the hash; the memory savings
aren't much for a global pool (which is where this matters).
- The pool stores an "arbitrary" number of strings in a space-efficient
manner, with near constant (hashed) string -> id lookup/store and
constant time id -> string and id -> string length lookups.
- Credits for the idea go to the SUSE developers working on libsolv.
The basic concept is directly lifted from there, but details
differ due to using rpm's own hash table implementation etc.
Another minor difference is using size_t for offsets to permit over
4GB total data size on 64-bit systems. The total number of ids in
the pool is, however, limited to uint32 max (like in libsolv).
- Any (re)implementation bugs are by yours truly. This is almost
certainly going to need further tuning and tweaking, API and otherwise.
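
The pool behaviour described in the entries above (id <-> string
mapping, same-pool equality by integer comparison, optional string -> id
lookups on a frozen pool) can be illustrated with a minimal sketch. This
is not rpm's implementation or API: every name here (toy_pool, toy_sid,
toy_pool_id() and friends) is hypothetical, and a linear search stands in
for the hash table, so string -> id is O(n) here rather than near
constant. It does keep size_t offsets and uint32 ids as described.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t toy_sid;       /* hypothetical id type; 0 means "no id" */

typedef struct {
    char *data;                 /* strings stored back to back, NUL-terminated */
    size_t used, alloced;       /* bytes used / allocated in data */
    size_t *offs;               /* offs[id] = offset of string id in data */
    toy_sid nids, offs_alloced; /* next free id / allocated slots in offs */
    int frozen, keep_lookup;    /* frozen state, and whether lookups survive it */
} toy_pool;

toy_pool *toy_pool_new(void)
{
    toy_pool *p = calloc(1, sizeof(*p));
    if (p != NULL)
        p->nids = 1;            /* id 0 is reserved */
    return p;
}

void toy_pool_free(toy_pool *p)
{
    if (p != NULL) {
        free(p->data);
        free(p->offs);
        free(p);
    }
}

/* Freeze the pool; keep_lookup decides if string -> id still works. */
void toy_pool_freeze(toy_pool *p, int keep_lookup)
{
    p->frozen = 1;
    p->keep_lookup = keep_lookup;
}

/* id -> string: constant time, works on frozen pools too. */
const char *toy_pool_str(const toy_pool *p, toy_sid id)
{
    return (id > 0 && id < p->nids) ? p->data + p->offs[id] : NULL;
}

/* string -> id, optionally storing the string if it is not present. */
toy_sid toy_pool_id(toy_pool *p, const char *s, int create)
{
    toy_sid id;
    size_t len = strlen(s) + 1;

    if (p->frozen && !p->keep_lookup)
        return 0;               /* lookups were given up at freeze time */
    for (id = 1; id < p->nids; id++)    /* linear scan instead of a hash */
        if (strcmp(p->data + p->offs[id], s) == 0)
            return id;
    if (!create || p->frozen || p->nids == UINT32_MAX)
        return 0;

    if (p->used + len > p->alloced) {   /* grow string storage */
        size_t n = (p->used + len) * 2;
        char *d = realloc(p->data, n);
        if (d == NULL)
            return 0;
        p->data = d;
        p->alloced = n;
    }
    if (p->nids >= p->offs_alloced) {   /* grow offset table */
        toy_sid n = p->offs_alloced ? p->offs_alloced * 2 : 8;
        size_t *o = realloc(p->offs, n * sizeof(*o));
        if (o == NULL)
            return 0;
        p->offs = o;
        p->offs_alloced = n;
    }
    memcpy(p->data + p->used, s, len);
    p->offs[p->nids] = p->used;
    p->used += len;
    return p->nids++;
}

/* Same pool: integer comparison. Different pools: string comparison. */
int toy_pool_streq(const toy_pool *pa, toy_sid a,
                   const toy_pool *pb, toy_sid b)
{
    const char *sa, *sb;

    if (pa == pb)
        return a == b;
    sa = toy_pool_str(pa, a);
    sb = toy_pool_str(pb, b);
    return sa != NULL && sb != NULL && strcmp(sa, sb) == 0;
}
```

Because ids index an offset table rather than holding pointers, this
sketch is unaffected when realloc() moves the string data; that is a
property of the sketch only, and says nothing about how rpm's hash
refers to its strings.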