folkslinux/rpm - rpm - Trustie: Git with trustie

Commit Graph

Author	SHA1	Message	Date
Panu Matilainen	a162cf10df	Take advantage of C++ native mutex facilities for string pool Shared and exclusive locks are of different types in STL so we can't easily return one or the other from poolLock() as per the write argument. Just convert all poolLock() call sites to name their lock type locally.	2024-09-24 10:08:38 +02:00
Panu Matilainen	62840a3cdf	Add casts that C++ requires but C doesn't across librpmio In other words, a whole lot of "yes, really".	2024-04-09 11:00:00 +03:00
Panu Matilainen	747b7119ae	Fix possible read beyond buffer in rstrnlenhash() On strings that are not \0-terminated (which are a big reason for the existence of this function), the while-loop would try to compare the first character beyond the specified buffer for '\0' before realizing we're already beyond the end when checking n. Should be mostly harmless in practice as the check for n would still terminate it, but not right. In particular this trips up address sanitizer with the bdb backend where some of the returned strings are not \0-terminated. Test for string length first, and move the decrementing side-effect into the loop for better readability.	2020-09-04 13:05:38 +03:00
Panu Matilainen	7ffc4d17ff	Implement thread protection locking on the string pool The shared string pool is in a very central role in several operations, it's kinda embarrassing that we haven't had any thread protection on it. Not that anybody has asked either, prior to coming up as part of #226 (to enable threaded package creation). Test-suite and couple of smoke-tests with the #226 pass, but only lightly tested. Then again, it's relatively straightforward. As a general rule, locks are taken on all exported interfaces on entry and released on exit, internal callers never lock anything. In rpm usage at least, performance hit seems negligible.	2018-09-28 15:32:55 +03:00
Panu Matilainen	613842841f	Un-oneline rpmstrPoolNumStr() for next step	2018-09-28 14:24:33 +03:00
Panu Matilainen	8e90ea931c	Use helper variables for pool streq comparison No functional changes, but will be necessary for things to come.	2018-09-28 13:49:36 +03:00
Panu Matilainen	0a6cfc17a1	Add + use an internal helper for id -> string retrieval Ensure non-NULL pool on the outer call, internal callers don't need that. No functional changes, just refactoring for things to come	2018-09-28 13:47:58 +03:00
Panu Matilainen	4d67755941	Check for NULL pool on the outer callers No functional change, just minor refactor for things to come	2018-09-28 13:02:39 +03:00
Panu Matilainen	f2d5e7ecd7	Clarify a couple of comments	2013-12-02 13:13:00 +02:00
Panu Matilainen	25406a133f	Track chunk usage in the pool struct directly - This simplifies things a bit as we don't need to worry about the id storage and the starting location of the next string in advance. - Also make it clearer the string is copied into the current chunk, to which pool->offs only points. Make pool->offs const to enforce the strings are never written through it.	2013-12-02 12:47:23 +02:00
Panu Matilainen	938b86b8bd	Clarify pool chunk allocation - Assign newly alloc'ed chunks to pool->chunks, pool->offs just contains pointers into the chunks. This doesn't change actual behavior at all, just (IMO) clarifies the code a bit.	2013-12-02 12:29:21 +02:00
Panu Matilainen	c24930219a	Fix a harmless off-by-one in rpmstrPoolPut() - ssize already has the trailing \0 accounted for	2013-12-02 10:54:18 +02:00
Panu Matilainen	cfe99e08ad	Drop the end-of-chunk dummy entries from string pool - As pointed out by Michael Schroeder in http://lists.rpm.org/pipermail/rpm-maint/2013-September/003605.html, the dummy entries used for optimizing rpmstrPoolStrlen() are problematic in a number of ways: - Walking the id's in a pool is unreliable, and rehashing can cause bogus empty strings to be added to a pool where they otherwise do not exist - rpmstrPoolNumStr() is not accurate when more than one chunk is in use - Unfortunately this means giving up the rpmstrPoolStrlen() optimization, for now at least.	2013-12-02 10:45:33 +02:00
Michael Schroeder	41a01d2563	Fix off-by-one in rpmstrPoolRehash() - pool->offs_size is the last used id, thus it should be "<=" instead of "<" Signed-off-by: Panu Matilainen <pmatilai@redhat.com>	2013-11-29 10:42:36 +02:00
Ville Skyttä	8002b3f985	Spelling fixes. Signed-off-by: Panu Matilainen <pmatilai@redhat.com>	2013-02-19 21:35:40 +02:00
Panu Matilainen	e3ed69591f	Missing include in string pool - When compiled without selinux support, stdlib.h doesn't get included here. Wtf?	2012-10-11 15:14:48 +03:00
Florian Festi	bdb966b4df	Make string pool strings static in memory - Use multiple chunks that get allocated as old ones get filled up instead of reallocating, store direct pointers to the strings in the id array. - This prevents nasty surprises when previously retrieved pointer to a pool string goes invalid underneath us due to somebody adding a new string, and restores former rpm API behavior: string pointers retrieved from eg rpmds and rpmfi are valid for the entire lifetime of those objects.	2012-09-28 10:37:05 +03:00
Panu Matilainen	1bbf25b78f	Add function to get number of unique strings in the pool	2012-09-26 08:34:40 +03:00
Florian Festi	971a2887f8	Change poolHash to use internal collision resolution	2012-09-19 13:31:13 +02:00
Panu Matilainen	3619df6ebb	Aargh, stupid thinko in rpmstrPoolStrlen() last id special case - At the largest id, the end boundary is data, not offset size... doh	2012-09-19 10:49:16 +03:00
Panu Matilainen	4c75ab28b8	Make pool string->id operations properly length-aware - Allow looking up and inserting partial key strings, this is useful in various cases where previously a local copy was needed for \0-terminating the key in the caller. - Take advantage of rstrlenhash() in rpmstrPoolId(), previously the length was only interesting when adding so we wasted a strlen() on every call when the string was already in the pool.	2012-09-18 06:11:37 +03:00
Panu Matilainen	0927ab855e	Add length aware variant(s) of string hashing - Being able to hash partial strings is needed for allowing string pool to operate on partial strings...	2012-09-18 04:47:01 +03:00
Panu Matilainen	76a699701c	Enhanced string hash to permit calculating string length on the same call - String hashing needs to walk the entire string anyhow, might as well take advantage of this and have it return the string length to avoid having to separately call strlen() in the cases where this matters. - Move the implementation into rpmstrpool.c for inlining possibilities, rstrhash() is now just a wrapper to rstrlenhash(). The generic hash implementation could not take advantage of this anyway really.	2012-09-18 04:40:20 +03:00
Panu Matilainen	bef4be688d	Don't assume \0 terminated strings in rpmstrPoolPut() - Before this, the slen argument was only good for avoiding an extra strlen(), but being able to store and look up partial strings without a local copy+modify in callers is handy; this is one of the prerequisites for that.	2012-09-18 04:15:56 +03:00
Panu Matilainen	1abd80f9c2	Use pool id's for hash table key, lookup strings from pool as needed - The pool itself can address its contents by id alone, storing pointers to the strings only hurts as reallocation moving the data blob requires rehashing the whole thing needlessly. - We now store just the key id in the hash buckets, and lookup the actual string for comparison from the pool. This avoids the need to rehash on realloc and saves memory too, and this is one of the biggest reasons for wanting a separate hash implementation for the string pool. Incidentally, this is how libsolv does it too. - Individual bucket allocation becomes rather wasteful now: a bucket stores a single integer, and a single pointer to the next bucket, a pointer which can be twice the size of the key data it holds. Further tuning and cleaning up after the marriage of these two datatypes is left for after the honeymoon is over...	2012-09-17 15:52:59 +03:00
Panu Matilainen	7cb0a71a11	Move the string pool struct definition earlier so we can reference it...	2012-09-17 15:32:57 +03:00
Panu Matilainen	77392704f3	Inline poolHashfindEntry() into GetEntry(), nothing else needs it	2012-09-17 15:18:01 +03:00
Panu Matilainen	533106ccfe	Eliminate key comparison and hash function vectors from poolHash - As the pool is hardwired to a single hash type, these don't make any sense here and the extra indirection will only hurt performance.	2012-09-17 15:14:08 +03:00
Panu Matilainen	38fe7e3b47	More poolHash multiple data-value cleanups - The only data associated with a pool key is a single id, we don't need an array for that - Change poolHash get-entry to return the id directly instead of a pointer array	2012-09-17 14:48:21 +03:00
Panu Matilainen	46b664b11b	Eliminate redundant data counting from poolHash - There's a strict 1:1 relation between keys and data in the string pool, so keeping count of data is pointless.	2012-09-17 14:43:43 +03:00
Panu Matilainen	95794632be	Eliminate unnecessary key and data free-functionality from poolHash	2012-09-17 14:30:55 +03:00
Panu Matilainen	d9d9fecaef	Pull a private hash-implementation copy to string pool - The string pool is too specialized a data structure to be efficiently handled with the generic hash table implementation in rpmhash.[CH] and really requires quite a different approach. - For starters, import a private copy generated roughly with: gcc -E -DHASHTYPE=poolHash \ -DHTKEYTYPE="const char *" -DHTDATATYPE=rpmsid rpmhash.C ...and clean it up a bit: eliminate unused functions (except for stats which we'll want to keep for debug purposes), make remaining functions static and overall tidy up from the mess 'gcc -E' created. Lots of redundant fluff here still, to be cleaned up gradually... - This doesn't change anything at all, but opens up the playground for tuning the pool hash implementation in ways the generic version could not (at least sanely) be.	2012-09-17 14:27:01 +03:00
Panu Matilainen	72d0735b90	Rename string pool hash type to poolHash - No changes other than a rename for next steps...	2012-09-17 13:33:42 +03:00
Panu Matilainen	241fc3c143	Lift string pool rehash into a separate helper function - This way we have exactly one place for controlling hash (re)creation size strategies etc.	2012-09-15 13:01:53 +03:00
Panu Matilainen	95329e10be	Use a saner pool hash resize hint - The previous size hint would actually cause us to shrink the hash bucket allocation, requiring the hash to resize itself immediately afterwards. As if the rehashes weren't expensive enough already...	2012-09-15 12:49:15 +03:00
Panu Matilainen	1e2c2fece2	Add a string equality check function to string pool API - As a special case, two strings (ids) from the same pool can be tested for equality in constant time (integer comparison). If the pools differ, a regular string comparison is needed.	2012-09-13 09:01:30 +03:00
Panu Matilainen	2ea2a0961f	Only rehash the pool on insert if the data area actually moved - realloc() might not need to actually move the data, and when it doesn't we don't need to do the very expensive rehash either. Unsurprisingly makes things a whole lot faster.	2012-09-12 19:29:28 +03:00
Panu Matilainen	0654685493	Allow keeping hash table around on pool freeze, adjust callers - Pool id -> string always works with a frozen pool, but in some cases we'll need to go the other way, allow caller to specify whether string -> id lookups should be possible on frozen pool. - On glibc, realloc() to smaller size doesn't move the data but on other platforms (including valgrind) it can and does move, which would require a full rehash. For now, just leave all the data alone unless we're also freeing the hash, the memory savings isn't much for a global pool (which is where this matters)	2012-09-12 19:17:20 +03:00
Panu Matilainen	3226c2073a	String pool id 0 equals NULL - Pool id 0 is special case for "not found". Return an actual NULL instead of an empty string.	2012-09-12 13:33:22 +03:00
Panu Matilainen	bed3880ef1	Avoid doing anything if pool is already frozen	2012-09-12 13:30:50 +03:00
Panu Matilainen	51f1cff50d	Fix segfault on rpmstrPoolId() on frozen pool - String -> id lookups need the hash table in place even if we're not adding. We could do a linear search in such a case but...	2012-09-11 10:22:18 +03:00
Panu Matilainen	00deac224c	Make rpmstrPoolUnfreeze() safe to call on unfrozen pool	2012-09-11 09:01:49 +03:00
Panu Matilainen	09373ec03a	And now, on to the embarrassing string-pool reimplementation bugs, take I - String pool offset resize was off by one, oops - String pool data-area resize requires rehashing all the strings, as the key pointers change. Ouch. Should be avoidable by extending rpmhash to allow passing the pool itself around in comparisons as "self" and using offsets as keys, but for now working counts more than speed. - The unfreeze-sizehint calculation could be negative. Turn the initial size into constant and use that as a minimum, otherwise rehashing uses (more or less arbitrary) heuristics to come up with some number. Lots of fine-tuning ahead...	2012-09-09 13:04:55 +03:00
Panu Matilainen	9e47043b2d	First cut of a libsolv-style string <-> id pool API - The pool stores an "arbitrary" number of strings in a space-efficient manner, with near constant (hashed) string -> id lookup/store and constant time id -> string and id -> string length lookups. - Credits for the idea go to the Suse developers working on libsolv, the basic concept is directly lifted from there but details differ due to using rpm's own hash table implementation etc. Another minor difference is using size_t for offsets to permit over 4GB total data size on 64bit systems, the total number of id's in the pool is limited to uint32 max however (like in libsolv). - Any (re)implementation bugs by yours truly, this is almost certainly going to need further tuning and tweaking, API and otherwise.	2012-09-07 13:34:27 +03:00

44 Commits
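
For readers unfamiliar with the data structure this history revolves around, the following is a minimal sketch of a string <-> id pool as the commit messages describe it: near-constant string -> id lookup, constant-time id -> string retrieval, id 0 meaning "not found", and constant-time equality for two ids from the same pool. It is written in Python for brevity and is not rpm's C implementation; the class and function names (`StringPool`, `put`, `lookup`, `string`, `num_strings`, `pool_streq`) are hypothetical and do not correspond to the rpmstrPool API.

```python
from typing import Dict, List, Optional


class StringPool:
    """Toy string <-> id pool. Ids start at 1; id 0 means "not found"."""

    def __init__(self) -> None:
        self._strings: List[Optional[str]] = [None]  # slot 0 is reserved
        self._ids: Dict[str, int] = {}               # string -> id index

    def put(self, s: str) -> int:
        """Return the id of s, adding s to the pool if it is new."""
        sid = self._ids.get(s, 0)
        if sid == 0:
            sid = len(self._strings)
            self._strings.append(s)
            self._ids[s] = sid
        return sid

    def lookup(self, s: str) -> int:
        """Return the id of s without adding it; 0 if s is not pooled."""
        return self._ids.get(s, 0)

    def string(self, sid: int) -> Optional[str]:
        """Return the string for sid, or None for id 0 and unknown ids."""
        if 0 < sid < len(self._strings):
            return self._strings[sid]
        return None

    def num_strings(self) -> int:
        """Number of unique strings stored (the reserved slot is excluded)."""
        return len(self._strings) - 1


def pool_streq(pool_a: StringPool, id_a: int,
               pool_b: StringPool, id_b: int) -> bool:
    """Compare two pooled strings for equality.

    Within one pool every string has exactly one id, so comparing the
    integers is enough; across pools the strings themselves are compared.
    """
    if pool_a is pool_b:
        return id_a == id_b
    return pool_a.string(id_a) == pool_b.string(id_b)
```

Putting the same string twice returns the same id, which is what makes the same-pool equality check an integer comparison. The chunked storage, freezing, and locking discussed in the commits above are deliberately left out of this sketch.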