We already filter out -EOPNOTSUPP and return OK, but the message was
getting logged before the filtering so we'd spit out spurious error
messages on filesystems that don't support SELinux (RhBug:1777502)
Spinning new cursors for each key value is just mind-bogglingly stupid
(if simple), use a sub-cursor with key binding instead. Which isn't any
more complicated, really.
Caching statements is not stupid but when it makes us go slower...
Also without the cache we have a better handle on locks and all,
revisit the caching issue later.
Rpm can reuse an iterator cursor for a write when rewriting an existing
header (such as when updating file states of already installed package),
which in sqlite terms turns a SELECT into INSERT, destroying the
iteration in progress.
Detect the condition and use a local cursor as necessary, this also means
we don't need to track the query fmt string in cursors.
In normal transactions this is but a drop in the ocean. However on
rpm -Va, the hashes get rebuilt from scratch on every single package
which on my moderate rpmdb (~2800 packages) testcase results in
103418347 data values fetched and added to dbi sets that are only
thrown away.
With bdb and lmdb this is only a minor optimization but for ndb and sqlite
which can retrieve keys independently, this is a much bigger win. In case
of sqlite, it's a massive one.
The regular index iterator grabs the associated data too, which we
don't always need. The data associated with indexes is relatively
lightweight, but as with everything, it adds up if in the millions scale.
Update all backends to allow for NULL set in the index retrieve to
signal key-only retrieval. Ndb actually had an earlier, abandoned
implementation of the same idea under slightly different API, lets
reuse the code-block.
If we're using libgcrypt for hashing we need to initialise libgcrypt as
otherwise it is not thread-safe. Without this it will crash when used
in parallel packaging runs.
Fixes#968
as it is not POSIX compliant and not supported on BSD and others
Dropping the use of grep here and using -regex of find instead to get rid of
the need for looking at lines terminated by newline.
Using only basic regular expressions for portability.
Resolves: #948
Postpone sql index creation until closing time on database rebuilds,
this is more than 50% improvement on rebuilddb time on both SSD and
traditional disks.
Store the file type strings in the classifier, and generate the dictionary
and its ids serially after the parallel section completes to ensure
stable order. Besides making the classifying really run in parallel
again, this also moves the pool- and file-counting related constraints
out of the parallel section for theoretically better parallelization.
Fixes#934
The order directive might be useful in some cases, but for our purposes
it very effectively serializes the whole classification operation.
Which means that we get the speed of serial classification with the
complexity of parallel execution, ugh. Revert, we need a better fix.
This reverts commit 3691d99c8b.
As of sqlite 3.22.0, a database in WAL mode can be opened readonly
if one or more of the following is true:
1) The -shm and -wal files already exists and are readable
2) There is write permission on the directory containing the database
so that the -shm and -wal files can be created.
3) The database connection is opened using the immutable query parameter.
Regular users running queries cannot have permission to create, and
immutable databases can only exist on readonly media (because all locking
is disabled) so there's no choice but to leave the -shm and -wal files
around at all times, a little ugly as it might be.
Handle read-only database separately for table and index initialization
as they are different: if table cannot be created we're screwed and
should return an error, but if an index is missing it'll still work.
Also only return an allocated dbi if everything went okay.
libgcrypt from 1.8.5 provides a pkg-config file as well as the traditional
libgcrypt-config script. As pkg-config is more resiliant in the face of
complicated build environments (for example cross-compilation and sysroots)
prefer the pkg-config file, falling back to libgcrypt-config if that doesn't
exist.
While flushing IO constantly is horrible for performance on spinning
disks, it's beneficial on SSDs. So if %_flush_io is enabled, let sqlite
do its regular WAL auto-checkpointing. Pure DB install time on SSD
improves by some 25% here with %_flush_io, at cost of removal speed but
for overall sanity and smaller WAL file.
This is an enormous performance boost to almost all operation modes,
in particular spinning disks - on my old laptop, time to erase ~3300
packages comes down from half an hour to under three minutes.
By default sqlite likes to keep the WAL file relatively slow to keep
reader performance up, but as writes are rare in rpm, we let the WAL
grow unlimited during transactions and only flush it at database
close. This way transactions are fast and at the cost of read
performance during transactions, which is perfectly okay. To do this
we need to move statement cache free earlier to drop any locks that
might be lurking there.
Special handling of RPMDB_FLAG_REBUILD is dropped here as unlike other
modes, WAL is persistent and has interactions with EXCLUSIVE locking_mode
among other things that seem non-trivial to sort out. This has some
performance cost to --rebuilddb but it's a rare operation and normal
transaction performance is far more important. We can always tweak
it more later on...
Only create key indexes for strings, those are the ones that are
performance critical and anything else is just unnecessary weight.
Additionally create indexes on hnum for better performance especially
on erasure, but only bother for array type data where the amounts
get large enough for indexes to matter.
As we now cache the statements, not reseting the statements kept most
of the database locked for the whole transactions, preventing concurrent
access.
Logic seems to suggest that we could then not bother reseting on fetch
from cache, but that causes failures so there's something fishy still.
sqlite_init() relies on db_dbenv pointing to NULL to properly initialize,
not sure if we can actually end up reusing the same handle but just in
case...
The order of file classification isn't interesting in itself, but arbitrary
order makes contents of RPMTAG_CLASSDICT non-deterministic which is not
nice for reproducable builds. Tell OMP to handle the class dictionary
in order.
Cancellation points are not allowed in ordered construct so we need to
drop that. It doesn't change the actual results, just means that we run
a little longer in case errors are encountered.
Fixes#934
We used to test against explicit digest values until commit
e20527ae07 changed the rpmkeys output
to drop the actual values and breaking the reproducability test - it
was now only testing whether the package we just built has intact
digests. Doh.
And because of that, commit fa303d5ba6 was
able to silently break setting buildtime from changelog (#932) and
why commit 4b15a9e48b didn't require
adjustment of the test-suite, and why addition of the alternative
payload digest in commit 83a26ae9e1 didn't
require changing this test. Maybe something else too. Doh.
Commit fa303d5ba6 moved buildhost and
buildtime calculation out of the package generation to early spec
initialization, but this broke reproducable builds: if buildtime is
to be set from changelog, changelog needs to be parsed first.
So either we need to do it twice or we need to do it right, and
besides avoiding duplication, conceptually these values are only
meaningful during a build and not a parse, so this restores that part
of the original code while keeping things thread-safe.
Fixes: #932
Commit e68eb68c4a introduced a regression
on Icon tag causing a crash on source rpm build, due to spec->numSources
being off by one if an icon was present.
A nicer fix would be eliminating numSources entirely but it's not as
easy as it should be due to dynamic buildrequires messing with it,
leaving that for another time.
Oops, all this time our most important build-dependency had been missing.
Add a version recommendation too - while rpm almost certainly works with
1.12 and 1.11 too, those are getting *really* long in the tooth, and 1.13
has an important type fix in poptGetOptArg() return value so might as
well use that as the base.
-P can appear multiple times so a string arg pointer is not the right
thing here in any case. There are other similar and related leaks all
over the codebase but this is especially insulting as the leaked pointer
was never used for anything at all.
Thanks for Peter Jones for pointing this out.
libsolv relies on header numbers not being recycled outside database
rebuild, ie the traditional behavior of BDB backend. And where there's
one, there's likely more users, we just don't know about them. Add
a test to ensure this behavior doesn't get accidentally changed.
Upstream sqlite defaults to secure_delete off, but its default value
is also a compile-time option which some distros (at least Fedora)
change to have it enabled by default. There's never any sensitive data
in the rpmdb, don't bother zeroing it out. Improves performance somewhat.
The initial implementation could recycle the topmost header numbers which
breaks libsolv caching as it assumes the traditional rpm behavior where
header numbers are not recycled.
Sqlite primary integer key with autoincrement behaves exactly the way rpm
database has traditionally allocated new "instance" numbers, and does not
seem to have any performance impact in this usage as there's always a
limited number of such increments anyway. Let it do the job instead.
Use sqlite3_table_column_metadata() to detect whether a table exists.
This will allow newly added rpm indexes to be automatically generated
and doesn't require running sql queries to figure out whether the
database was just created or not (and get it wrong because an empty
database would appear as newly created)
Previously, we assumed a backslash character would always be followed by
a character to be escaped, and advanced our "start" pointer by two
places before the next iteration. However, this assumption breaks if
the lonely backslash happens to be the last character in the query
string, in which case we would end up pointing beyond the \0 and let the
parser wander into the unknown, possibly crashing later.
This commit ensures we detect this corner case and error out gracefully
with a message.
POSIX.1-2008 marked ctime() as obsolete and recommends strftime() instead.
We get two flies on one stroke as ctime() is also classified "dangerous"
by LGTM due to not being thread-safe.
strftime(..., "%c"...) isn't exactly the same as ctime() but is consistent
with what we use elsewhere and sufficient for debug purposes anyway.
LGTM flags localtime() as a "dangerous" function, which seems a bit
over the top to me, but as we're flirting with threads, it certainly
is not thread-safe.