Commit Graph

14864 Commits

Author SHA1 Message Date
Gordon Messmer 3f58fc31f7 scripts/pythondistdeps: Only print rich dep list when required to. 2019-12-20 21:20:42 +01:00
Panu Matilainen 4a71a3eccd Silence spurious error message from lsetfilecon() on -EOPNOTSUPP
We already filter out -EOPNOTSUPP and return OK, but the message was
getting logged before the filtering so we'd spit out spurious error
messages on filesystems that don't support SELinux (RhBug:1777502)
2019-12-13 13:49:03 +01:00
Gordon Messmer 82971c6dea scripts/pythondistdeps: Only add setuptools requirement for egg-info packages. 2019-12-12 10:12:57 +01:00
Panu Matilainen 0094eaaf65 Avoid pulling unused data from the database in sqlite backend
We don't need to pull the key values from the db here, we already have it.
2019-12-11 11:28:02 +02:00
Panu Matilainen f964889b13 Don't bother sorting dbi set from the backend
The upper layers sort the dbi set as necessary, this is nothing but
waste of perfectly good computing cycles.
2019-12-11 11:28:02 +02:00
Panu Matilainen 10f0fe221e Use a sub-cursor for sqlite index key iteration
Spinning new cursors for each key value is just mind-bogglingly stupid
(if simple), use a sub-cursor with key binding instead. Which isn't any
more complicated, really.
2019-12-11 11:28:02 +02:00
Panu Matilainen d402c202ee Trash the stupid sqlite statement cache
Caching statements is not stupid but when it makes us go slower...
Also without the cache we have a better handle on locks and all,
revisit the caching issue later.
2019-12-11 11:28:02 +02:00
Panu Matilainen 25932e98bb Never reuse cursors for different purposes in sqlite
Rpm can reuse an iterator cursor for a write when rewriting an existing
header (such as when updating file states of already installed package),
which in sqlite terms turns a SELECT into INSERT, destroying the
iteration in progress.

Detect the condition and use a local cursor as necessary, this also means
we don't need to track the query fmt string in cursors.
2019-12-11 11:28:02 +02:00
Panu Matilainen e769796d6a Simplify sqlite cursor preparation logic
No functional changes, just makes the thing a little clearer.
2019-12-11 11:28:02 +02:00
Panu Matilainen 0f0fbead08 Use the new index key iterator for dependency check hashes
In normal transactions this is but a drop in the ocean. However on
rpm -Va, the hashes get rebuilt from scratch on every single package
which on my moderate rpmdb (~2800 packages) testcase results in
103418347 data values fetched and added to dbi sets that are only
thrown away.

With bdb and lmdb this is only a minor optimization but for ndb and sqlite
which can retrieve keys independently, this is a much bigger win. In case
of sqlite, it's a massive one.
2019-12-09 11:39:48 +02:00
Panu Matilainen 4eb7900d54 Implement a key-only rpmdb index iterator
The regular index iterator grabs the associated data too, which we
don't always need. The data associated with indexes is relatively
lightweight, but as with everything, it adds up if in the millions scale.

Update all backends to allow for NULL set in the index retrieve to
signal key-only retrieval. Ndb actually had an earlier, abandoned
implementation of the same idea under slightly different API, lets
reuse the code-block.
2019-12-09 11:39:48 +02:00
Ross Burton 61ea5a8ea6 rpmio: initialise libgcrypt
If we're using libgcrypt for hashing we need to initialise libgcrypt as
otherwise it is not thread-safe.  Without this it will crash when used
in parallel packaging runs.

Fixes #968
2019-12-05 12:53:29 +02:00
Florian Festi a9e01be677 find-debuginfo.sh: Dont use xargs -d for portability
Use -0 instead which is more widly available.

Related: #948
2019-12-05 12:52:17 +02:00
Florian Festi a9b9fcbd46 brp-strip-static-archive: Do not use xargs -d
as it is not POSIX compliant and not supported on BSD and others

Dropping the use of grep here and using -regex of find instead to get rid of
the need for looking at lines terminated by newline.

Using only basic regular expressions for portability.

Resolves: #948
2019-11-29 08:46:48 +02:00
Gordon Messmer 18a3a046d0 scripts/pythondistdeps: Handle version ending with ".*" 2019-11-29 08:36:44 +02:00
Gordon Messmer ed864b8842 scripts/pythondistdeps: Handle compatible-release operator. 2019-11-29 08:36:44 +02:00
Gordon Messmer 403f75cdbb scripts/pythondistdeps: Use rich deps for semantically versioned dependencies 2019-11-29 08:36:44 +02:00
Gordon Messmer 54b5004cda scripts/pythondistdeps: Match python version if minor has multiple digits. 2019-11-29 08:36:44 +02:00
Panu Matilainen 19e375880c Optimize sqlite database rebuild
Postpone sql index creation until closing time on database rebuilds,
this is more than 50% improvement on rebuilddb time on both SSD and
traditional disks.
2019-11-25 10:07:05 +02:00
Panu Matilainen 6b18e76f3d Fix resource leaks on zstd open error paths
If zstd stream initialization fails, the opened fd and the stream
itself are leaked. Handle error exit in a central label.
2019-11-22 09:39:00 +02:00
Panu Matilainen b55fef0903 Generate file class dictionary after file classification for stability
Store the file type strings in the classifier, and generate the dictionary
and its ids serially after the parallel section completes to ensure
stable order. Besides making the classifying really run in parallel
again, this also moves the pool- and file-counting related constraints
out of the parallel section for theoretically better parallelization.

Fixes #934
2019-11-22 09:38:40 +02:00
Panu Matilainen 3163019e60 Revert "Preserve processing order in file classification"
The order directive might be useful in some cases, but for our purposes
it very effectively serializes the whole classification operation.
Which means that we get the speed of serial classification with the
complexity of parallel execution, ugh. Revert, we need a better fix.

This reverts commit 3691d99c8b.
2019-11-22 09:38:40 +02:00
Panu Matilainen 9c8a3faf42 Fix sqlite database access for unprivileged readers in WAL mode
As of sqlite 3.22.0, a database in WAL mode can be opened readonly
if one or more of the following is true:

1) The -shm and -wal files already exists and are readable
2) There is write permission on the directory containing the database
   so that the -shm and -wal files can be created.
3) The database connection is opened using the immutable query parameter.

Regular users running queries cannot have permission to create, and
immutable databases can only exist on readonly media (because all locking
is disabled) so there's no choice but to leave the -shm and -wal files
around at all times, a little ugly as it might be.
2019-11-21 09:36:50 +02:00
Panu Matilainen f5e6a4815a Fix sqlite db open logic on errors, hopefully
Handle read-only database separately for table and index initialization
as they are different: if table cannot be created we're screwed and
should return an error, but if an index is missing it'll still work.

Also only return an allocated dbi if everything went okay.
2019-11-21 09:36:50 +02:00
Panu Matilainen 6e91d914c5 Ask sqlite if it's in readonly mode instead of educated guess 2019-11-21 09:36:50 +02:00
Ross Burton 22ed98efe3 configure.ac: prefer pkg-config to find libgcrypt
libgcrypt from 1.8.5 provides a pkg-config file as well as the traditional
libgcrypt-config script.  As pkg-config is more resiliant in the face of
complicated build environments (for example cross-compilation and sysroots)
prefer the pkg-config file, falling back to libgcrypt-config if that doesn't
exist.
2019-11-21 09:36:28 +02:00
Panu Matilainen d320a1c67c Tie sqlite WAL auto-checkpointing to %_flush_io macro setting
While flushing IO constantly is horrible for performance on spinning
disks, it's beneficial on SSDs. So if %_flush_io is enabled, let sqlite
do its regular WAL auto-checkpointing. Pure DB install time on SSD
improves by some 25% here with %_flush_io, at cost of removal speed but
for overall sanity and smaller WAL file.
2019-11-15 08:09:59 +02:00
Panu Matilainen a8588f8e39 Switch to write-ahead-logging (WAL) mode on sqlite backend
This is an enormous performance boost to almost all operation modes,
in particular spinning disks - on my old laptop, time to erase ~3300
packages comes down from half an hour to under three minutes.

By default sqlite likes to keep the WAL file relatively slow to keep
reader performance up, but as writes are rare in rpm, we let the WAL
grow unlimited during transactions and only flush it at database
close. This way transactions are fast and at the cost of read
performance during transactions, which is perfectly okay. To do this
we need to move statement cache free earlier to drop any locks that
might be lurking there.

Special handling of RPMDB_FLAG_REBUILD is dropped here as unlike other
modes, WAL is persistent and has interactions with EXCLUSIVE locking_mode
among other things that seem non-trivial to sort out. This has some
performance cost to --rebuilddb but it's a rare operation and normal
transaction performance is far more important. We can always tweak
it more later on...
2019-11-15 08:09:59 +02:00
Panu Matilainen 61e13a608a Oops, the sqlite default "synchronous" mode is FULL, not NORMAL
This would've caused inconsistent behavior between databases where
fsync is explicitly set to enabled and to one where it was not set.
2019-11-15 08:09:59 +02:00
Panu Matilainen db1efd0828 Sqlite index tweaks for performance
Only create key indexes for strings, those are the ones that are
performance critical and anything else is just unnecessary weight.
Additionally create indexes on hnum for better performance especially
on erasure, but only bother for array type data where the amounts
get large enough for indexes to matter.
2019-11-15 08:09:59 +02:00
Panu Matilainen 7d1c644756 Reset sqlite statements at cursor close to free locks on database
As we now cache the statements, not reseting the statements kept most
of the database locked for the whole transactions, preventing concurrent
access.

Logic seems to suggest that we could then not bother reseting on fetch
from cache, but that causes failures so there's something fishy still.
2019-11-15 08:09:59 +02:00
Panu Matilainen e80f13764b Set sqlite db handle to NULL on db close to ensure no dangling pointers
sqlite_init() relies on db_dbenv pointing to NULL to properly initialize,
not sure if we can actually end up reusing the same handle but just in
case...
2019-11-15 08:09:59 +02:00
Panu Matilainen bea298986d Force case-sensitive LIKE operator in sqlite
We use LIKE for key prefix searches (file triggers) and case certainly
matters there.
2019-11-15 08:09:59 +02:00
Panu Matilainen 3691d99c8b Preserve processing order in file classification
The order of file classification isn't interesting in itself, but arbitrary
order makes contents of RPMTAG_CLASSDICT non-deterministic which is not
nice for reproducable builds. Tell OMP to handle the class dictionary
in order.

Cancellation points are not allowed in ordered construct so we need to
drop that. It doesn't change the actual results, just means that we  run
a little longer in case errors are encountered.

Fixes #934
2019-11-12 09:47:53 +02:00
Panu Matilainen 7cb8ebdf92 Resurrect our reproducable build test
We used to test against explicit digest values until commit
e20527ae07 changed the rpmkeys output
to drop the actual values and breaking the reproducability test - it
was now only testing whether the package we just built has intact
digests. Doh.

And because of that, commit fa303d5ba6 was
able to silently break setting buildtime from changelog (#932) and
why commit 4b15a9e48b didn't require
adjustment of the test-suite, and why addition of the alternative
payload digest in commit 83a26ae9e1 didn't
require changing this test. Maybe something else too. Doh.
2019-11-11 15:11:27 +02:00
Panu Matilainen d16b082354 Only calculate buildhost and buildtime during an actual build
Commit fa303d5ba6 moved buildhost and
buildtime calculation out of the package generation to early spec
initialization, but this broke reproducable builds: if buildtime is
to be set from changelog, changelog needs to be parsed first.

So either we need to do it twice or we need to do it right, and
besides avoiding duplication, conceptually these values are only
meaningful during a build and not a parse, so this restores that part
of the original code while keeping things thread-safe.

Fixes: #932
2019-11-11 15:11:27 +02:00
Panu Matilainen 2bef1d463e Fix a memleak from uncompressed payload digest calculation
Commit 83a26ae9e1 forgot to free the
new digest, duh.
2019-11-08 12:45:38 +02:00
Panu Matilainen 4e1fe6af0b Fix Icon tag crash on src.rpm build, regression in 4.15.0 (RhBug:1769579)
Commit e68eb68c4a introduced a regression
on Icon tag causing a crash on source rpm build, due to spec->numSources
being off by one if an icon was present.

A nicer fix would be eliminating numSources entirely but it's not as
easy as it should be due to dynamic buildrequires messing with it,
leaving that for another time.
2019-11-08 12:45:38 +02:00
Panu Matilainen 2c57944691 Document popt build-requirement and point a download location
Oops, all this time our most important build-dependency had been missing.
Add a version recommendation too - while rpm almost certainly works with
1.12 and 1.11 too, those are getting *really* long in the tooth, and 1.13
has an important type fix in poptGetOptArg() return value so might as
well use that as the base.
2019-11-08 10:50:02 +02:00
Panu Matilainen af73aba20f Fix ancient memleak on %setup arguments
poptGetOptArg() returns malloced strings and caller needs to free.
This will still leak on error paths but at least normal use is covered.
2019-11-08 10:50:02 +02:00
Panu Matilainen 29d70efb1d Fix ancient memleak on %patch -P from unused popt arg pointer
-P can appear multiple times so a string arg pointer is not the right
thing here in any case. There are other similar and related leaks all
over the codebase but this is especially insulting as the leaked pointer
was never used for anything at all.

Thanks for Peter Jones for pointing this out.
2019-11-08 10:50:02 +02:00
Panu Matilainen 97a873aff5 Fix memleak on %generate_buildrequires usage
Leak introduced in commit 58dcfddc37
2019-11-08 10:50:02 +02:00
Panu Matilainen 6cf5636a35 Add test to ensure strictly monotonically increasing header numbering
libsolv relies on header numbers not being recycled outside database
rebuild, ie the traditional behavior of BDB backend. And where there's
one, there's likely more users, we just don't know about them. Add
a test to ensure this behavior doesn't get accidentally changed.
2019-11-07 13:43:03 +02:00
Panu Matilainen a39824fc3a Add an error callback to the sqlite backend
Normally no output is expected there, but will help catch out abuses
and other surprises.
2019-11-07 13:43:03 +02:00
Panu Matilainen ad329e600a Ensure sqlite secure_delete behavior is disabled
Upstream sqlite defaults to secure_delete off, but its default value
is also a compile-time option which some distros (at least Fedora)
change to have it enabled by default. There's never any sensitive data
in the rpmdb, don't bother zeroing it out. Improves performance somewhat.
2019-11-07 13:43:03 +02:00
Panu Matilainen cf0e94cc8e Let sqlite handle header number allocation, don't recycle header numbers
The initial implementation could recycle the topmost header numbers which
breaks libsolv caching as it assumes the traditional rpm behavior where
header numbers are not recycled.

Sqlite primary integer key with autoincrement behaves exactly the way rpm
database has traditionally allocated new "instance" numbers, and does not
seem to have any performance impact in this usage as there's always a
limited number of such increments anyway. Let it do the job instead.
2019-11-07 13:43:03 +02:00
Panu Matilainen e5ba3247d8 Handle DBI_CREATED flag correctly for all tables in sqlite
Use sqlite3_table_column_metadata() to detect whether a table exists.
This will allow newly added rpm indexes to be automatically generated
and doesn't require running sql queries to figure out whether the
database was just created or not (and get it wrong because an empty
database would appear as newly created)
2019-11-07 13:43:03 +02:00
Michal Domonkos 1cb3be0009 Handle incomplete escape seq in queryformat (RhBug:1755230)
Previously, we assumed a backslash character would always be followed by
a character to be escaped, and advanced our "start" pointer by two
places before the next iteration.  However, this assumption breaks if
the lonely backslash happens to be the last character in the query
string, in which case we would end up pointing beyond the \0 and let the
parser wander into the unknown, possibly crashing later.

This commit ensures we detect this corner case and error out gracefully
with a message.
2019-11-04 12:49:43 +02:00
Panu Matilainen 998b516b8b Replace use of obsolete ctime() with strftime()
POSIX.1-2008 marked ctime() as obsolete and recommends strftime() instead.
We get two flies on one stroke as ctime() is also classified "dangerous"
by LGTM due to not being thread-safe.

strftime(..., "%c"...) isn't exactly the same as ctime() but is consistent
with what we use elsewhere and sufficient for debug purposes anyway.
2019-10-31 11:59:20 +01:00
Panu Matilainen 9fe75561f9 Replace uses of localtime() with the re-entrant variant
LGTM flags localtime() as a "dangerous" function, which seems a bit
over the top to me, but as we're flirting with threads, it certainly
is not thread-safe.
2019-10-31 11:59:20 +01:00