In the almost ten years of rpm sort of supporting Python 3 bindings, quite
obviously nobody has actually tried to use them. There's a major mismatch
between what the header API outputs (bytes) and what all the other APIs
accept (strings), resulting in hysterical TypeErrors all over the place,
including but not limited to labelCompare() (RhBug:1631292). Also a huge
number of other places have been returning strings and silently assuming
utf-8 through use of Py_BuildValue("s", ...), which will just irrevocably
fail when non-utf8 data is encountered.
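As a rough Python-level analogy (not the binding code itself), strict UTF-8
decoding behaves much like Py_BuildValue("s", ...) does on Python 3: fine for
well-formed data, an exception for anything else. The sample values and the
strict_decode() helper below are made up for illustration.

```python
# Hypothetical header values: the same text encoded two different ways.
utf8_summary = "Schöne Grüße".encode("utf-8")
latin1_summary = "Schöne Grüße".encode("latin-1")


def strict_decode(raw):
    """Decode bytes as strict UTF-8; return None if the data is not UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


ok = strict_decode(utf8_summary)          # well-formed UTF-8 decodes
broken = strict_decode(latin1_summary)    # Latin-1 bytes are rejected

# bytes and str never compare equal on Python 3, which is the kind of
# mismatch that bites when one API returns bytes and another wants str.
mixed_equal = (b"1.0" == "1.0")
```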
The politically Python 3-correct solution would be declaring all our data
as bytes with unspecified encoding - that's exactly what it historically is.
However doing so would by definition break every single rpm script people
have developed on Python 2. And when 99% of the rpm content in the world
actually is utf-8 encoded even if it doesn't say so (and in recent times
packages even advertise themselves as utf-8 encoded), the bytes-only route
seems a wee bit too draconian, even to this grumpy old fella.
Instead, route all our string returns through a single helper macro
which on Python 2 just does what we always did, but in Python 3 converts
the data to surrogate-escaped utf-8 strings. This makes stuff "just work"
out of the box pretty much everywhere even with Python 3 (including
our own test-suite!), while still allowing the non-utf8 case to be handled.
Handling the non-utf8 case is a bit uglier but still possible,
which is exactly how you want corner-cases to be. There might be some
uses for retrieving raw byte data from the header, but worrying about
such an API is a case for some other rainy day; for now we mostly only
care that stuff works again.
Also add test-cases for labelCompare() with mixed data sources and for
inserting non-utf8 data into the header and retrieving it again.
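A minimal sketch of what surrogate-escaped utf-8 means from the Python 3
side, using only the standard codecs machinery. The to_str() and to_bytes()
helpers and the sample value are hypothetical, not part of the rpm module;
they only mimic the conversion described above.

```python
def to_str(raw):
    """bytes -> str; undecodable bytes are carried along as lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def to_bytes(text):
    """str -> bytes; turns escaped surrogates back into the original bytes."""
    return text.encode("utf-8", "surrogateescape")


raw = b"caf\xe9"                 # Latin-1 encoded, not valid UTF-8
text = to_str(raw)               # usable as an ordinary str
roundtrip = to_bytes(text)       # original bytes are recoverable

# Strictly encoding the escaped string fails, so the odd data cannot
# slip out unnoticed as if it were real UTF-8.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The common case stays a plain str, and the corner case costs one explicit
encode call with the "surrogateescape" error handler.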
This code was disabled in commit 9b94ae3dbc
about seven years ago before making a public appearance in any release.
That nobody has missed it in all this time tells me it's not that
necessary to have a python-level rpmtd object...
- We know the array size beforehand, so allocate the entire array
at once and set the elements instead of appending one by one.
This is an obvious and well-measurable, if not huge, win.
- Various functions in the Python bindings construct lists of objects, but
assume that all calls succeed. Each of these could segfault under
low-memory conditions: if the PyList_New() call fails,
PyList_Append(NULL, item) will segfault. Similarly, although
PyList_Append(list, NULL) is safe, Py_DECREF(NULL) will segfault.
Signed-off-by: Ales Kozumplik <akozumpl@redhat.com>
- Instead of masking and bitfiddling all over the place, use the
new getters to get the exact (enum) type directly. rpmTagGetType()
is now unused within rpm but is left around for backwards compatibility.
- In Python 2.6 PyBytes is just an alias for PyString; Python 3.0
removed PyString entirely.
- Add compatibility defines for Python < 2.6
- Based on David Malcolm's Python 3.x efforts
The layout of PyVarObject changed between Python 2 and Python 3, and this leads
to the existing code for all of the various PyTypeObject initializers failing to
compile with Python 3.
Change the way we initialize these structs to use PyVarObject_HEAD_INIT directly,
rather than merely PyObject_HEAD_INIT, so that it compiles cleanly with both major
versions of Python.
Python 2's various object structs use macros to implement common fields at the top of each
struct.
Python 3's objects instead embed a PyObject struct as the first member within the more refined
object structs.
Use the Py_TYPE() macro when accessing ob_type in order to encapsulate this difference.
- unlike other types, store the C-level td structure directly in the
python object. This lets us selectively expose some members directly,
avoids having to deal with rpmtd allocation separately, and leaves
the reference counting to python, as rpmtds aren't refcounted at the C level