Update tuple.md

Ben Collins 2017-09-19 22:41:55 +00:00 committed by GitHub Enterprise
parent f75dfc3153
commit 8c13f60625
1 changed file with 33 additions and 33 deletions


@ -9,13 +9,13 @@ Status: Deprecated means that a previous layer used this type, but issues with t
### **Null Value**
Typecode: `0x00`
Length: 0 bytes
Status: Standard
### **Byte String**
Typecode: `0x01`
Length: Variable (terminated by `[\x00]![\xff]`)
Encoding: `b'\x01' + value.replace(b'\x00', b'\x00\xFF') + b'\x00'`
Test case: `pack(b"foo\x00bar") == b'\x01foo\x00\xffbar\x00'`
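A minimal Python sketch of this rule follows, with a matching decoder that shows how `[\x00]![\xff]` marks the end of the string. `encode_bytes` and `decode_bytes` are hypothetical helpers written for illustration, not functions from any FoundationDB binding.

```python
def encode_bytes(value: bytes) -> bytes:
    """Encode a byte string element: typecode 0x01, escaped body, 0x00 terminator."""
    # Every embedded 0x00 is followed by 0xff so a decoder can tell it apart
    # from the terminating 0x00.
    return b"\x01" + value.replace(b"\x00", b"\x00\xff") + b"\x00"


def decode_bytes(data: bytes, pos: int = 0):
    """Decode a byte string element starting at data[pos]; return (value, next_pos)."""
    if data[pos] != 0x01:
        raise ValueError("not a byte string typecode")
    out = bytearray()
    i = pos + 1
    while True:  # truncated input (no terminator) raises IndexError
        if data[i] == 0x00:
            # 0x00 followed by 0xff is an escaped null; a lone 0x00 ends the string.
            if i + 1 < len(data) and data[i + 1] == 0xFF:
                out.append(0x00)
                i += 2
                continue
            return bytes(out), i + 1
        out.append(data[i])
        i += 1


print(encode_bytes(b"foo\x00bar"))  # b'\x01foo\x00\xffbar\x00'
```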
@ -25,25 +25,25 @@ In other words, byte strings are null terminated with null values occurring in t
### **Unicode String**
Typecode: `0x02`
Length: Variable (terminated by `[\x00]![\xff]`)
Encoding: `b'\x02' + value.encode('utf-8').replace(b'\x00', b'\x00\xFF') + b'\x00'`
Test case: `pack( u"F\u00d4O\u0000bar" ) == b'\x02F\xc3\x94O\x00\xffbar\x00'`
Status: Standard
This is the same as the byte string encoding, except that the Unicode string is first encoded as UTF-8.
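A sketch of the Unicode rule, applying the byte string escaping after UTF-8 encoding. `encode_unicode` is a hypothetical name. Because UTF-8 byte order matches code point order, the escaped encoding should sort strings by code point, which the test below checks on a few samples.

```python
def encode_unicode(value: str) -> bytes:
    """Encode a Unicode string element: typecode 0x02, escaped UTF-8 body, 0x00 terminator."""
    # UTF-8 only produces a 0x00 byte for U+0000, so only literal nulls get escaped.
    utf8 = value.encode("utf-8")
    return b"\x02" + utf8.replace(b"\x00", b"\x00\xff") + b"\x00"


print(encode_unicode("F\u00d4O\u0000bar"))  # b'\x02F\xc3\x94O\x00\xffbar\x00'
```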
### **(DEPRECATED) Nested Tuple**
Typecodes: `0x03` - `0x04`
Length: Variable (terminated by `0x04` type code)
Status: Deprecated
This encoding was used by a few layers. However, it had ordering problems when one tuple was a prefix of another and the first element of the longer tuple after the shared prefix was either null or a byte string. For example, consider the empty tuple and the tuple containing only null. In the old scheme, the empty tuple would be encoded as `\x03\x04` while the tuple containing only null would be encoded as `\x03\x00\x04`, so the second tuple would sort first bytewise. This is semantically incorrect: a tuple should sort before any tuple that it is a prefix of.
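The ordering problem can be checked with Python's lexicographic `bytes` comparison, which is assumed here to match the database's unsigned bytewise key order. The new-scheme byte strings follow the `0x05` rules in the Nested Tuple entry.

```python
# Old (deprecated) scheme: 0x03 opens a nested tuple, 0x04 closes it.
empty_tuple_old = b"\x03\x04"        # ()
tuple_of_null_old = b"\x03\x00\x04"  # (None,)

# bytes compare lexicographically, byte by byte, as unsigned values.
print(tuple_of_null_old < empty_tuple_old)  # True: (None,) wrongly sorts before ()

# Current scheme (typecode 0x05): nulls inside a nested tuple are written as
# 0x00 0xff and the tuple ends with a bare 0x00.
empty_tuple_new = b"\x05\x00"
tuple_of_null_new = b"\x05\x00\xff\x00"
print(empty_tuple_new < tuple_of_null_new)  # True: () sorts before (None,)
```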
### **Nested Tuple**
Typecode: `0x05`
Length: Variable (terminated by `[\x00]![\xff]` at beginning of nested element)
Encoding: `b'\x05' + b''.join(map(lambda x: b'\x00\xff' if x is None else pack(x), value)) + b'\x00'`
Test case: `pack( (b"foo\x00bar", None, ()) ) == b'\x05\x01foo\x00\xffbar\x00\x00\xff\x05\x00\x00'`
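To make the recursive `pack(x)` call concrete, here is a self-contained sketch that handles only `None`, `bytes`, `str` and nested tuples. `encode_element` is a hypothetical name, and its `nested` flag is an assumption of this sketch: it models the rule that a null inside a nested tuple becomes `\x00\xff`, while a top-level null is the bare `0x00` from the Null Value entry.

```python
def encode_element(x, nested=False):
    """Encode one element; this sketch covers only None, bytes, str and tuple."""
    if x is None:
        # Inside a nested tuple, null is escaped as 0x00 0xff so it cannot be
        # mistaken for the 0x00 that ends the nested tuple.
        return b"\x00\xff" if nested else b"\x00"
    if isinstance(x, bytes):
        return b"\x01" + x.replace(b"\x00", b"\x00\xff") + b"\x00"
    if isinstance(x, str):
        return b"\x02" + x.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"
    if isinstance(x, tuple):
        # Typecode 0x05, each child encoded in nested mode, then a 0x00 terminator.
        return b"\x05" + b"".join(encode_element(c, nested=True) for c in x) + b"\x00"
    raise TypeError(f"unsupported type in this sketch: {type(x).__name__}")


print(encode_element((b"foo\x00bar", None, ())))
# b'\x05\x01foo\x00\xffbar\x00\x00\xff\x05\x00\x00'
```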
@ -53,23 +53,23 @@ The list is ended with a 0x00 byte. Nulls within the tuple are encoded as `\x00\
### **Negative arbitrary-precision Integer**
Typecodes: `0x0a`, `0x0b`
Encoding: Not defined yet
Status: Reserved; `0x0b` used in Python and Java
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big-endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, `0x0b` stores negative numbers that need between 9 and 255 bytes. The first byte following the type code (`0x0b`) is a single byte giving the number of bytes in the integer (with its bits flipped to preserve order), followed by that many bytes representing the number in big-endian one's complement.
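The Python/Java layout just described can be sketched as follows for negative values that need 9 to 255 bytes. `encode_negative_bigint` is a hypothetical helper written from this description, not code taken from either binding.

```python
def encode_negative_bigint(n: int) -> bytes:
    """Sketch of the 0x0b layout for negative integers needing 9-255 bytes."""
    if n >= 0:
        raise ValueError("expected a negative integer")
    magnitude = -n
    length = (magnitude.bit_length() + 7) // 8
    if not 9 <= length <= 255:
        raise ValueError("magnitude must need between 9 and 255 bytes")
    # One's complement of the magnitude: flip every bit of its big-endian bytes,
    # so within one length, larger magnitudes (more negative) sort first.
    body = bytes(b ^ 0xFF for b in magnitude.to_bytes(length, "big"))
    # The length byte is flipped too, so longer (more negative) values sort first.
    return b"\x0b" + bytes([length ^ 0xFF]) + body


print(encode_negative_bigint(-(2**64)).hex())  # 0bf6feffffffffffffffff
```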
### **Integer**
Typecodes: `0x0c` - `0x1c`
 `0x0c` is an 8-byte negative number
 `0x13` is a 1-byte negative number
 `0x14` is zero
 `0x15` is a 1-byte positive number
 `0x1c` is an 8-byte positive number
Length: Depends on typecode (0-8 bytes)
Encoding: positive numbers are big endian
negative numbers are big endian one's complement (so -1 is `0x13` `0xfe`)
Test case: `pack( -5551212 ) == b'\x11\xabK\x93'`
Status: Standard
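A sketch of the 0-8 byte integer rules. `encode_int` is a hypothetical name, and this sketch raises an error instead of falling back to the arbitrary-precision typecodes.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer whose magnitude fits in 0-8 bytes (typecodes 0x0c-0x1c)."""
    if n == 0:
        return b"\x14"  # zero: typecode only, no payload
    magnitude = abs(n)
    length = (magnitude.bit_length() + 7) // 8
    if length > 8:
        raise ValueError("needs the arbitrary-precision typecodes")
    payload = magnitude.to_bytes(length, "big")
    if n > 0:
        # Positive: typecode counts up from 0x14 by the byte length.
        return bytes([0x14 + length]) + payload
    # Negative: typecode counts down from 0x14; payload is the one's complement.
    return bytes([0x14 - length]) + bytes(b ^ 0xFF for b in payload)


print(encode_int(-5551212))  # b'\x11\xabK\x93'
print(encode_int(-1).hex())  # 13fe
```

Because the typecode carries the length and negative payloads are complemented, sorting the encodings bytewise should match numeric order.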
@ -77,18 +77,18 @@ There is some variation in the ability of language bindings to encode and decode
### **Positive arbitrary-precision Integer**
Typecodes: `0x1d`, `0x1e`
Encoding: Not defined yet
Status: Reserved; `0x1d` used in Python and Java
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big-endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, `0x1d` stores positive numbers that need between 9 and 255 bytes. The first byte following the type code (`0x1d`) is a single byte giving the number of bytes in the integer, followed by that many bytes representing the number in big-endian order.
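The positive counterpart of that layout can be sketched the same way. `encode_positive_bigint` is a hypothetical helper based on the description above; unlike `0x0b`, the length byte is stored as-is.

```python
def encode_positive_bigint(n: int) -> bytes:
    """Sketch of the 0x1d layout for positive integers needing 9-255 bytes."""
    length = (n.bit_length() + 7) // 8
    if n <= 0 or not 9 <= length <= 255:
        raise ValueError("expected a positive integer needing 9-255 bytes")
    # Unflipped length byte, so longer (larger) values sort later,
    # then the plain big-endian magnitude.
    return b"\x1d" + bytes([length]) + n.to_bytes(length, "big")


print(encode_positive_bigint(2**64).hex())  # 1d09010000000000000000
```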
### **IEEE Binary Floating Point**
Typecodes:
 `0x20` - float (32 bits)
 `0x21` - double (64 bits)
 `0x22` - long double (80 bits)
Length: 4 - 10 bytes
Test case: `pack( -42f ) == b'=\xd7\xff\xff'`
Encoding: Big-endian IEEE binary representation, then transformed as follows:
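The transformation list itself falls outside this hunk. As a hedged sketch, assuming the common order-preserving rule (flip every bit when the sign bit is set, otherwise flip only the sign bit), Python's `struct` module reproduces the `-42f` test case. The test case appears to show only the 4-byte payload, without the `0x20` typecode, so `encode_float32` (a hypothetical name) returns only the payload as well.

```python
import struct


def encode_float32(x: float) -> bytes:
    """Order-preserving 32-bit float payload (typecode 0x20 not included).

    Assumed rule: negative values have all bits flipped; non-negative
    values have only the sign bit flipped.
    """
    raw = bytearray(struct.pack(">f", x))  # big-endian IEEE 754 single precision
    if raw[0] & 0x80:
        # Negative: flip every bit so larger magnitudes sort earlier.
        for i in range(len(raw)):
            raw[i] ^= 0xFF
    else:
        # Non-negative: flip only the sign bit so these sort after all negatives.
        raw[0] ^= 0x80
    return bytes(raw)


print(encode_float32(-42.0))  # b'=\xd7\xff\xff'
```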
@ -114,7 +114,7 @@ This should be equivalent to the standard IEEE total ordering.
### **Arbitrary-precision Decimal**
Typecodes: `0x23`, `0x24`
Length: Arbitrary
Encoding: Scale followed by arbitrary precision integer
Status: Reserved
@ -123,19 +123,19 @@ This encoding format has been used by layers. Note that this encoding makes almo
### **(DEPRECATED) True Value**
Typecode: `0x25`
Length: 0 bytes
Status: Deprecated
### **False Value**
Typecode: `0x26`
Length: 0 bytes
Status: Standard
### **True Value**
Typecode: `0x27`
Length: 0 bytes
Status: Standard
@ -143,7 +143,7 @@ Note that false will sort before true with the given encoding.
### **RFC 4122 UUID**
Typecode: `0x30`
Length: 16 bytes
Encoding: Network byte order as defined in the RFC: [_http://www.ietf.org/rfc/rfc4122.txt_](http://www.ietf.org/rfc/rfc4122.txt)
Status: Standard
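Python's standard `uuid` module already exposes the big-endian 16-byte form as `UUID.bytes`, so a sketch of this entry is short. `encode_uuid` is a hypothetical name.

```python
import uuid


def encode_uuid(u: uuid.UUID) -> bytes:
    """Typecode 0x30 followed by the 16 UUID bytes in network (big-endian) order."""
    # UUID.bytes is the 16-byte big-endian representation.
    return b"\x30" + u.bytes


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(encode_uuid(u).hex())  # 3012345678123456781234567812345678
```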
@ -152,7 +152,7 @@ This is equivalent to the unsigned byte ordering of the UUID bytes in big-endian
### **64 bit identifier**
Typecode: `0x31`
Length: 8 bytes
Encoding: Big endian unsigned 8-byte integer (typically random or perhaps semi-sequential)
Status: Reserved
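This type is only reserved, but the described payload corresponds to `struct`'s big-endian unsigned 64-bit format. `encode_id64` is a hypothetical name used for illustration.

```python
import struct


def encode_id64(value: int) -> bytes:
    """Typecode 0x31 plus an unsigned 64-bit big-endian payload (reserved type)."""
    # ">Q" is a big-endian unsigned 8-byte integer; struct.error is raised
    # for values outside 0 .. 2**64 - 1.
    return b"\x31" + struct.pack(">Q", value)


print(encode_id64(1).hex())  # 310000000000000001
```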
@ -161,14 +161,14 @@ Theres definitely some question of whether this deserves to be separated from
### **80 Bit Versionstamp**
Typecode: `0x32`
Length: 10 bytes
Encoding: Big endian 10-byte integer. First/high 8 bytes are a database version, next two are batch version.
Status: Reserved
### **96 Bit Versionstamp**
Typecode: `0x33`
Length: 12 bytes
Encoding: Big endian 12-byte integer. First/high 8 bytes are a database version, next two are batch version, next two are ordering within transaction.
Status: Reserved
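Both versionstamp types are only reserved. Read literally, the layouts above map onto big-endian `struct` formats; the function and parameter names below (including `user_order` for the within-transaction ordering) are hypothetical.

```python
import struct


def encode_versionstamp_80(db_version: int, batch_version: int) -> bytes:
    """Typecode 0x32: 8-byte database version, then 2-byte batch version, big-endian."""
    return b"\x32" + struct.pack(">QH", db_version, batch_version)


def encode_versionstamp_96(db_version: int, batch_version: int, user_order: int) -> bytes:
    """Typecode 0x33: the 80-bit layout plus 2 bytes of ordering within the transaction."""
    # ">" selects big-endian with standard sizes and no padding: 8 + 2 + 2 bytes.
    return b"\x33" + struct.pack(">QHH", db_version, batch_version, user_order)
```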
@ -177,7 +177,7 @@ The two versionstamp typecodes are reserved for future work adding compatibility
### **User type codes**
Typecodes: `0x40` - `0x4f`
Length: Variable (user defined)
Encoding: User defined
Status: Reserved
@ -188,7 +188,7 @@ The only way in which future official, otherwise backward-compatible versions of
### **Escape Character**
Typecode: `0xff`
Length: N/A
Encoding: N/A
Status: Reserved