Update tuple.md

Ben Collins 2017-09-19 22:41:55 +00:00 committed by GitHub Enterprise
parent f75dfc3153
commit 8c13f60625
1 changed files with 33 additions and 33 deletions


@ -9,13 +9,13 @@ Status: Deprecated means that a previous layer used this type, but issues with t
### **Null Value**
Typecode: 0x00
Typecode: `0x00`
Length: 0 bytes
Status: Standard
### **Byte String**
Typecode: 0x01
Typecode: `0x01`
Length: Variable (terminated by `[\x00]![\xff]`)
Encoding: `b'\x01' + value.replace(b'\x00', b'\x00\xFF') + b'\x00'`
Test case: `pack("foo\x00bar") == b'\x01foo\x00\xffbar\x00'`
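The byte string rule above can be checked with a few lines of Python. This is a minimal sketch, not the code of any binding; `pack_bytes` is a hypothetical helper name introduced here for illustration.

```python
def pack_bytes(value: bytes) -> bytes:
    """Encode a byte string: typecode 0x01, escaped payload, 0x00 terminator."""
    # Each embedded null becomes \x00\xff so it cannot be mistaken
    # for the terminating \x00.
    return b'\x01' + value.replace(b'\x00', b'\x00\xff') + b'\x00'


print(pack_bytes(b'foo\x00bar'))
```

The escape keeps prefix ordering intact: a terminator (`\x00` not followed by `\xff`) always compares lower than an escaped null inside a longer string.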
@ -25,25 +25,25 @@ In other words, byte strings are null terminated with null values occurring in t
### **Unicode String**
Typecode: 0x02
Length: Variable (terminated by [\x00]![\xff])
Typecode: `0x02`
Length: Variable (terminated by `[\x00]![\xff]`)
Encoding: `b'\x02' + value.encode('utf-8').replace(b'\x00', b'\x00\xFF') + b'\x00'`
Test case: `pack( u"F\u00d4O\u0000bar" ) == b'\x02F\xc3\x94O\x00\xffbar\x00'`
Status: Standard
This is the same way that byte strings are encoded, but first, the unicode string is encoded in UTF-8.
### **(DEBRECATED) Nested Tuple**
### **(DEPRECATED) Nested Tuple**
Typecodes: 0x03-0x04
Length: Variable (terminated by 0x04 type code)
Typecodes: `0x03` - `0x04`
Length: Variable (terminated by `0x04` type code)
Status: Deprecated
This encoding was used by a few layers. However, it had ordering problems when one tuple was a prefix of another and the type of the first element in the longer tuple was either null or a byte string. For an example, consider the empty tuple and the tuple containing only null. In the old scheme, the empty tuple would be encoded as `\x03\x04` while the tuple containing only null would be encoded as `\x03\x00\x04`, so the second tuple would sort first based on their bytes, which is incorrect semantically.
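The ordering problem can be reproduced by comparing the encoded bytes directly. The sketch below uses the two old encodings quoted above and, for contrast, the encodings the `0x05` scheme would produce for the same tuples; the variable names are illustrative only.

```python
# Old scheme (typecodes 0x03/0x04), as quoted in the paragraph above.
old_empty = b'\x03\x04'          # ()
old_null = b'\x03\x00\x04'       # (None,)

# 0x05 scheme: nulls inside a tuple are written as \x00\xff and the
# tuple is closed with a single \x00.
new_empty = b'\x05\x00'          # ()
new_null = b'\x05\x00\xff\x00'   # (None,)

# Python compares bytes lexicographically, like key ordering does.
print(old_null < old_empty)  # True: (None,) wrongly sorts before ()
print(new_empty < new_null)  # True: () sorts first, as intended
```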
### **Nested Tuple**
Typecodes: 0x05
Typecodes: `0x05`
Length: Variable (terminated by `[\x00]![\xff]` at beginning of nested element)
Encoding: `b'\x05' + b''.join(map(lambda x: b'\x00\xff' if x is None else pack(x), value)) + b'\x00'`
Test case: `pack( ("foo\x00bar", None, ()) ) == b'\x05\x01foo\x00\xffbar\x00\x00\xff\x05\x00\x00'`
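A runnable version of the encoding expression, restricted to the element types the test case needs (null, byte strings, and nested tuples), might look like the sketch below. `pack_element` and `pack_nested` are hypothetical names, and the string in the test case is treated as a byte string, which matches its `\x01` typecode.

```python
def pack_element(x) -> bytes:
    """Encode one element; only the types used in the test case are handled."""
    if isinstance(x, bytes):
        return b'\x01' + x.replace(b'\x00', b'\x00\xff') + b'\x00'
    if isinstance(x, tuple):
        return pack_nested(x)
    raise TypeError(f'unsupported element type: {type(x).__name__}')


def pack_nested(value: tuple) -> bytes:
    """Encode a nested tuple: 0x05, the elements, then a 0x00 terminator."""
    # A null inside the tuple is escaped as \x00\xff so that it cannot
    # be read as the end of the tuple.
    body = b''.join(b'\x00\xff' if x is None else pack_element(x) for x in value)
    return b'\x05' + body + b'\x00'


print(pack_nested((b'foo\x00bar', None, ())))
```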
@ -53,23 +53,23 @@ The list is ended with a 0x00 byte. Nulls within the tuple are encoded as `\x00\
### **Negative arbitrary-precision Integer**
Typecodes: 0x0a, 0x0b
Typecodes: `0x0a`, `0x0b`
Encoding: Not defined yet
Status: Reserved; 0x0b used in Python and Java
Status: Reserved; `0x0b` used in Python and Java
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, 0x0b stores negative numbers which are expressed with between 9 and 255 bytes. The first byte following the type code (0x0b) is a single byte expressing the number of bytes in the integer (with its bits flipped to preserve order), followed by that number of bytes representing the number in big endian order in one's complement.
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, `0x0b` stores negative numbers which are expressed with between 9 and 255 bytes. The first byte following the type code (`0x0b`) is a single byte expressing the number of bytes in the integer (with its bits flipped to preserve order), followed by that number of bytes representing the number in big endian order in one's complement.
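The `0x0b` layout described for the Python and Java implementations can be sketched as follows. This follows the prose above rather than any binding's source, and `pack_big_negative` is a hypothetical name.

```python
def pack_big_negative(n: int) -> bytes:
    """Encode a negative integer that needs 9 to 255 bytes, per the text above."""
    # int.bit_length() ignores the sign, so this is the size of abs(n).
    length = (n.bit_length() + 7) // 8
    if n >= 0 or not 9 <= length <= 255:
        raise ValueError('expected a negative integer needing 9 to 255 bytes')
    # One's complement of abs(n) within `length` bytes.
    ones_complement = n + (1 << (8 * length)) - 1
    # The length byte has its bits flipped so that longer (more negative)
    # numbers sort first.
    return b'\x0b' + bytes([length ^ 0xff]) + ones_complement.to_bytes(length, 'big')


print(pack_big_negative(-(2 ** 64)))
```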
### **Integer**
Typecodes: 0x0c - 0x1c
 0x0c is an 8 byte negative number
 0x13 is a 1 byte negative number
 0x14 is a zero
 0x15 is a 1 byte positive number
 0x1c is an 8 byte positive number
Typecodes: `0x0c` - `0x1c`
 `0x0c` is an 8 byte negative number
 `0x13` is a 1 byte negative number
 `0x14` is a zero
 `0x15` is a 1 byte positive number
 `0x1c` is an 8 byte positive number
Length: Depends on typecode (0-8 bytes)
Encoding: positive numbers are big endian
negative numbers are big endian one's complement (so -1 is 0x13 0xfe)
negative numbers are big endian one's complement (so -1 is `0x13` `0xfe`)
Test case: `pack( -5551212 ) == b'\x11\xabK\x93'`
Status: Standard
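The typecode arithmetic is easiest to see in code: the typecode is `0x14` plus the byte length for positive numbers and `0x14` minus the byte length for negative ones. The sketch below is one way to write that, assuming values that fit in 8 bytes; `pack_int` is a hypothetical name.

```python
def pack_int(n: int) -> bytes:
    """Encode an integer whose magnitude fits in 8 bytes (typecodes 0x0c-0x1c)."""
    if n == 0:
        return b'\x14'
    length = (abs(n).bit_length() + 7) // 8
    if length > 8:
        raise ValueError('needs the arbitrary-precision typecodes')
    if n > 0:
        return bytes([0x14 + length]) + n.to_bytes(length, 'big')
    # Negative: one's complement of abs(n) within `length` bytes.
    return bytes([0x14 - length]) + (n + (1 << (8 * length)) - 1).to_bytes(length, 'big')


print(pack_int(-1))
print(pack_int(-5551212))
```

Because shorter negative numbers get higher typecodes and shorter positive numbers get lower ones, comparing the encoded bytes gives the same result as comparing the integers.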
@ -77,18 +77,18 @@ There is some variation in the ability of language bindings to encode and decode
### **Positive arbitrary-precision Integer**
Typecodes: 0x1d, 0x1e
Typecodes: `0x1d`, `0x1e`
Encoding: Not defined yet
Status: Reserved; 0x1d used in Python and Java
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, 0x1d stores positive numbers which are expressed with between 9 and 255 bytes. The first byte following the type code (0x1d) is a single byte expressing the number of bytes in the integer, followed by that number of bytes representing the number in big endian order.
These typecodes are reserved for encoding integers larger than 8 bytes. Presumably the type code would be followed by some encoding of the length, followed by the big endian one's complement number. Reserving two typecodes for each of positive and negative numbers is probably overkill, but until there's a design in place we might as well not use them. In the Python and Java implementations, `0x1d` stores positive numbers which are expressed with between 9 and 255 bytes. The first byte following the type code (`0x1d`) is a single byte expressing the number of bytes in the integer, followed by that number of bytes representing the number in big endian order.
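For the positive case, the `0x1d` layout described above is a plain length-prefixed big endian number. A minimal sketch, again following the prose rather than any binding's source, with `pack_big_positive` as a hypothetical name:

```python
def pack_big_positive(n: int) -> bytes:
    """Encode a positive integer that needs 9 to 255 bytes, per the text above."""
    length = (n.bit_length() + 7) // 8
    if n <= 0 or not 9 <= length <= 255:
        raise ValueError('expected a positive integer needing 9 to 255 bytes')
    # Typecode, one length byte, then the magnitude in big endian order.
    return b'\x1d' + bytes([length]) + n.to_bytes(length, 'big')


print(pack_big_positive(2 ** 64))
```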
### **IEEE Binary Floating Point**
Typecodes:
 0x20 - float (32 bits)
 0x21 - double (64 bits)
 0x22 - long double (80 bits)
 `0x20` - float (32 bits)
 `0x21` - double (64 bits)
 `0x22` - long double (80 bits)
Length: 4 - 10 bytes
Test case: `pack( -42f ) == b'=\xd7\xff\xff'`
Encoding: Big-endian IEEE binary representation, followed by the following transformation:
@ -114,7 +114,7 @@ This should be equivalent to the standard IEEE total ordering.
### **Arbitrary-precision Decimal**
Typecodes: 0x23, 0x24
Typecodes: `0x23`, `0x24`
Length: Arbitrary
Encoding: Scale followed by arbitrary precision integer
Status: Reserved
@ -123,19 +123,19 @@ This encoding format has been used by layers. Note that this encoding makes almo
### **(DEPRECATED) True Value**
Typecode: 0x25
Typecode: `0x25`
Length: 0 bytes
Status: Deprecated
### **False Value**
Typecode: 0x26
Typecode: `0x26`
Length: 0 bytes
Status: Standard
### **True Value**
Typecode: 0x27
Typecode: `0x27`
Length: 0 bytes
Status: Standard
@ -143,7 +143,7 @@ Note that false will sort before true with the given encoding.
### **RFC 4122 UUID**
Typecode: 0x30
Typecode: `0x30`
Length: 16 bytes
Encoding: Network byte order as defined in the rfc: [_http://www.ietf.org/rfc/rfc4122.txt_](http://www.ietf.org/rfc/rfc4122.txt)
Status: Standard
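In Python, the standard library's `uuid.UUID.bytes` already yields the 16 bytes in big endian field order, so a sketch of this encoding only has to prepend the typecode. `pack_uuid` is a hypothetical helper name.

```python
import uuid


def pack_uuid(u: uuid.UUID) -> bytes:
    """Encode a UUID: typecode 0x30 followed by its 16 bytes in network order."""
    return b'\x30' + u.bytes


print(pack_uuid(uuid.UUID('00112233-4455-6677-8899-aabbccddeeff')))
```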
@ -152,7 +152,7 @@ This is equivalent to the unsigned byte ordering of the UUID bytes in big-endian
### **64 bit identifier**
Typecode: 0x31
Typecode: `0x31`
Length: 8 bytes
Encoding: Big endian unsigned 8-byte integer (typically random or perhaps semi-sequential)
Status: Reserved
@ -161,14 +161,14 @@ Theres definitely some question of whether this deserves to be separated from
### **80 Bit versionstamp**
Typecode: 0x32
Typecode: `0x32`
Length: 10 bytes
Encoding: Big endian 10-byte integer. First/high 8 bytes are a database version, next two are batch version.
Status: Reserved
### **96 Bit Versionstamp**
Typecode: 0x33
Typecode: `0x33`
Length: 12 bytes
Encoding: Big endian 12-byte integer. First/high 8 bytes are a database version, next two are batch version, next two are ordering within transaction.
Status: Reserved
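Since this typecode is only reserved, the following is purely an illustration of the 12-byte layout described above (8-byte database version, 2-byte batch version, 2-byte ordering within the transaction). The function name and its parameters are assumptions made for this sketch.

```python
import struct


def pack_versionstamp96(db_version: int, batch: int, order: int) -> bytes:
    """Lay out a 96 bit versionstamp after typecode 0x33 (illustrative only)."""
    # '>QHH' = big endian: unsigned 8-byte, unsigned 2-byte, unsigned 2-byte.
    return b'\x33' + struct.pack('>QHH', db_version, batch, order)


print(pack_versionstamp96(1, 2, 3))
```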
@ -177,7 +177,7 @@ The two versionstamp typecodes are reserved for future work adding compatibility
### **User type codes**
Typecode: 0x40 - 0x4f
Typecode: `0x40` - `0x4f`
Length: Variable (user defined)
Encoding: User defined
Status: Reserved
@ -188,7 +188,7 @@ The only way in which future official, otherwise backward-compatible versions of
### **Escape Character**
Typecode: 0xff
Typecode: `0xff`
Length: N/A
Encoding: N/A
Status: Reserved