# Binary Values

The library implements several [binary formats](binary_formats/index.md) that encode JSON in an efficient way. Most of
these formats support binary values; that is, values that have semantics defined outside the library and only define a
sequence of bytes to be stored.

JSON itself does not have a binary value. As such, binary values are an extension that this library implements to store
values received by a binary format. Binary values are never created by the JSON parser, and are only part of a
serialized JSON text if they have been created manually or via a binary format.

## API for binary values

```plantuml
class json::binary_t {
    -- setters --
    +void set_subtype(std::uint64_t subtype)
    +void clear_subtype()
    -- getters --
    +std::uint64_t subtype() const
    +bool has_subtype() const
}

"std::vector<uint8_t>" <|-- json::binary_t
```

By default, binary values are stored as `std::vector<std::uint8_t>`. This type can be changed by providing a template
parameter to the `basic_json` type. To store binary subtypes, the storage type is extended and exposed as
`json::binary_t`:

```cpp
auto binary = json::binary_t({0xCA, 0xFE, 0xBA, 0xBE});
auto binary_with_subtype = json::binary_t({0xCA, 0xFE, 0xBA, 0xBE}, 42);
```

There are several convenience functions to check and set the subtype:

```cpp
binary.has_subtype();                   // returns false
binary_with_subtype.has_subtype();      // returns true

binary_with_subtype.clear_subtype();
binary_with_subtype.has_subtype();      // returns false

binary_with_subtype.set_subtype(42);
binary.set_subtype(23);

binary.subtype();                       // returns 23
```

Because `json::binary_t` derives from `std::vector<std::uint8_t>`, all member functions of the vector are available:

```cpp
binary.size();  // returns 4
binary[1];      // returns 0xFE
```

JSON values can be constructed from `json::binary_t`:

```cpp
json j = binary;
```

Binary values are primitive values just like numbers or strings:

```cpp
j.is_binary();    // returns true
j.is_primitive(); // returns true
```

Given a binary JSON value, the `binary_t` can be accessed by reference via `get_binary()`:

```cpp
j.get_binary().has_subtype();  // returns true
j.get_binary().size();         // returns 4
```

For convenience, binary JSON values can be constructed via `json::binary`:

```cpp
auto j2 = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 23);
auto j3 = json::binary({0xCA, 0xFE, 0xBA, 0xBE});

j2 == j;                        // returns true
j3.get_binary().has_subtype();  // returns false
j3.get_binary().subtype();      // returns std::uint64_t(-1) as j3 has no subtype
```
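
The class diagram and the snippets in this section can be summarized in a small model. The following is a minimal
sketch that assumes nothing beyond the behavior documented here: `binary_sketch` is a hypothetical stand-in and not the
library's `json::binary_t`, and `binary_sketch_usage` is an illustrative helper.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for json::binary_t (not the library's implementation):
// a byte vector extended with an optional subtype.
class binary_sketch : public std::vector<std::uint8_t>
{
  public:
    using container = std::vector<std::uint8_t>;

    binary_sketch(std::initializer_list<std::uint8_t> bytes)
        : container(bytes) {}

    binary_sketch(std::initializer_list<std::uint8_t> bytes, std::uint64_t subtype)
        : container(bytes), m_subtype(subtype), m_has_subtype(true) {}

    void set_subtype(std::uint64_t subtype)
    {
        m_subtype = subtype;
        m_has_subtype = true;
    }

    void clear_subtype()
    {
        m_subtype = 0;
        m_has_subtype = false;
    }

    bool has_subtype() const { return m_has_subtype; }

    // mirrors the documented behavior: std::uint64_t(-1) means "no subtype"
    std::uint64_t subtype() const
    {
        return m_has_subtype ? m_subtype : static_cast<std::uint64_t>(-1);
    }

  private:
    std::uint64_t m_subtype = 0;
    bool m_has_subtype = false;
};

// Usage that mirrors the snippets above.
void binary_sketch_usage()
{
    binary_sketch with_subtype({0xCA, 0xFE, 0xBA, 0xBE}, 42);
    assert(with_subtype.has_subtype());

    with_subtype.clear_subtype();
    assert(!with_subtype.has_subtype());

    // inherited std::vector members remain usable
    assert(with_subtype.size() == 4);
    assert(with_subtype[1] == 0xFE);
}
```

Deriving from the container keeps every vector operation available without forwarding code, which is the design the
diagram above describes.
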

## Serialization

Binary values are serialized differently depending on the format.

### JSON

JSON does not have a binary type, and this library does not introduce a new type as this would break conformance.
Instead, binary values are serialized as an object with two keys: `bytes` holds an array of integers, and `subtype`
is an integer or `null`.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // serialize to standard output
    std::cout << j.dump(2) << std::endl;
    ```

    Output:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```

!!! warning "No roundtrip for binary values"

    The JSON parser will not parse the objects generated by binary values back to binary values. This is by design to
    remain standards compliant. Serializing binary values to JSON is only implemented for debugging purposes.

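
To make the object shape concrete, here is a minimal sketch that renders a byte sequence and an optional subtype as
compact JSON text with the two keys `bytes` and `subtype`. The function `binary_to_json_text` is a hypothetical helper
written for this page; it is not part of the library and only imitates the documented layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not library API): renders bytes plus an optional
// subtype as an object with the keys "bytes" and "subtype".
std::string binary_to_json_text(const std::vector<std::uint8_t>& bytes,
                                bool has_subtype,
                                std::uint64_t subtype = 0)
{
    std::string out = "{\"bytes\":[";
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (i != 0)
        {
            out += ',';
        }
        // each byte is written as a plain integer in the range 0..255
        out += std::to_string(static_cast<unsigned>(bytes[i]));
    }
    out += "],\"subtype\":";
    out += has_subtype ? std::to_string(subtype) : std::string("null");
    out += '}';
    return out;
}

// Usage: a value with subtype 42 and one without a subtype.
void binary_to_json_text_usage()
{
    assert(binary_to_json_text({0xCA, 0xFE, 0xBA, 0xBE}, true, 42) ==
           "{\"bytes\":[202,254,186,190],\"subtype\":42}");
    assert(binary_to_json_text({0xCA, 0xFE, 0xBA, 0xBE}, false) ==
           "{\"bytes\":[202,254,186,190],\"subtype\":null}");
}
```

The result is an ordinary JSON object, which is why a parser cannot tell it apart from user data and no roundtrip is
possible.
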

### BJData

[BJData](binary_formats/bjdata.md) neither supports binary values nor subtypes, and proposes to serialize binary values
as an array of uint8 values. This translation is implemented by the library.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42 (will be ignored in BJData)
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to BJData
    auto v = json::to_bjdata(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 20 elements:

    ```c
    0x7B                                             // '{'
        0x69 0x06                                    // i 6 (length of the key)
        0x62 0x69 0x6E 0x61 0x72 0x79                // "binary"
        0x5B                                         // '['
            0x55 0xCA 0x55 0xFE 0x55 0xBA 0x55 0xBE  // content (each byte prefixed with 'U')
        0x5D                                         // ']'
    0x7D                                             // '}'
    ```

    The following code uses the type and size optimization for BJData:

    ```cpp
    // convert to BJData using the size and type optimization
    auto v = json::to_bjdata(j, true, true);
    ```

    The resulting vector has 22 elements; the optimization is not effective for examples with few values:

    ```c
    0x7B                                // '{'
        0x23 0x69 0x01                  // '#' i 1 number of object elements
        0x69 0x06                       // i 6 (length of the key)
        0x62 0x69 0x6E 0x61 0x72 0x79   // "binary"
        0x5B                            // '[' array
            0x24 0x55                   // '$' 'U' type of the array elements: unsigned integers
            0x23 0x69 0x04              // '#' i 4 number of array elements
            0xCA 0xFE 0xBA 0xBE         // content
    ```

    Note that the subtype (42) is **not** serialized and that BJData has **no binary type**, so deserializing `v` would
    yield the following value:

    ```json
    {
      "binary": [202, 254, 186, 190]
    }
    ```
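
The byte counts in the two listings can be checked with simple arithmetic. The sketch below assumes the layout shown in
the listings and a byte count of at most 127, so that the element count fits the one-byte `i` integer; the function
names are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not library API): encoded size of an n-byte binary
// value written as an array, following the listings above.

// '[' + n * ('U' + byte) + ']'
std::size_t plain_array_size(std::size_t n)
{
    return 1 + 2 * n + 1;
}

// '[' + '$' 'U' + '#' 'i' <count> + n raw bytes (no closing ']')
std::size_t optimized_array_size(std::size_t n)
{
    assert(n <= 127);  // sketch only covers counts that fit a one-byte 'i' integer
    return 1 + 2 + 3 + n;
}
```

For the 4 bytes used here, both forms take 10 bytes. The two extra bytes of the optimized result come from the
enclosing object, which gains a 3-byte element count and drops its 1-byte closing `}`. For 100 bytes, the plain form
needs 202 bytes and the optimized form 106.
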

### BSON

[BSON](binary_formats/bson.md) supports binary values and subtypes. If a subtype is given, it is used and added as an
unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to BSON
    auto v = json::to_bson(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 22 elements:

    ```c
    0x16 0x00 0x00 0x00                         // number of bytes in the document
        0x05                                    // binary value
            0x62 0x69 0x6E 0x61 0x72 0x79 0x00  // key "binary" + null byte
            0x04 0x00 0x00 0x00                 // number of bytes
            0x2A                                // subtype
            0xCA 0xFE 0xBA 0xBE                 // content
    0x00                                        // end of the document
    ```

    Note that the serialization preserves the subtype, and deserializing `v` would yield the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```
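
The value part of the BSON element in the listing (length, subtype, content) can be reproduced by hand. The following
is a minimal sketch under the assumption that only this value part is of interest; `bson_binary_value` is a hypothetical
helper and not the library's writer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not library API): builds the value part of a BSON
// binary element: 32-bit length (little endian), subtype byte, raw bytes.
std::vector<std::uint8_t> bson_binary_value(const std::vector<std::uint8_t>& bytes,
                                            std::uint8_t subtype = 0x00)
{
    assert(bytes.size() <= 0x7FFFFFFFu);  // BSON lengths are signed 32-bit integers
    const std::uint32_t n = static_cast<std::uint32_t>(bytes.size());

    std::vector<std::uint8_t> out;
    for (int shift = 0; shift < 32; shift += 8)
    {
        // least significant byte first
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    }
    out.push_back(subtype);  // 0x00 is the generic binary subtype
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```

Calling `bson_binary_value({0xCA, 0xFE, 0xBA, 0xBE}, 0x2A)` yields the nine bytes
`0x04 0x00 0x00 0x00 0x2A 0xCA 0xFE 0xBA 0xBE`, matching the three innermost lines of the listing.
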

### CBOR

[CBOR](binary_formats/cbor.md) supports binary values, but no subtypes. Subtypes will be serialized as tags. Any binary
value will be serialized as a byte string. The library will choose the smallest representation using the length of the
byte array.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to CBOR
    auto v = json::to_cbor(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 15 elements:

    ```c
    0xA1                                   // map(1)
        0x66                               // text(6)
            0x62 0x69 0x6E 0x61 0x72 0x79  // "binary"
        0xD8 0x2A                          // tag(42)
        0x44                               // bytes(4)
            0xCA 0xFE 0xBA 0xBE            // content
    ```

    Note that the subtype is serialized as a tag. However, parsing tagged values yields a parse error unless
    `json::cbor_tag_handler_t::ignore` or `json::cbor_tag_handler_t::store` is passed to `json::from_cbor`. With
    `ignore`, the tag is discarded, and deserializing `v` yields the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": null
      }
    }
    ```
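
The tag and byte-string heads in the listing follow the general CBOR head encoding. The sketch below implements the
shortest-form head from RFC 8949 for illustration; `cbor_head` and `cbor_tagged_bytes` are hypothetical helpers, and the
library's output for other tag values may differ from what this sketch produces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not library API): appends a CBOR head consisting of
// the major type (high 3 bits) and the argument in its shortest encoding.
void cbor_head(std::vector<std::uint8_t>& out, unsigned major, std::uint64_t value)
{
    assert(major < 8);
    unsigned info = 0;  // additional information (low 5 bits)
    int width = 0;      // number of argument bytes that follow
    if (value < 24)               { info = static_cast<unsigned>(value); }
    else if (value <= 0xFF)       { info = 24; width = 1; }
    else if (value <= 0xFFFF)     { info = 25; width = 2; }
    else if (value <= 0xFFFFFFFF) { info = 26; width = 4; }
    else                          { info = 27; width = 8; }

    out.push_back(static_cast<std::uint8_t>((major << 5) | info));
    for (int i = width - 1; i >= 0; --i)  // big endian
    {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical helper: a tag (major type 6) followed by a byte string
// (major type 2) and the raw content.
std::vector<std::uint8_t> cbor_tagged_bytes(std::uint64_t tag,
                                            const std::vector<std::uint8_t>& bytes)
{
    std::vector<std::uint8_t> out;
    cbor_head(out, 6, tag);
    cbor_head(out, 2, bytes.size());
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```

For tag 42 and the four example bytes this produces `0xD8 0x2A 0x44 0xCA 0xFE 0xBA 0xBE`, the last seven bytes of the
listing.
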

### MessagePack

[MessagePack](binary_formats/messagepack.md) supports binary values and subtypes. If a subtype is given, the ext family
is used. The library will choose the smallest representation among fixext1, fixext2, fixext4, fixext8, fixext16, ext8,
ext16, and ext32. The subtype is then added as a signed 8-bit integer.

If no subtype is given, the bin family (bin8, bin16, bin32) is used.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to MessagePack
    auto v = json::to_msgpack(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 14 elements:

    ```c
    0x81                                   // fixmap1
        0xA6                               // fixstr6
            0x62 0x69 0x6E 0x61 0x72 0x79  // "binary"
        0xD6                               // fixext4
            0x2A                           // subtype
            0xCA 0xFE 0xBA 0xBE            // content
    ```

    Note that the serialization preserves the subtype, and deserializing `v` would yield the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```
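
How the smallest representation is picked can be illustrated with the type bytes from the MessagePack specification.
This is a minimal sketch, not the library's writer: `msgpack_binary_header` is a hypothetical helper that returns only
the bytes preceding the raw content.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not library API): header bytes that precede the raw
// content of a MessagePack binary value of the given size.
std::vector<std::uint8_t> msgpack_binary_header(std::size_t size,
                                                bool has_subtype,
                                                std::int8_t subtype = 0)
{
    assert(size <= 0xFFFFFFFFu);  // ext32 and bin32 carry a 32-bit length
    std::vector<std::uint8_t> out;

    bool fixed = false;
    if (has_subtype)
    {
        // fixext formats encode the size in the type byte itself
        switch (size)
        {
            case 1:  out.push_back(0xD4); fixed = true; break;  // fixext1
            case 2:  out.push_back(0xD5); fixed = true; break;  // fixext2
            case 4:  out.push_back(0xD6); fixed = true; break;  // fixext4
            case 8:  out.push_back(0xD7); fixed = true; break;  // fixext8
            case 16: out.push_back(0xD8); fixed = true; break;  // fixext16
            default: break;
        }
    }

    if (!fixed)
    {
        // ext8/ext16/ext32 are 0xC7..0xC9; bin8/bin16/bin32 are 0xC4..0xC6
        const unsigned base = has_subtype ? 0xC7u : 0xC4u;
        const int width = (size <= 0xFF) ? 1 : (size <= 0xFFFF) ? 2 : 4;
        const unsigned offset = (width == 1) ? 0u : (width == 2) ? 1u : 2u;
        out.push_back(static_cast<std::uint8_t>(base + offset));
        for (int i = width - 1; i >= 0; --i)  // length in big endian
        {
            out.push_back(static_cast<std::uint8_t>((size >> (8 * i)) & 0xFF));
        }
    }

    if (has_subtype)
    {
        out.push_back(static_cast<std::uint8_t>(subtype));  // signed 8-bit type
    }
    return out;
}
```

For the example value (4 bytes, subtype 42) the header is `0xD6 0x2A`. Without a subtype it would be `0xC4 0x04`
(bin8), and a 3-byte value with subtype 42 would need ext8: `0xC7 0x03 0x2A`.
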

### UBJSON

[UBJSON](binary_formats/ubjson.md) neither supports binary values nor subtypes, and proposes to serialize binary values
as an array of uint8 values. This translation is implemented by the library.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42 (will be ignored in UBJSON)
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to UBJSON
    auto v = json::to_ubjson(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 20 elements:

    ```c
    0x7B                                             // '{'
        0x69 0x06                                    // i 6 (length of the key)
        0x62 0x69 0x6E 0x61 0x72 0x79                // "binary"
        0x5B                                         // '['
            0x55 0xCA 0x55 0xFE 0x55 0xBA 0x55 0xBE  // content (each byte prefixed with 'U')
        0x5D                                         // ']'
    0x7D                                             // '}'
    ```

    The following code uses the type and size optimization for UBJSON:

    ```cpp
    // convert to UBJSON using the size and type optimization
    auto v = json::to_ubjson(j, true, true);
    ```

    The resulting vector has 23 elements; the optimization is not effective for examples with few values:

    ```c
    0x7B                                // '{'
        0x24                            // '$' type of the object elements
        0x5B                            // '[' array
        0x23 0x69 0x01                  // '#' i 1 number of object elements
        0x69 0x06                       // i 6 (length of the key)
        0x62 0x69 0x6E 0x61 0x72 0x79   // "binary"
            0x24 0x55                   // '$' 'U' type of the array elements: unsigned integers
            0x23 0x69 0x04              // '#' i 4 number of array elements
            0xCA 0xFE 0xBA 0xBE         // content
    ```

    Note that the subtype (42) is **not** serialized and that UBJSON has **no binary type**, so deserializing `v` would
    yield the following value:

    ```json
    {
      "binary": [202, 254, 186, 190]
    }
    ```
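
The translation of a byte sequence into a uint8 array can be written out directly. The following sketch emits only the
array part in both forms shown above; `ubjson_uint8_array` is a hypothetical helper, not the library's writer, and its
optimized branch assumes at most 127 bytes so that the count fits the one-byte `i` integer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not library API): encodes bytes as a UBJSON array of
// uint8 values, either element by element or with the type and size prefix.
std::vector<std::uint8_t> ubjson_uint8_array(const std::vector<std::uint8_t>& bytes,
                                             bool optimized)
{
    std::vector<std::uint8_t> out;
    out.push_back('[');
    if (optimized)
    {
        assert(bytes.size() <= 127);  // sketch only writes one-byte 'i' counts
        out.push_back('$');
        out.push_back('U');  // all elements are uint8
        out.push_back('#');
        out.push_back('i');
        out.push_back(static_cast<std::uint8_t>(bytes.size()));
        out.insert(out.end(), bytes.begin(), bytes.end());
        // no closing ']' when the element count is given
    }
    else
    {
        for (const std::uint8_t byte : bytes)
        {
            out.push_back('U');  // each element carries its own type marker
            out.push_back(byte);
        }
        out.push_back(']');
    }
    return out;
}
```

With the four example bytes, both calls return 10 bytes: the plain form spends 4 bytes on per-element markers, while
the optimized form spends 5 on the shared prefix and saves the closing bracket.
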