diff --git a/doc/Doxyfile b/doc/Doxyfile
index 4c64dd6f9..46d3685c7 100644
--- a/doc/Doxyfile
+++ b/doc/Doxyfile
@@ -109,7 +109,8 @@ WARN_LOGFILE =
 #---------------------------------------------------------------------------
 INPUT = ../src/json.hpp \
         index.md \
-        faq.md
+        faq.md \
+        binary_formats.md
 INPUT_ENCODING = UTF-8
 FILE_PATTERNS =
 RECURSIVE = NO
diff --git a/doc/binary_formats.md b/doc/binary_formats.md
new file mode 100644
index 000000000..8b1878601
--- /dev/null
+++ b/doc/binary_formats.md
@@ -0,0 +1,172 @@
+# Binary formats
+
+![conversion between JSON and binary formats](images/binary.png)
+
+Several formats exist that encode JSON values in a binary format to reduce both the size of the encoded value and the effort required to parse it. The library implements three formats, namely:
+
+- [CBOR](https://tools.ietf.org/html/rfc7049) (Concise Binary Object Representation)
+- [MessagePack](https://msgpack.org)
+- [UBJSON](http://ubjson.org) (Universal Binary JSON)
+
+## Interface
+
+### JSON to binary format
+
+For each format, the `to_*` functions (i.e., `to_cbor`, `to_msgpack`, and `to_ubjson`) convert a JSON value into the respective binary format. Taking CBOR as an example, the concrete prototypes are:
+
+```cpp
+static std::vector<uint8_t> to_cbor(const basic_json& j); // 1
+static void to_cbor(const basic_json& j, detail::output_adapter<uint8_t> o); // 2
+static void to_cbor(const basic_json& j, detail::output_adapter<char> o); // 3
+```
+
+The first function creates a byte vector from the given JSON value. The second and third functions write to an output adapter of `uint8_t` and `char`, respectively. Output adapters are implemented for strings, output streams, and vectors.
+
+Given a JSON value `j`, the following calls are possible:
+
+```cpp
+std::vector<std::uint8_t> v;
+v = json::to_cbor(j); // 1
+
+json::to_cbor(j, v); // 2
+
+std::string s;
+json::to_cbor(j, s); // 3
+
+std::ostringstream oss;
+json::to_cbor(j, oss); // 3
+```
+
+### Binary format to JSON
+
+Likewise, the `from_*` functions (i.e., `from_cbor`, `from_msgpack`, and `from_ubjson`) convert a binary encoded value into a JSON value. Taking CBOR as an example, the concrete prototypes are:
+
+```cpp
+static basic_json from_cbor(detail::input_adapter i, const bool strict = true); // 1
+static basic_json from_cbor(A1 && a1, A2 && a2, const bool strict = true); // 2
+```
+
+Both functions read from an input adapter: the first function takes it directly from argument `i`, whereas the second function creates it from the provided arguments `a1` and `a2`. If the optional parameter `strict` is true, the input must be read completely (otherwise, a parse error exception is thrown). If it is false, parsing succeeds even if the input is not read completely.
+
+Input adapters are implemented for input streams, character buffers, string literals, and iterator ranges.
+
+Given several inputs (which we assume to be filled with a CBOR value), the following calls are possible:
+
+```cpp
+std::string s;
+json j1 = json::from_cbor(s); // 1
+
+std::ifstream is("somefile.cbor", std::ios::binary);
+json j2 = json::from_cbor(is); // 1
+
+std::vector<std::uint8_t> v;
+json j3 = json::from_cbor(v); // 1
+
+const char* buff;
+size_t buff_size;
+json j4 = json::from_cbor(buff, buff_size); // 2
+```
+
+## Details
+
+### CBOR
+
+The mapping from CBOR to JSON is **incomplete** in the sense that not all CBOR types can be converted to a JSON value. The following CBOR types are not supported and will yield parse errors (parse_error.112):
+
+- byte strings (0x40..0x5F)
+- date/time (0xC0..0xC1)
+- bignum (0xC2..0xC3)
+- decimal fraction (0xC4)
+- bigfloat (0xC5)
+- tagged items (0xC6..0xD4, 0xD8..0xDB)
+- expected conversions (0xD5..0xD7)
+- simple values (0xE0..0xF3, 0xF8)
+- undefined (0xF7)
+
+CBOR further allows map keys of any type, whereas JSON only allows strings as keys in object values. Therefore, CBOR maps with keys other than UTF-8 strings are rejected (parse_error.113).
+
+The mapping from JSON to CBOR is **complete** in the sense that any JSON value type can be converted to a CBOR value.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function, which serializes NaN or Infinity to `null`.
+
+The following CBOR types are not used in the conversion:
+
+- byte strings (0x40..0x5F)
+- UTF-8 strings terminated by "break" (0x7F)
+- arrays terminated by "break" (0x9F)
+- maps terminated by "break" (0xBF)
+- date/time (0xC0..0xC1)
+- bignum (0xC2..0xC3)
+- decimal fraction (0xC4)
+- bigfloat (0xC5)
+- tagged items (0xC6..0xD4, 0xD8..0xDB)
+- expected conversions (0xD5..0xD7)
+- simple values (0xE0..0xF3, 0xF8)
+- undefined (0xF7)
+- half and single-precision floats (0xF9..0xFA)
+- break (0xFF)
+
+### MessagePack
+
+The mapping from MessagePack to JSON is **incomplete** in the sense that not all MessagePack types can be converted to a JSON value. The following MessagePack types are not supported and will yield parse errors:
+
+- bin 8 - bin 32 (0xC4..0xC6)
+- ext 8 - ext 32 (0xC7..0xC9)
+- fixext 1 - fixext 16 (0xD4..0xD8)
+
+The mapping from JSON to MessagePack is **complete** in the sense that any JSON value type can be converted to a MessagePack value.
+
+The following values cannot be converted to a MessagePack value:
+
+- strings with more than 4294967295 bytes
+- arrays with more than 4294967295 elements
+- objects with more than 4294967295 elements
+
+The following MessagePack types are not used in the conversion:
+
+- bin 8 - bin 32 (0xC4..0xC6)
+- ext 8 - ext 32 (0xC7..0xC9)
+- float 32 (0xCA)
+- fixext 1 - fixext 16 (0xD4..0xD8)
+
+Any MessagePack output created by `to_msgpack` can be successfully parsed by `from_msgpack`.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function, which serializes NaN or Infinity to `null`.
+
+### UBJSON
+
+The mapping from UBJSON to JSON is **complete** in the sense that any UBJSON value can be converted to a JSON value.
+
+The mapping from JSON to UBJSON is **complete** in the sense that any JSON value type can be converted to a UBJSON value.
+
+The following values cannot be converted to a UBJSON value:
+
+- strings with more than 9223372036854775807 bytes (theoretical)
+- unsigned integer numbers above 9223372036854775807
+
+The following markers are not used in the conversion:
+
+- `Z`: no-op values are not created.
+- `C`: single-byte strings are serialized with `S` markers.
+
+Any UBJSON output created by `to_ubjson` can be successfully parsed by `from_ubjson`.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function, which serializes NaN or Infinity to `null`.
+
+The optimized formats for containers are supported: parameter `use_size` adds size information to the beginning of a container and removes the closing marker. Parameter `use_type` further checks whether all elements of a container have the same type and adds the type marker to the beginning of the container. The `use_type` parameter must only be used together with `use_size = true`. Note that `use_size = true` alone may result in larger representations; the benefit of this parameter is that the receiving side is immediately informed of the number of elements in the container.
+
+## Size comparison examples
+
+The following table shows the size (in bytes) of different files in the `test/data` directory for the different formats.
+
+| format                  | sample.json | floats.json | all_unicode.json |
+| ----------------------- | -----------:| -----------:| ----------------:|
+| JSON                    |      687491 |    22670390 |         13279259 |
+| CBOR                    |  **147095** |     9000005 |      **5494662** |
+| MsgPack                 |      148395 |     9000005 |      **5494662** |
+| UBJSON unoptimized      |      148695 |     9000002 |          7718787 |
+| UBJSON size-optimized   |      150569 |     9000007 |          7718792 |
+| UBJSON format-optimized |      150883 | **8000009** |          7718792 |
+
+The results show that there is no single "best" encoding. Furthermore, it is not always worthwhile to use UBJSON's optimizations.
+
diff --git a/doc/images/binary.png b/doc/images/binary.png
new file mode 100644
index 000000000..2579fd8f4
Binary files /dev/null and b/doc/images/binary.png differ
diff --git a/doc/index.md b/doc/index.md
index 6fbe2a4ad..66ad588cd 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -42,7 +42,7 @@ These pages contain the API documentation of JSON for Modern C++, a C++11 header
   - @link nlohmann::basic_json::parse parse @endlink parse from string
   - @link nlohmann::basic_json::operator>>(std::istream&, basic_json&) operator>> @endlink parse from stream
   - @link nlohmann::basic_json::accept accept @endlink check for syntax errors without parsing
-  - binary formats:
+  - [binary formats](binary_formats.md):
     - CBOR: @link nlohmann::basic_json::from_cbor from_cbor @endlink / @link nlohmann::basic_json::to_cbor to_cbor @endlink
     - MessagePack: @link nlohmann::basic_json::from_msgpack from_msgpack @endlink / @link nlohmann::basic_json::to_msgpack to_msgpack @endlink
     - UBJSON: @link nlohmann::basic_json::from_ubjson from_ubjson @endlink / @link nlohmann::basic_json::to_ubjson to_ubjson @endlink
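
The byte ranges listed in the CBOR section of the new `binary_formats.md` are not arbitrary. RFC 7049 (Section 2.1) splits every initial byte into a 3-bit major type and 5-bit "additional information". The standalone C++ sketch below illustrates that split. `cbor_major_type`, `major_type`, and `additional_info` are hypothetical names chosen for illustration only; they are not part of `json.hpp` and do not reflect how the library's parser is written.

```cpp
#include <cstdint>

// CBOR major types (RFC 7049, Section 2.1), stored in the top three bits
// of every initial byte.
enum class cbor_major_type : std::uint8_t
{
    unsigned_integer = 0,
    negative_integer = 1,
    byte_string      = 2,
    text_string      = 3,
    array            = 4,
    map              = 5,
    tag              = 6,
    simple_or_float  = 7
};

// Hypothetical helper (not part of json.hpp): extract the major type.
constexpr cbor_major_type major_type(std::uint8_t initial_byte)
{
    return static_cast<cbor_major_type>(initial_byte >> 5);
}

// Hypothetical helper (not part of json.hpp): extract the five-bit
// "additional information". 31 marks an indefinite-length item (or "break"
// for major type 7); for major type 7, 25/26/27 select half-, single-, and
// double-precision floats.
constexpr std::uint8_t additional_info(std::uint8_t initial_byte)
{
    return static_cast<std::uint8_t>(initial_byte & 0x1F);
}

// The ranges quoted in the CBOR section follow from this split.
static_assert(major_type(0x40) == cbor_major_type::byte_string, "0x40..0x5F: byte strings");
static_assert(major_type(0x5F) == cbor_major_type::byte_string, "0x40..0x5F: byte strings");
static_assert(major_type(0xC0) == cbor_major_type::tag, "0xC0..0xDB: tags");
static_assert(major_type(0xDB) == cbor_major_type::tag, "0xC0..0xDB: tags");
static_assert(major_type(0x7F) == cbor_major_type::text_string && additional_info(0x7F) == 31,
              "0x7F: UTF-8 string terminated by break");
static_assert(major_type(0xF9) == cbor_major_type::simple_or_float && additional_info(0xF9) == 25,
              "0xF9: half-precision float");
static_assert(additional_info(0xFA) == 26, "0xFA: single-precision float");
```

Because the helpers are `constexpr`, the checks run at compile time. The same split explains why the three "break"-terminated types in the "not used" list (0x7F, 0x9F, 0xBF) all carry additional information 31.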