char8 mode, enabled using the PUGIXML_CHAR8_MODE macro, uses C++20 char8_t
instead of char for the UTF-8 interface. This makes pugixml safer to use
in code where char strings otherwise hold text in the system codepage.
Stream-based methods received an additional overload, since the char
overload may be used to represent arbitrary bytes, and the char8_t overload
may be used by string streams.
An additional typedef u8char_t, which represents the type pugixml uses for
a UTF-8 code unit, was added for the conversion functions.
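As a rough illustration of the typedef described above, the sketch below
selects a code unit type at compile time. SKETCH_CHAR8_MODE is a
hypothetical stand-in for PUGIXML_CHAR8_MODE, and u8string_t and
count_code_points are names invented for this example; this is not
pugixml's actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// SKETCH_CHAR8_MODE is a hypothetical stand-in for PUGIXML_CHAR8_MODE.
#if defined(SKETCH_CHAR8_MODE) && defined(__cpp_char8_t)
typedef char8_t u8char_t; // C++20: distinct type for UTF-8 code units
#else
typedef char u8char_t;    // fallback: plain char carries UTF-8
#endif

// Hypothetical alias used only in this sketch.
typedef std::basic_string<u8char_t> u8string_t;

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Written against u8char_t, so it compiles unchanged in either mode.
std::size_t count_code_points(const u8string_t& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Code written against the typedef does not need to know which mode is
active, which is the point of exposing it to the conversion functions.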
Most of the changes are in the test code. Raw bytes cannot be written as
UTF-8 string literals, since hex escape codes there are interpreted as
Unicode characters. Affected places either received a branch with a u8
literal or use a new RAW() macro, which smuggles in UTF-8 code units
using chars.
Since foo//bar//baz adds two nodes for each //, we need to increment the
depth by 2 on each iteration to limit the AST depth correctly.
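The counting rule can be sketched as follows. In XPath 1.0, // is
shorthand for /descendant-or-self::node()/, so it contributes the implicit
step plus the step that follows it. estimate_step_depth and
within_depth_limit are hypothetical helpers for relative paths only; they
are not part of pugixml and ignore predicates, axes and quoting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: estimates how many step nodes a relative location
// path produces. "/" adds one node, "//" adds two.
std::size_t estimate_step_depth(const std::string& path)
{
    std::size_t depth = path.empty() ? 0 : 1; // the first step

    for (std::size_t i = 0; i < path.size(); ++i)
    {
        if (path[i] != '/') continue;

        if (i + 1 < path.size() && path[i + 1] == '/')
        {
            depth += 2; // implicit descendant-or-self step + next step
            ++i;        // consume the second slash
        }
        else
        {
            depth += 1; // ordinary step
        }
    }

    return depth;
}

// Hypothetical limit check built on the estimate above.
bool within_depth_limit(const std::string& path, std::size_t limit)
{
    return estimate_step_depth(path) <= limit;
}
```

Counting each // as a single node would let a query reach roughly twice
the intended depth before the limit triggers.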
Fixes the stack overflow found by cluster-fuzz (I suspect the issue
there is a bit deeper, but this part is definitely a bug and as such I'd
rather wait for the next test case for now).
Function call arguments are stored in a list which is processed
recursively during optimize(). We now limit the depth of this construct
as well to make sure optimize() doesn't run out of stack space.
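A minimal sketch of the idea, assuming a singly linked argument list:
because the later pass recurses once per element, the list length is
capped when the list is built. arg_node, push_arg, sum_recursive and the
value 1024 are all hypothetical and chosen for illustration; they do not
reflect pugixml's actual data structures or limit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type: arguments form a singly linked list via `next`.
struct arg_node
{
    int value;
    arg_node* next;
};

// Illustrative limit only; not the value pugixml uses.
const std::size_t max_list_depth = 1024;

// Prepends `node` only while the list stays within the limit.
bool push_arg(arg_node*& head, std::size_t& depth, arg_node* node)
{
    if (depth >= max_list_depth) return false;

    node->next = head;
    head = node;
    ++depth;
    return true;
}

// Recursive pass: stack use grows linearly with the list length, which is
// why the length has to be bounded before this runs.
int sum_recursive(const arg_node* node)
{
    return node ? node->value + sum_recursive(node->next) : 0;
}
```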
The default stack on MSVC/x64/debug is sufficient for 1692 nested
invocations only, whereas on clang/linux it's ~8K...
For now, set the limit conservatively.
The XPath parser and execution engine aren't stackless; the depth of the
query controls the amount of C stack space required.
This change instruments places in the parser where the control flow can
recurse, requiring too much C stack space to produce an AST, or where a
stackless parse is used to produce an arbitrarily deep AST, which would
create issues for downstream processing.
As a result, the XPath parser should now be fuzz-safe against malicious
inputs.
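The general technique can be sketched with a toy recursive-descent parser
for nested parentheses that carries its depth as a parameter and refuses
to recurse past a limit. parse_group and parse_with_limit are hypothetical
names for this example; the real parser handles a full XPath grammar and
is instrumented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parses a sequence of balanced "(...)" groups starting at `pos`.
// Returns false on malformed input or when nesting exceeds max_depth.
bool parse_group(const std::string& text, std::size_t& pos,
                 std::size_t depth, std::size_t max_depth)
{
    if (depth > max_depth) return false; // refuse before recursing further

    while (pos < text.size() && text[pos] == '(')
    {
        ++pos; // consume '('

        if (!parse_group(text, pos, depth + 1, max_depth)) return false;

        if (pos >= text.size() || text[pos] != ')') return false;
        ++pos; // consume ')'
    }

    return true;
}

// Entry point: the whole input must be consumed for the parse to succeed.
bool parse_with_limit(const std::string& text, std::size_t max_depth)
{
    std::size_t pos = 0;
    return parse_group(text, pos, 0, max_depth) && pos == text.size();
}
```

With the guard in place, an input of many thousands of opening
parentheses is rejected after max_depth + 1 nested calls instead of
exhausting the C stack.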
gcov -b surfaced many lines with partial coverage, where a branch is
either always taken or never taken, or one of the expressions in a complex
conditional is always either true or false. This change adds a series of
tests (mostly focusing on XPath) to reduce the number of partially
covered lines.
This test exercises error handling in expressions that are nested inside
other expressions, to reduce the number of never-taken branches in tests
(and to make sure we aren't missing any).
Previously there was no guarantee that the tests that check out-of-memory
handling were actually correct, e.g. that they correctly simulated
out-of-memory conditions.
Now every simulated out-of-memory condition has to be "guarded" using
CHECK_ALLOC_FAIL. It makes sure that every piece of code that is supposed to
cause out-of-memory does so, and that no other code runs out of memory
unnoticed.
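One way such a guard could work is sketched below. This is an assumption
about the general shape, not the real CHECK_ALLOC_FAIL definition:
SKETCH_CHECK_ALLOC_FAIL, sketch_allocate and the g_* variables are
hypothetical names for a simulated allocator with a byte budget and a
failure flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical test-harness state; all names are illustrative.
static std::size_t g_budget = 0;      // bytes the allocator may still hand out
static bool g_fail_triggered = false; // set when a simulated allocation fails
static unsigned g_guard_errors = 0;   // number of guard violations seen

// Simulated allocator: fails once the request exceeds the remaining budget.
void* sketch_allocate(std::size_t size)
{
    if (size > g_budget)
    {
        g_fail_triggered = true;
        return 0;
    }

    g_budget -= size;
    return std::malloc(size);
}

// Guard: no failure may be pending on entry (it would have gone unnoticed),
// and the guarded code must actually trigger a failure.
#define SKETCH_CHECK_ALLOC_FAIL(code) \
    do { \
        if (g_fail_triggered) ++g_guard_errors; \
        code; \
        if (!g_fail_triggered) ++g_guard_errors; \
        g_fail_triggered = false; \
    } while (false)
```

In a scheme like this, a test runner would also check the flag at the end
of each test, so that a failure outside any guard is reported rather than
silently ignored.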