docs: Clarify Unicode validation behavior
pugixml has never performed Unicode validation or Unicode character class validation of tag/attribute names, but this was not obvious from the documentation. Fixes #162.
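Because the parser accepts invalid UTF sequences, a caller who needs strict input can check the buffer before handing it to pugixml. The following is a minimal sketch of such a pre-check, assuming UTF-8 input. `is_valid_utf8` is a hypothetical helper written for illustration; it is not part of pugixml and is not touched by this commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a pugixml API): returns true if the byte range is
// well-formed UTF-8. Rejects stray continuation bytes, truncated sequences,
// overlong encodings, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool is_valid_utf8(const unsigned char* data, std::size_t size)
{
    std::size_t i = 0;

    while (i < size)
    {
        unsigned char lead = data[i];

        // Single-byte (ASCII) code point.
        if (lead < 0x80)
        {
            ++i;
            continue;
        }

        std::size_t len = 0;       // total bytes in this sequence
        std::uint32_t cp = 0;      // decoded code point
        std::uint32_t min_cp = 0;  // smallest value that legitimately needs `len` bytes

        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false; // stray continuation byte or invalid lead byte

        // Sequence runs past the end of the buffer.
        if (size - i < len) return false;

        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char c = data[i + k];
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range

        i += len;
    }

    return true;
}

// Convenience overload for std::string buffers.
bool is_valid_utf8(const std::string& s)
{
    return is_valid_utf8(reinterpret_cast<const unsigned char*>(s.data()), s.size());
}
```

A caller could run this check on the raw buffer and only pass it to `pugi::xml_document::load_buffer` when it returns true. It covers byte-level well-formedness only; it does not check which characters XML allows in tag or attribute names.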
parent
4f2ad720c8
commit
900a1cc943
@ -811,12 +811,13 @@ There is only one non-conformant behavior when dealing with valid XML documents:
|
|||||||
As for rejecting invalid XML documents, there are a number of incompatibilities with W3C specification, including:
|
As for rejecting invalid XML documents, there are a number of incompatibilities with W3C specification, including:
|
||||||
|
|
||||||
* Multiple attributes of the same node can have equal names.
|
* Multiple attributes of the same node can have equal names.
|
||||||
* All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.
|
* Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected
|
||||||
* Attribute values which contain `<` are not rejected.
|
* Attribute values which contain `<` are not rejected.
|
||||||
* Invalid entity/character references are not rejected and are instead left as is.
|
* Invalid entity/character references are not rejected and are instead left as is.
|
||||||
* Comment values can contain `--`.
|
* Comment values can contain `--`.
|
||||||
* XML data is not required to begin with document declaration; additionally, document declaration can appear after comments and other nodes.
|
* XML data is not required to begin with document declaration; additionally, document declaration can appear after comments and other nodes.
|
||||||
* Invalid document type declarations are silently ignored in some cases.
|
* Invalid document type declarations are silently ignored in some cases.
|
||||||
|
* Unicode validation is not performed so invalid UTF sequences are not rejected.
|
||||||
|
|
||||||
[[access]]
|
[[access]]
|
||||||
== Accessing document data
|
== Accessing document data
|
||||||
|
@ -1941,7 +1941,7 @@ The current behavior for Unicode conversion is to skip all invalid UTF sequences
|
|||||||
<p>Multiple attributes of the same node can have equal names.</p>
|
<p>Multiple attributes of the same node can have equal names.</p>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.</p>
|
<p>Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected</p>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>Attribute values which contain <code>&lt;</code> are not rejected.</p>
|
<p>Attribute values which contain <code>&lt;</code> are not rejected.</p>
|
||||||
@ -1958,6 +1958,9 @@ The current behavior for Unicode conversion is to skip all invalid UTF sequences
|
|||||||
<li>
|
<li>
|
||||||
<p>Invalid document type declarations are silently ignored in some cases.</p>
|
<p>Invalid document type declarations are silently ignored in some cases.</p>
|
||||||
</li>
|
</li>
|
||||||
|
<li>
|
||||||
|
<p>Unicode validation is not performed so invalid UTF sequences are not rejected.</p>
|
||||||
|
</li>
|
||||||
</ul>
|
</ul>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -5672,7 +5675,7 @@ If exceptions are disabled, then in the event of parsing failure the query is in
|
|||||||
</div>
|
</div>
|
||||||
<div id="footer">
|
<div id="footer">
|
||||||
<div id="footer-text">
|
<div id="footer-text">
|
||||||
Last updated 2017-08-21 08:46:53 DST
|
Last updated 2017-08-29 20:45:58 DST
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</body>
|
</body>
|
||||||
|