docs: Update from master

2017-12-22 10:43:12 -08:00
commit 04d0ccdbc3
parent 986e36158e
4 changed files with 231 additions and 245 deletions
--- a/docs/manual.adoc
+++ b/docs/manual.adoc
@@ -46,7 +46,7 @@ Thanks to *Vyacheslav Egorov* for documentation proofreading and fuzz testing.
 The pugixml library is distributed under the MIT license:

 ....
-Copyright (c) 2006-2016 Arseny Kapoulkine
+Copyright (c) 2006-2017 Arseny Kapoulkine

 Permission is hereby granted, free of charge, to any person
 obtaining a copy of this software and associated documentation
@@ -74,7 +74,7 @@ This means that you can freely use pugixml in your applications, both open-sourc

 ....
 This software is based on pugixml library (http://pugixml.org).
-pugixml is Copyright (C) 2006-2016 Arseny Kapoulkine.
+pugixml is Copyright (C) 2006-2017 Arseny Kapoulkine.
 ....

 [[install]]
@@ -556,7 +556,7 @@ On 32-bit architectures document structure in compact mode is typically reduced

 pugixml provides several functions for loading XML data from various places - files, C{plus}{plus} iostreams, memory buffers. All functions use an extremely fast non-validating parser. This parser is not fully W3C conformant - it can load any valid XML document, but does not perform some well-formedness checks. While considerable effort is made to reject invalid XML documents, some validation is not performed for performance reasons. Also some XML transformations (i.e. EOL handling or attribute value normalization) can impact parsing speed and thus can be disabled. However for vast majority of XML documents there is no performance difference between different parsing options. Parsing options also control whether certain XML nodes are parsed; see <<loading.options>> for more information.

-XML data is always converted to internal character format (see <<dom.unicode>>) before parsing. pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally supported since it's a strict subset of UTF-16) and handles all encoding conversions automatically. Unless explicit encoding is specified, loading functions perform automatic encoding detection based on first few characters of XML data, so in almost all cases you do not have to specify document encoding. Encoding conversion is described in more detail in <<loading.encoding>>.
+XML data is always converted to internal character format (see <<dom.unicode>>) before parsing. pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally supported since it's a strict subset of UTF-16) as well as some non-Unicode encodings (Latin-1) and handles all encoding conversions automatically. Unless explicit encoding is specified, loading functions perform automatic encoding detection based on source XML data, so in most cases you do not have to specify document encoding. Encoding conversion is described in more detail in <<loading.encoding>>.

 [[loading.file]]
 === Loading document from file
@@ -784,17 +784,9 @@ include::samples/load_options.cpp[tags=code]
 === Encodings

 [[xml_encoding]]
-pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally supported since it's a strict subset of UTF-16) and handles all encoding conversions. Most loading functions accept the optional parameter `encoding`. This is a value of enumeration type `xml_encoding`, that can have the following values:
-
-* [[encoding_auto]]`encoding_auto` means that pugixml will try to guess the encoding based on source XML data. The algorithm is a modified version of the one presented in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info[Appendix F.1 of XML recommendation]; it tries to match the first few bytes of input data with the following patterns in strict order:
-** If first four bytes match UTF-32 BOM (Byte Order Mark), encoding is assumed to be UTF-32 with the endianness equal to that of BOM;
-** If first two bytes match UTF-16 BOM, encoding is assumed to be UTF-16 with the endianness equal to that of BOM;
-** If first three bytes match UTF-8 BOM, encoding is assumed to be UTF-8;
-** If first four bytes match UTF-32 representation of `<`, encoding is assumed to be UTF-32 with the corresponding endianness;
-** If first four bytes match UTF-16 representation of `<?`, encoding is assumed to be UTF-16 with the corresponding endianness;
-** If first two bytes match UTF-16 representation of `<`, encoding is assumed to be UTF-16 with the corresponding endianness (this guess may yield incorrect result, but it's better than UTF-8);
-** Otherwise encoding is assumed to be UTF-8.
+pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally supported since it's a strict subset of UTF-16) as well as some non-Unicode encodings (Latin-1) and handles all encoding conversions. Most loading functions accept the optional parameter `encoding`. This is a value of enumeration type `xml_encoding`, that can have the following values:

+* [[encoding_auto]]`encoding_auto` means that pugixml will try to guess the encoding based on source XML data. The algorithm is a modified version of the one presented in http://www.w3.org/TR/REC-xml/#sec-guessing[Appendix F of XML recommendation]. It tries to find a Byte Order Mark of one of the supported encodings first; if that fails, it checks if the first few bytes of the input data look like a representation of `<` or `<?` in one of UTF-16 or UTF-32 variants; if that fails as well, encoding is assumed to be either UTF-8 or one of the non-Unicode encodings - to make the final decision the algorithm tries to parse the `encoding` attribute of the XML document declaration, ultimately falling back to UTF-8 if document declaration is not present or does not specify a supported encoding.
 * [[encoding_utf8]]`encoding_utf8` corresponds to UTF-8 encoding as defined in the Unicode standard; UTF-8 sequences with length equal to 5 or 6 are not standard and are rejected.
 * [[encoding_utf16_le]]`encoding_utf16_le` corresponds to little-endian UTF-16 encoding as defined in the Unicode standard; surrogate pairs are supported.
 * [[encoding_utf16_be]]`encoding_utf16_be` corresponds to big-endian UTF-16 encoding as defined in the Unicode standard; surrogate pairs are supported.
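The detection order described for `encoding_auto` in this hunk can be sketched as standalone C++. This is a minimal illustration under stated assumptions, not pugixml's implementation: the names `Guess`, `has_prefix`, `declared_encoding` and `guess_encoding` are hypothetical, and the declaration names accepted for Latin-1 (`iso-8859-1`, `latin1`) are an assumption made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not pugixml's xml_encoding).
enum class Guess { utf8, utf16_le, utf16_be, utf32_le, utf32_be, latin1 };

// True if `data` begins with the bytes in `prefix`.
static bool has_prefix(const std::string& data, const std::string& prefix) {
    return data.compare(0, prefix.size(), prefix) == 0;
}

// Returns the lower-cased value of the `encoding` attribute of a leading
// XML declaration, or an empty string if there is none.
static std::string declared_encoding(const std::string& data) {
    if (!has_prefix(data, "<?xml")) return "";
    std::size_t end = data.find("?>");
    if (end == std::string::npos) return "";
    std::size_t key = data.find("encoding", 5);
    if (key == std::string::npos || key > end) return "";
    std::size_t open = data.find_first_of("\"'", key);
    if (open == std::string::npos || open > end) return "";
    std::size_t close = data.find(data[open], open + 1);
    if (close == std::string::npos || close > end) return "";
    std::string name = data.substr(open + 1, close - open - 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

Guess guess_encoding(const std::string& d) {
    // Step 1: Byte Order Marks. UTF-32 is checked first because the
    // UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    if (has_prefix(d, std::string("\x00\x00\xFE\xFF", 4))) return Guess::utf32_be;
    if (has_prefix(d, std::string("\xFF\xFE\x00\x00", 4))) return Guess::utf32_le;
    if (has_prefix(d, "\xFE\xFF")) return Guess::utf16_be;
    if (has_prefix(d, "\xFF\xFE")) return Guess::utf16_le;
    if (has_prefix(d, "\xEF\xBB\xBF")) return Guess::utf8;

    // Step 2: `<` or `<?` encoded as UTF-32 or UTF-16, widest pattern first.
    if (has_prefix(d, std::string("\x00\x00\x00\x3C", 4))) return Guess::utf32_be;
    if (has_prefix(d, std::string("\x3C\x00\x00\x00", 4))) return Guess::utf32_le;
    if (has_prefix(d, std::string("\x00\x3C\x00\x3F", 4))) return Guess::utf16_be;
    if (has_prefix(d, std::string("\x3C\x00\x3F\x00", 4))) return Guess::utf16_le;
    if (has_prefix(d, std::string("\x00\x3C", 2))) return Guess::utf16_be;
    if (has_prefix(d, std::string("\x3C\x00", 2))) return Guess::utf16_le;

    // Step 3: consult the document declaration, falling back to UTF-8.
    // The two accepted names below are an assumption of this sketch.
    std::string name = declared_encoding(d);
    if (name == "iso-8859-1" || name == "latin1") return Guess::latin1;
    return Guess::utf8;
}
```

With this sketch, a buffer that starts with `<?xml version="1.0" encoding="ISO-8859-1"?>` falls through the first two steps and is classified as Latin-1 by the declaration check, while a bare `<a/>` ends at the UTF-8 fallback. In pugixml itself none of this has to be written by hand: passing `encoding_auto` as the `encoding` parameter of a loading function asks the library to do the guessing.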
@@ -819,12 +811,13 @@ There is only one non-conformant behavior when dealing with valid XML documents:
 As for rejecting invalid XML documents, there are a number of incompatibilities with W3C specification, including:

 * Multiple attributes of the same node can have equal names.
-* All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.
+* Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected
 * Attribute values which contain `<` are not rejected.
 * Invalid entity/character references are not rejected and are instead left as is.
 * Comment values can contain `--`.
 * XML data is not required to begin with document declaration; additionally, document declaration can appear after comments and other nodes.
 * Invalid document type declarations are silently ignored in some cases.
+* Unicode validation is not performed so invalid UTF sequences are not rejected.

 [[access]]
 == Accessing document data
--- a/docs/manual.html
+++ b/docs/manual.html
--- a/docs/quickstart.adoc
+++ b/docs/quickstart.adoc
@@ -54,7 +54,7 @@ There is a special value of `xml_node` type, known as null node or empty node. I

 `xml_attribute` is the handle to an XML attribute; it has the same semantics as `xml_node`, i.e. there can be several `xml_attribute` handles pointing to the same underlying object and there is a special null attribute value, which propagates to function results.

-There are two choices of interface and internal representation when configuring pugixml: you can either choose the UTF-8 (also called char) interface or UTF-16/32 (also called wchar_t) one. The choice is controlled via `PUGIXML_WCHAR_MODE` define; you can set it via `pugiconfig.hpp` or via preprocessor options. All tree functions that work with strings work with either C-style null terminated strings or STL strings of the selected character type. link:manual/dom.html#dom.unicode[Read the manual] for additional information on Unicode interface.
+There are two choices of interface and internal representation when configuring pugixml: you can either choose the UTF-8 (also called char) interface or UTF-16/32 (also called wchar_t) one. The choice is controlled via `PUGIXML_WCHAR_MODE` define; you can set it via `pugiconfig.hpp` or via preprocessor options. All tree functions that work with strings work with either C-style null terminated strings or STL strings of the selected character type. link:manual.html#dom.unicode[Read the manual] for additional information on Unicode interface.

 [[loading]]
 == Loading document
@@ -240,7 +240,7 @@ This is a simple example of custom writer for saving document data to STL string
 include::samples/save_custom_writer.cpp[tags=code]
 ----

-While the previously described functions save the whole document to the destination, it is easy to save a single subtree. Instead of calling `xml_document::save`, just call `xml_node::print` function on the target node. You can save node contents to C{plus}{plus} IOstream object or custom writer in this way. Saving a subtree slightly differs from saving the whole document; link:manual/saving.html#saving.subtree[read the manual] for more information.
+While the previously described functions save the whole document to the destination, it is easy to save a single subtree. Instead of calling `xml_document::save`, just call `xml_node::print` function on the target node. You can save node contents to C{plus}{plus} IOstream object or custom writer in this way. Saving a subtree slightly differs from saving the whole document; link:manual.html#saving.subtree[read the manual] for more information.

 [[feedback]]
 == Feedback
@@ -255,7 +255,7 @@ If filing an issue is not possible due to privacy or other concerns, you can con
 The pugixml library is distributed under the MIT license:

 ....
-Copyright (c) 2006-2016 Arseny Kapoulkine
+Copyright (c) 2006-2017 Arseny Kapoulkine

 Permission is hereby granted, free of charge, to any person
 obtaining a copy of this software and associated documentation
@@ -283,5 +283,5 @@ This means that you can freely use pugixml in your applications, both open-sourc

 ....
 This software is based on pugixml library (http://pugixml.org).
-pugixml is Copyright (C) 2006-2016 Arseny Kapoulkine.
-....
+pugixml is Copyright (C) 2006-2017 Arseny Kapoulkine.
+....
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -425,8 +425,10 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 .listingblock .pygments .tok-err { border: 1px solid #FF0000 } /* Error */
 .listingblock .pygments .tok-k { color: #008000; font-weight: bold } /* Keyword */
 .listingblock .pygments .tok-o { color: #666666 } /* Operator */
+.listingblock .pygments .tok-ch { color: #408080; font-style: italic } /* Comment.Hashbang */
 .listingblock .pygments .tok-cm { color: #408080; font-style: italic } /* Comment.Multiline */
 .listingblock .pygments .tok-cp { color: #BC7A00 } /* Comment.Preproc */
+.listingblock .pygments .tok-cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */
 .listingblock .pygments .tok-c1 { color: #408080; font-style: italic } /* Comment.Single */
 .listingblock .pygments .tok-cs { color: #408080; font-style: italic } /* Comment.Special */
 .listingblock .pygments .tok-gd { color: #A00000 } /* Generic.Deleted */
@@ -466,8 +468,10 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 .listingblock .pygments .tok-mh { color: #666666 } /* Literal.Number.Hex */
 .listingblock .pygments .tok-mi { color: #666666 } /* Literal.Number.Integer */
 .listingblock .pygments .tok-mo { color: #666666 } /* Literal.Number.Oct */
+.listingblock .pygments .tok-sa { color: #BA2121 } /* Literal.String.Affix */
 .listingblock .pygments .tok-sb { color: #BA2121 } /* Literal.String.Backtick */
 .listingblock .pygments .tok-sc { color: #BA2121 } /* Literal.String.Char */
+.listingblock .pygments .tok-dl { color: #BA2121 } /* Literal.String.Delimiter */
 .listingblock .pygments .tok-sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
 .listingblock .pygments .tok-s2 { color: #BA2121 } /* Literal.String.Double */
 .listingblock .pygments .tok-se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
@@ -478,9 +482,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 .listingblock .pygments .tok-s1 { color: #BA2121 } /* Literal.String.Single */
 .listingblock .pygments .tok-ss { color: #19177C } /* Literal.String.Symbol */
 .listingblock .pygments .tok-bp { color: #008000 } /* Name.Builtin.Pseudo */
+.listingblock .pygments .tok-fm { color: #0000FF } /* Name.Function.Magic */
 .listingblock .pygments .tok-vc { color: #19177C } /* Name.Variable.Class */
 .listingblock .pygments .tok-vg { color: #19177C } /* Name.Variable.Global */
 .listingblock .pygments .tok-vi { color: #19177C } /* Name.Variable.Instance */
+.listingblock .pygments .tok-vm { color: #19177C } /* Name.Variable.Magic */
 .listingblock .pygments .tok-il { color: #666666 } /* Literal.Number.Integer.Long */
 </style>
 </head>
@@ -610,7 +616,7 @@ All pugixml classes and functions are located in <code>pugi</code> namespace; yo
 <p><code>xml_attribute</code> is the handle to an XML attribute; it has the same semantics as <code>xml_node</code>, i.e. there can be several <code>xml_attribute</code> handles pointing to the same underlying object and there is a special null attribute value, which propagates to function results.</p>
 </div>
 <div class="paragraph">
-<p>There are two choices of interface and internal representation when configuring pugixml: you can either choose the UTF-8 (also called char) interface or UTF-16/32 (also called wchar_t) one. The choice is controlled via <code>PUGIXML_WCHAR_MODE</code> define; you can set it via <code>pugiconfig.hpp</code> or via preprocessor options. All tree functions that work with strings work with either C-style null terminated strings or STL strings of the selected character type. <a href="manual/dom.html#dom.unicode">Read the manual</a> for additional information on Unicode interface.</p>
+<p>There are two choices of interface and internal representation when configuring pugixml: you can either choose the UTF-8 (also called char) interface or UTF-16/32 (also called wchar_t) one. The choice is controlled via <code>PUGIXML_WCHAR_MODE</code> define; you can set it via <code>pugiconfig.hpp</code> or via preprocessor options. All tree functions that work with strings work with either C-style null terminated strings or STL strings of the selected character type. <a href="manual.html#dom.unicode">Read the manual</a> for additional information on Unicode interface.</p>
 </div>
 </div>
 </div>
@@ -1010,7 +1016,7 @@ XPath functions throw <code>xpath_exception</code> objects on error; the sample
 </div>
 </div>
 <div class="paragraph">
-<p>While the previously described functions save the whole document to the destination, it is easy to save a single subtree. Instead of calling <code>xml_document::save</code>, just call <code>xml_node::print</code> function on the target node. You can save node contents to C&#43;&#43; IOstream object or custom writer in this way. Saving a subtree slightly differs from saving the whole document; <a href="manual/saving.html#saving.subtree">read the manual</a> for more information.</p>
+<p>While the previously described functions save the whole document to the destination, it is easy to save a single subtree. Instead of calling <code>xml_document::save</code>, just call <code>xml_node::print</code> function on the target node. You can save node contents to C&#43;&#43; IOstream object or custom writer in this way. Saving a subtree slightly differs from saving the whole document; <a href="manual.html#saving.subtree">read the manual</a> for more information.</p>
 </div>
 </div>
 </div>
@@ -1033,7 +1039,7 @@ XPath functions throw <code>xpath_exception</code> objects on error; the sample
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>Copyright (c) 2006-2016 Arseny Kapoulkine
+<pre>Copyright (c) 2006-2017 Arseny Kapoulkine

 Permission is hereby granted, free of charge, to any person
 obtaining a copy of this software and associated documentation
@@ -1063,7 +1069,7 @@ OTHER DEALINGS IN THE SOFTWARE.</pre>
 <div class="literalblock">
 <div class="content">
 <pre>This software is based on pugixml library (http://pugixml.org).
-pugixml is Copyright (C) 2006-2016 Arseny Kapoulkine.</pre>
+pugixml is Copyright (C) 2006-2017 Arseny Kapoulkine.</pre>
 </div>
 </div>
 </div>
@@ -1077,7 +1083,7 @@ pugixml is Copyright (C) 2006-2016 Arseny Kapoulkine.</pre>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2016-11-24 00:20:49 STD
+Last updated 2017-12-22 10:40:20 STD
 </div>
 </div>
 </body>