No documentation is perfect; neither is this one. If you encounter a description
that is unclear, please file an issue as described in <a class="xref" href="quickstart.html#quickstart.main.feedback" title="Feedback">Feedback</a>. Also, if
you can spare the time for a full proofreading, including spelling and
grammar, that would be great! Please <a class="link" href="quickstart.html#email">send me an e-mail</a>;
as a token of appreciation, your name will be included in the corresponding
The distribution contains library source, documentation (the guide you're
reading now and the manual) and some code examples. After downloading the
distribution, install pugixml by extracting all files from the compressed
archive.
</p>
<p>
The complete pugixml source consists of four files: two source files, <code class="filename">pugixml.cpp</code> and
<code class="filename">pugixpath.cpp</code>, and two header files, <code class="filename">pugixml.hpp</code> and <code class="filename">pugiconfig.hpp</code>. <code class="filename">pugixml.hpp</code> is
the primary header, which you need to include in order to use pugixml classes/functions.
The rest of this guide assumes that <code class="filename">pugixml.hpp</code> is either in the current directory
or in one of the include directories of your project, so that <code class="computeroutput"><span class="preprocessor">#include</span> <span class="string">"pugixml.hpp"</span></code>
can find the header; however, you can also use a relative path (e.g. <code class="computeroutput"><span class="preprocessor">#include</span> <span class="string">"../libs/pugixml/src/pugixml.hpp"</span></code>)
or an include directory-relative path (e.g. <code class="computeroutput"><span class="preprocessor">#include</span>
Microsoft Visual Studio<sup>[<a name="trademarks" href="#ftn.trademarks" class="footnote">1</a>]</sup>, Apple Xcode, Code::Blocks or any other IDE, just add
<code class="filename">pugixml.cpp</code> and <code class="filename">pugixpath.cpp</code> to one of your projects. There are other building
methods available, including building pugixml as a standalone static/shared
All pugixml classes and functions are located in the <code class="computeroutput"><span class="identifier">pugi</span></code>
namespace; you have to either use explicit name qualification (e.g. <code class="computeroutput"><span class="identifier">pugi</span><span class="special">::</span><span class="identifier">xml_node</span></code>), or gain access to relevant
symbols via <code class="computeroutput"><span class="keyword">using</span></code> declarations or directives
(e.g. <code class="computeroutput"><span class="keyword">using</span> <span class="identifier">pugi</span><span class="special">::</span><span class="identifier">xml_node</span><span class="special">;</span></code> or <code class="computeroutput"><span class="keyword">using</span>
<code class="computeroutput"><span class="identifier">xml_document</span></code> is the owner
of the entire document structure; destroying the document destroys the whole
tree. The interface of <code class="computeroutput"><span class="identifier">xml_document</span></code>
consists of loading functions, saving functions and the interface of <code class="computeroutput"><span class="identifier">xml_node</span></code>, which allows for document inspection
and/or modification. Note that while <code class="computeroutput"><span class="identifier">xml_document</span></code>
is a subclass of <code class="computeroutput"><span class="identifier">xml_node</span></code>,
<code class="computeroutput"><span class="identifier">xml_node</span></code> is not a polymorphic
type; the inheritance is only used to simplify usage.
</p>
<p>
<code class="computeroutput"><span class="identifier">xml_node</span></code> is a handle to a
document node; it can point to any node in the document, including the document
itself. There is a common interface for nodes of all types. Note that <code class="computeroutput"><span class="identifier">xml_node</span></code> is only a handle to the actual
node, not the node itself: you can have several <code class="computeroutput"><span class="identifier">xml_node</span></code>
handles pointing to the same underlying object. Destroying an <code class="computeroutput"><span class="identifier">xml_node</span></code> handle does not destroy the node
and does not remove it from the tree.
</p>
<p>
There is a special value of the <code class="computeroutput"><span class="identifier">xml_node</span></code>
type, known as the null node or empty node. It does not correspond to any node
in any document, and thus resembles a null pointer. However, all operations
are defined on empty nodes; generally the operations don't do anything and
return empty nodes/attributes or empty strings as their result. This is useful
for chaining calls; e.g. you can get the grandparent of a node like so:
if a node is a null node or it does not have a parent, the first <code class="computeroutput"><span class="identifier">parent</span><span class="special">()</span></code>
call returns a null node; the second <code class="computeroutput"><span class="identifier">parent</span><span class="special">()</span></code> call then also returns a null node, so you
don't have to check for errors twice. You can test if a handle is null via
or <code class="computeroutput"><span class="keyword">if</span> <span class="special">(!</span><span class="identifier">node</span><span class="special">)</span> <span class="special">{</span> <span class="special">...</span> <span class="special">}</span></code>.
</p>
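<p>
The handle semantics described above can be illustrated with a small toy model in plain C++. The names <code class="computeroutput">ToyNode</code> and <code class="computeroutput">ToyNodeRecord</code> are hypothetical and not part of pugixml, and this is a sketch of the idea rather than pugixml's implementation: a handle wraps a non-owning pointer, copying it creates another handle to the same record, and every operation on a null handle yields another null handle or an empty string, so a chain such as <code class="computeroutput">parent().parent()</code> needs only a single check at the end.
</p>

```cpp
#include <string>

// Hypothetical toy model, not pugixml's classes: the record stands in for a
// node owned by a document; ToyNode is a cheap, copyable, non-owning handle.
struct ToyNodeRecord {
    std::string name;
    ToyNodeRecord* parent = nullptr;
};

class ToyNode {
public:
    ToyNode() = default;  // a default-constructed handle is the null node
    explicit ToyNode(ToyNodeRecord* rec) : rec_(rec) {}

    // Defined on null handles too: a null handle's parent is a null handle.
    ToyNode parent() const { return ToyNode(rec_ ? rec_->parent : nullptr); }

    // Never returns a null pointer: an empty string stands in for "no name".
    const char* name() const { return rec_ ? rec_->name.c_str() : ""; }

    // Enables if (node) and if (!node) checks.
    explicit operator bool() const { return rec_ != nullptr; }

    // Handles are equal when they refer to the same underlying record.
    bool operator==(const ToyNode& other) const { return rec_ == other.rec_; }

private:
    ToyNodeRecord* rec_ = nullptr;  // not owned; destroying a handle frees nothing
};
```

<p>
For example, with records root, child and leaf linked through their parent pointers, <code class="computeroutput">ToyNode(&amp;leaf).parent().parent()</code> refers to root, and one further <code class="computeroutput">parent()</code> call returns a null handle rather than crashing.
</p>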
<p>
<code class="computeroutput"><span class="identifier">xml_attribute</span></code> is a handle
to an XML attribute; it has the same semantics as <code class="computeroutput"><span class="identifier">xml_node</span></code>,
i.e. there can be several <code class="computeroutput"><span class="identifier">xml_attribute</span></code>
handles pointing to the same underlying object, and there is a special null attribute
value, which propagates to function results.
</p>
<p>
There are two choices of interface and internal representation when configuring
pugixml: you can choose either the UTF-8 (also called char) interface or the
UTF-16/32 (also called wchar_t) one. The choice is controlled via the <code class="computeroutput"><span class="identifier">PUGIXML_WCHAR_MODE</span></code> define; you can set
it in <code class="filename">pugiconfig.hpp</code> or via preprocessor options. All tree functions that
work with strings work with either C-style null terminated strings or STL
strings of the selected character type. Read the manual for additional information
Otherwise, you can use the <code class="computeroutput"><span class="identifier">status</span></code>
member to get the parsing status, or the <code class="computeroutput"><span class="identifier">description</span><span class="special">()</span></code> member function to get the status in
string form.
</p>
<p>
This is an example of handling loading errors (<a href="samples/load_error_handling.cpp" target="_top">samples/load_error_handling.cpp</a>):
<span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Error offset: "</span> <span class="special">&lt;&lt;</span> <span class="identifier">result</span><span class="special">.</span><span class="identifier">offset</span> <span class="special">&lt;&lt;</span> <span class="string">" (error at [..."</span> <span class="special">&lt;&lt;</span> <span class="special">(</span><span class="identifier">source</span> <span class="special">+</span> <span class="identifier">result</span><span class="special">.</span><span class="identifier">offset</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="string">"]\n\n"</span><span class="special">;</span>
<spanclass="special">}</span>
</pre>
<p>
</p>
<p>
Sometimes XML data should be loaded from a source other than a file, e.g. an
HTTP URL; you may also want to load XML data from a file using non-standard
functions, e.g. to use your virtual file system facilities or to load XML
from gzip-compressed files. These scenarios require either loading the document
from memory, in which case you should prepare a contiguous memory block with
all XML data and pass it to one of the buffer loading functions, or loading the
document from a C++ IOstream, in which case you should provide an object that
implements the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">istream</span></code> or <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">wistream</span></code>
interface.
</p>
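<p>
For the memory route, the data first has to be collected into one contiguous block. The following sketch uses only the standard library; <code class="computeroutput">read_all</code> is a hypothetical helper name, and the stream can be anything derived from <code class="computeroutput">std::istream</code> (for files, opening them in binary mode avoids newline translation).
</p>

```cpp
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, used in the usage example
#include <string>

// Hypothetical helper: copies everything a stream produces (a file, a
// decompressing stream, a network wrapper...) into one contiguous string.
std::string read_all(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

<p>
After <code class="computeroutput">std::string xml = read_all(in);</code> the pair <code class="computeroutput">xml.data()</code>, <code class="computeroutput">xml.size()</code> describes the whole document and can be handed to one of the buffer loading functions; check the manual for their exact parameters.
</p>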
<p>
There are different functions for loading a document from memory; they treat
the passed buffer as either an immutable one (<code class="computeroutput"><span class="identifier">load_buffer</span></code>),
a mutable buffer that is owned by the caller (<code class="computeroutput"><span class="identifier">load_buffer_inplace</span></code>),
or a mutable buffer whose ownership belongs to pugixml (<code class="computeroutput"><span class="identifier">load_buffer_inplace_own</span></code>).
There is also a simple helper function, <code class="computeroutput"><span class="identifier">xml_document</span><span class="special">::</span><span class="identifier">load</span></code>,
for cases when you want to load the XML document from a null-terminated character
string.
</p>
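<p>
The practical difference between the immutable and in-place variants is buffer lifetime. The toy sketch below is not pugixml code: the types <code class="computeroutput">InPlaceDoc</code> and <code class="computeroutput">CopyingDoc</code> and both loader functions are hypothetical and far simpler than the real ones. It contrasts a loader that keeps pointers into the caller's memory with one that keeps its own copy; changing or freeing the buffer afterwards affects only the former, which is why a buffer given to an in-place function must outlive the document.
</p>

```cpp
#include <string>

// Hypothetical toy types, far simpler than pugixml, contrasting the two
// ownership models for a loaded buffer.
struct InPlaceDoc { const char* text; };   // points into the caller's buffer
struct CopyingDoc { std::string text; };   // owns a private copy

// In-place: no copy is made, so the buffer must outlive the document.
InPlaceDoc load_inplace(char* buffer) { return InPlaceDoc{buffer}; }

// Copying: the caller's buffer may be changed or freed right after the call.
CopyingDoc load_copy(const char* buffer) { return CopyingDoc{buffer}; }
```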
<p>
This is an example of loading an XML document from memory using one of these
<pre class="programlisting"><span class="comment">// You can use load_buffer_inplace to load document from mutable memory block; the block's lifetime must exceed that of document
pugixml features an extensive interface for getting various types of data
from the document and for traversing the document. You can use various accessors
to get node/attribute data, you can traverse the child node/attribute lists
via accessors or iterators, you can do depth-first traversals with <code class="computeroutput"><span class="identifier">xml_tree_walker</span></code> objects, and you can use
XPath for complex data-driven queries.
</p>
<p>
You can get a node or attribute name via the <code class="computeroutput"><span class="identifier">name</span><span class="special">()</span></code> accessor, and its value via the <code class="computeroutput"><span class="identifier">value</span><span class="special">()</span></code> accessor. Note that both functions never
return null pointers: they either return a string with the relevant content,
or an empty string if the name/value is absent or if the handle is null. Also
It is common to store data as the text contents of some node, e.g. <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">node</span><span class="special">&gt;&lt;</span><span class="identifier">description</span><span class="special">&gt;</span><span class="identifier">This</span>
In this case, the <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">description</span><span class="special">&gt;</span></code> node does not have a value, but instead
has a child of type <code class="computeroutput"><span class="identifier">node_pcdata</span></code>
with the value <code class="computeroutput"><span class="string">"This is a node"</span></code>.
pugixml provides <code class="computeroutput"><span class="identifier">child_value</span><span class="special">()</span></code> helper functions to access such data.
</li>
<li class="listitem">
In many cases attribute values have types that are not strings, e.g.
an attribute may always contain values that should be treated as integers,
despite the fact that they are represented as strings in XML. pugixml
provides several accessors that convert attribute value to some other
type.
</li>
</ul></div>
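<p>
The conversion accessors mentioned above (for example <code class="computeroutput">as_int()</code> and <code class="computeroutput">as_bool()</code>) do the string-to-number work for you. As a rough sketch of the idea only, not pugixml's exact parsing rules, a conversion of this kind can be written with <code class="computeroutput">std::strtol</code>; the helper name <code class="computeroutput">value_as_int</code> and its default-value parameter are hypothetical.
</p>

```cpp
#include <cstdlib>

// Hypothetical helper sketching a typed attribute accessor: parses the
// attribute text as a base-10 integer and returns def when there is no text.
// pugixml's own conversion functions may use different parsing rules.
int value_as_int(const char* value, int def = 0)
{
    if (value == nullptr || *value == '\0') return def;
    return static_cast<int>(std::strtol(value, nullptr, 10));
}
```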
<p>
This is an example of using these functions (<a href="samples/traverse_base.cpp" target="_top">samples/traverse_base.cpp</a>):
Since a lot of document traversal consists of finding the node/attribute
with the correct name, there are special functions for that purpose. For
example, <code class="computeroutput"><span class="identifier">child</span><span class="special">(</span><span class="string">"Tool"</span><span class="special">)</span></code>
returns the first node that has the name <code class="computeroutput"><span class="string">"Tool"</span></code>,
or a null handle if there is no such node. This is an example of using such
<pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Tool for *.dae generation: "</span> <span class="special">&lt;&lt;</span> <span class="identifier">tools</span><span class="special">.</span><span class="identifier">find_child_by_attribute</span><span class="special">(</span><span class="string">"Tool"</span><span class="special">,</span> <span class="string">"OutputFileMasks"</span><span class="special">,</span> <span class="string">"*.dae"</span><span class="special">).</span><span class="identifier">attribute</span><span class="special">(</span><span class="string">"Filename"</span><span class="special">).</span><span class="identifier">value</span><span class="special">()</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
Child node lists and attribute lists are simply doubly linked lists; while
you can use <code class="computeroutput"><span class="identifier">previous_sibling</span></code>/<code class="computeroutput"><span class="identifier">next_sibling</span></code> and other such functions for
iteration, pugixml additionally provides node and attribute iterators, so
that you can treat nodes as containers of other nodes or attributes. All
iterators are bidirectional and support all usual iterator operations. The
iterators are invalidated if the node/attribute objects they're pointing
to are removed from the tree; adding nodes/attributes does not invalidate
any iterators.
</p>
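<p>
To see why a doubly linked child list allows walking in both directions, here is a toy model. <code class="computeroutput">ToyElem</code>, <code class="computeroutput">toy_append</code> and the two walk functions are hypothetical illustrative names, not pugixml internals: each node stores its first child and its previous/next siblings. pugixml's iterators conceptually present this kind of walk through a container-like interface.
</p>

```cpp
#include <string>
#include <vector>

// Hypothetical toy node, not pugixml's internal layout: children form a
// doubly linked list through prev_sibling/next_sibling.
struct ToyElem {
    std::string name;
    ToyElem* first_child = nullptr;
    ToyElem* prev_sibling = nullptr;
    ToyElem* next_sibling = nullptr;
};

// Links child in as the last child of parent.
void toy_append(ToyElem& parent, ToyElem& child)
{
    if (parent.first_child == nullptr) {
        parent.first_child = &child;
        return;
    }
    ToyElem* last = parent.first_child;
    while (last->next_sibling) last = last->next_sibling;
    last->next_sibling = &child;
    child.prev_sibling = last;
}

// Forward walk: first child, then next_sibling links.
std::vector<std::string> names_forward(const ToyElem& parent)
{
    std::vector<std::string> out;
    for (const ToyElem* n = parent.first_child; n; n = n->next_sibling)
        out.push_back(n->name);
    return out;
}

// Backward walk: find the last child, then follow prev_sibling links.
std::vector<std::string> names_backward(const ToyElem& parent)
{
    std::vector<std::string> out;
    const ToyElem* n = parent.first_child;
    while (n && n->next_sibling) n = n->next_sibling;
    for (; n; n = n->prev_sibling)
        out.push_back(n->name);
    return out;
}
```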
<p>
Here is an example of using iterators for document traversal (<a href="samples/traverse_iter.cpp" target="_top">samples/traverse_iter.cpp</a>):
The methods described above allow traversal of immediate children of some
node; if you want to do a deep tree traversal, you'll have to do it via a
recursive function or some equivalent method. However, pugixml provides a
helper for depth-first traversal of a subtree. In order to use it, you have
to implement the <code class="computeroutput"><span class="identifier">xml_tree_walker</span></code>
interface and call the <code class="computeroutput"><span class="identifier">traverse</span></code>
function.
</p>
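<p>
The walker pattern itself is independent of pugixml. The sketch below uses a hypothetical toy tree (<code class="computeroutput">WalkNode</code>) and a hypothetical <code class="computeroutput">Walker</code> base class rather than the real <code class="computeroutput">xml_tree_walker</code>, whose exact signatures are documented in the manual. The traversal function owns the recursion and calls the user's <code class="computeroutput">for_each</code> once per descendant with the current depth; in this sketch, returning <code class="computeroutput">false</code> stops the walk early.
</p>

```cpp
#include <string>
#include <vector>

// Hypothetical toy tree; pugixml's own tree types are different.
struct WalkNode {
    std::string name;
    std::vector<WalkNode> children;
};

// Hypothetical walker interface: the user implements for_each only.
class Walker {
public:
    virtual ~Walker() = default;
    // Called once per descendant; return false to stop the traversal.
    virtual bool for_each(const WalkNode& node, int depth) = 0;
};

// Depth-first, pre-order visit of every descendant of node.
// Returns false if the walker asked to stop.
bool walk_depth_first(const WalkNode& node, Walker& walker, int depth = 0)
{
    for (const WalkNode& child : node.children) {
        if (!walker.for_each(child, depth)) return false;
        if (!walk_depth_first(child, walker, depth + 1)) return false;
    }
    return true;
}

// Example walker: records each name indented by two spaces per level.
class IndentCollector : public Walker {
public:
    std::vector<std::string> lines;
    bool for_each(const WalkNode& node, int depth) override
    {
        lines.push_back(std::string(depth * 2, ' ') + node.name);
        return true;
    }
};
```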
<p>
This is an example of traversing the tree hierarchy with xml_tree_walker (<a href="samples/traverse_walker.cpp" target="_top">samples/traverse_walker.cpp</a>):
Finally, complex queries often require a higher-level DSL. pugixml
provides an implementation of the XPath 1.0 language for such queries. The complete
description of XPath usage can be found in the manual, but here are some
examples:
</p>
<p>
</p>
<pre class="programlisting"><span class="identifier">pugi</span><span class="special">::</span><span class="identifier">xpath_node_set</span> <span class="identifier">tools</span> <span class="special">=</span> <span class="identifier">doc</span><span class="special">.</span><span class="identifier">select_nodes</span><span class="special">(</span><span class="string">"/Profile/Tools/Tool[@AllowRemote='true' and @DeriveCaptionFrom='lastparam']"</span><span class="special">);</span>
The document in pugixml is fully mutable: you can completely change the document
structure and modify the data of nodes/attributes. All functions take care
of memory management and structural integrity themselves, so they always
result in a structurally valid tree; however, it is possible to create an
invalid XML tree (for example, by adding two attributes with the same name
or by setting attribute/node name to empty/invalid string). Tree modification
is optimized for performance and for memory consumption, so if you have enough
memory you can create documents from scratch with pugixml and later save
them to file/stream instead of relying on error-prone manual text writing
and without too much overhead.
</p>
<p>
All member functions that change node/attribute data or structure are non-constant
and thus cannot be called on constant handles. However, you can easily convert
a constant handle to a non-constant one by simple assignment: <code class="computeroutput"><span class="keyword">void</span>
<span class="identifier">foo</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">pugi</span><span class="special">::</span><span class="identifier">xml_node</span><span class="special">&amp;</span> <span class="identifier">n</span><span class="special">)</span> <span class="special">{</span> <span class="identifier">pugi</span><span class="special">::</span><span class="identifier">xml_node</span> <span class="identifier">nc</span> <span class="special">=</span> <span class="identifier">n</span><span class="special">;</span> <span class="special">}</span></code>, so const-correctness
here mainly provides additional documentation.
</p>
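<p>
A simple copy defeats <code class="computeroutput">const</code> because a handle is a small value type pointing at shared data. This toy sketch reproduces the situation; <code class="computeroutput">NodeData</code>, <code class="computeroutput">Handle</code> and <code class="computeroutput">rename</code> are hypothetical stand-ins, not pugixml classes. <code class="computeroutput">set_name</code> cannot be called on a <code class="computeroutput">const Handle</code>, yet copying the handle yields a non-const one that modifies the same node.
</p>

```cpp
#include <string>

// Hypothetical stand-in for the node data a document would own.
struct NodeData { std::string name; };

// Hypothetical handle: a small value type holding a non-owning pointer.
class Handle {
public:
    explicit Handle(NodeData* data = nullptr) : data_(data) {}

    const char* name() const { return data_ ? data_->name.c_str() : ""; }

    // Non-const, so it cannot be called through a const Handle...
    bool set_name(const char* name)
    {
        if (!data_) return false;
        data_->name = name;
        return true;
    }

private:
    NodeData* data_;
};

// ...but copying the const handle produces a mutable handle to the same node.
void rename(const Handle& n, const char* new_name)
{
    Handle nc = n;
    nc.set_name(new_name);
}
```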
<p>
As discussed before, nodes can have name and value, both of which are strings.
Depending on the node type, the name or value may be absent. You can use <code class="computeroutput"><span class="identifier">set_name</span></code> and <code class="computeroutput"><span class="identifier">set_value</span></code>
member functions to set them. Similar functions are available for attributes;
however, the <code class="computeroutput"><span class="identifier">set_value</span></code> function
is also overloaded for some types other than strings, such as floating-point numbers.
Also, an attribute value can be set using the assignment operator. This is an
example of setting node/attribute name and value (<a href="samples/modify_base.cpp" target="_top">samples/modify_base.cpp</a>):
<span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", new node name: "</span> <span class="special">&lt;&lt;</span> <span class="identifier">node</span><span class="special">.</span><span class="identifier">name</span><span class="special">()</span> <span class="special">&lt;&lt;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", new comment text: "</span> <span class="special">&lt;&lt;</span> <span class="identifier">doc</span><span class="special">.</span><span class="identifier">last_child</span><span class="special">().</span><span class="identifier">value</span><span class="special">()</span> <span class="special">&lt;&lt;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="comment">// we can't change value of the element or name of the comment
<span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", new attribute: "</span> <span class="special">&lt;&lt;</span> <span class="identifier">attr</span><span class="special">.</span><span class="identifier">name</span><span class="special">()</span> <span class="special">&lt;&lt;</span> <span class="string">"="</span> <span class="special">&lt;&lt;</span> <span class="identifier">attr</span><span class="special">.</span><span class="identifier">value</span><span class="special">()</span> <span class="special">&lt;&lt;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<spanclass="comment">// we can use numbers or booleans
The attribute() and child() functions do not add attributes or nodes to the
tree, so code like <code class="computeroutput"><span class="identifier">node</span><span class="special">.</span><span class="identifier">attribute</span><span class="special">(</span><span class="string">"id"</span><span class="special">)</span> <span class="special">=</span> <span class="number">123</span><span class="special">;</span></code> will not do anything if <code class="computeroutput"><span class="identifier">node</span></code> does not have an attribute with
the name <code class="computeroutput"><span class="string">"id"</span></code>. Make sure
you're operating on existing attributes/nodes by adding them if necessary.
</p></td></tr>
</table></div>
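<p>
The following toy sketch shows why such an assignment is silently dropped. <code class="computeroutput">ToyAttr</code> and <code class="computeroutput">ToyElement</code> are hypothetical stand-ins, not pugixml classes: lookup returns a null attribute handle when the name is missing, assignment through a null handle does nothing, and only an explicit insertion (here the hypothetical <code class="computeroutput">add_attribute</code>; pugixml's own function for adding attributes is <code class="computeroutput">append_attribute</code>) creates the attribute.
</p>

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical toy attribute handle: wraps a pointer to the stored value,
// or nullptr for the null attribute.
class ToyAttr {
public:
    explicit ToyAttr(std::string* value = nullptr) : value_(value) {}

    // Assigning through a null handle is silently ignored.
    ToyAttr& operator=(int v)
    {
        if (value_) *value_ = std::to_string(v);
        return *this;
    }

    explicit operator bool() const { return value_ != nullptr; }

private:
    std::string* value_;
};

// Hypothetical toy element holding its attributes by name.
class ToyElement {
public:
    // Lookup only: a missing name yields a null handle; nothing is inserted.
    ToyAttr attribute(const std::string& name)
    {
        auto it = attrs_.find(name);
        return ToyAttr(it == attrs_.end() ? nullptr : &it->second);
    }

    // Explicit insertion: creates an empty value if the name is new.
    ToyAttr add_attribute(const std::string& name) { return ToyAttr(&attrs_[name]); }

    std::string value(const std::string& name) const
    {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? std::string() : it->second;
    }

    std::size_t attribute_count() const { return attrs_.size(); }

private:
    std::map<std::string, std::string> attrs_;
};
```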
<p>
This is an example of adding new attributes/nodes to the document (<a href="samples/modify_add.cpp" target="_top">samples/modify_add.cpp</a>):
</p>
<p>
</p>
<pre class="programlisting"><span class="comment">// add node with some name
Often after creating a new document or loading an existing one and processing
it, it is necessary to save the result back to a file. It is also occasionally
useful to output the whole document or a subtree to some stream; use cases
include debug printing, serialization via a network or another text-oriented
medium, etc. pugixml provides several functions to output any subtree of
the document to a file, stream or another generic transport interface; these
functions allow you to customize the output format, and also perform the necessary
encoding conversions.
</p>
<p>
The node/attribute data is written to the destination properly formatted
according to the node type; all special XML symbols, such as &lt; and &amp;,
are properly escaped. In order to guard against forgotten node/attribute
names, empty node/attribute names are printed as <code class="computeroutput"><span class="string">":anonymous"</span></code>.
For proper output, make sure all node and attribute names are set to meaningful
values.
</p>
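<p>
As a simplified illustration of these output rules, the hypothetical helpers below escape the special characters in text and substitute <code class="computeroutput">:anonymous</code> for an empty name. They are a sketch only; pugixml's real output code also handles cases such as quotes in attribute values and encoding conversion.
</p>

```cpp
#include <string>

// Hypothetical helper: replaces characters that would break XML text.
std::string escape_text(const std::string& s)
{
    std::string out;
    for (char c : s) {
        switch (c) {
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '&': out += "&amp;"; break;
        default:  out += c; break;
        }
    }
    return out;
}

// Hypothetical helper: empty names get a placeholder instead of being
// written out as nothing.
std::string output_name(const std::string& name)
{
    return name.empty() ? ":anonymous" : name;
}
```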
<p>
If you want to save the whole document to a file, you can use the <code class="computeroutput"><span class="identifier">save_file</span></code> function, which returns <code class="computeroutput"><span class="keyword">true</span></code> on success. This is a simple example
of saving an XML document to a file (<a href="samples/save_file.cpp" target="_top">samples/save_file.cpp</a>):
</p>
<p>
</p>
<pre class="programlisting"><span class="comment">// save document to file
For additional interoperability, pugixml provides functions for saving documents
to any object that implements the C++ std::ostream interface. This allows you
to save documents to any standard C++ stream (e.g. a file stream) or any third-party
compliant implementation (e.g. Boost Iostreams). Most notably, this allows
for easy debug output, since you can use the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span></code>
stream as the saving target. There are two functions: one works with narrow character
streams, the other handles wide character ones.
</p>
<p>
This is a simple example of saving an XML document to standard output (<a href="samples/save_stream.cpp" target="_top">samples/save_stream.cpp</a>):
</p>
<p>
</p>
<pre class="programlisting"><span class="comment">// save document to standard output
All of the above saving functions are implemented in terms of the writer interface.
This is a simple interface with a single function, which is called several
times during the output process with chunks of document data as input. In order
to output the document via some custom transport, for example sockets, you
should create an object that implements the <code class="computeroutput"><span class="identifier">xml_writer</span></code>
interface and pass it to the <code class="computeroutput"><span class="identifier">xml_document</span><span class="special">::</span><span class="identifier">save</span></code>
function.
</p>
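<p>
A custom writer only has to accept chunks of bytes. The sketch below compiles without pugixml, so it declares its own stand-in base class, <code class="computeroutput">writer_interface</code> (hypothetical); in real code you would derive from <code class="computeroutput">pugi::xml_writer</code> and override its <code class="computeroutput">write(const void* data, size_t size)</code> member. <code class="computeroutput">emit_chunks</code> is likewise a hypothetical stand-in for the saving code, which may call <code class="computeroutput">write</code> any number of times.
</p>

```cpp
#include <cstddef>
#include <string>

// Stand-in (hypothetical) for pugi::xml_writer so this compiles without the
// library; the real base class declares the same single pure virtual member.
struct writer_interface {
    virtual ~writer_interface() {}
    virtual void write(const void* data, std::size_t size) = 0;
};

// Appends every chunk it receives to an STL string.
struct string_writer : writer_interface {
    std::string result;
    void write(const void* data, std::size_t size) override
    {
        result.append(static_cast<const char*>(data), size);
    }
};

// Hypothetical stand-in for the saving code: delivers the document in
// several chunks, as a real save call may do.
void emit_chunks(writer_interface& writer)
{
    const char* chunks[] = {"<?xml version=\"1.0\"?>\n", "<node", " id=\"1\"", " />\n"};
    for (const char* chunk : chunks)
        writer.write(chunk, std::char_traits<char>::length(chunk));
}
```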
<p>
This is a simple example of a custom writer for saving document data to STL
While the previously described functions save the whole document to the
destination, it is easy to save a single subtree. Instead of calling <code class="computeroutput"><span class="identifier">xml_document</span><span class="special">::</span><span class="identifier">save</span></code>, just call the <code class="computeroutput"><span class="identifier">xml_node</span><span class="special">::</span><span class="identifier">print</span></code>
function on the target node. You can save node contents to a C++ IOstream object
or a custom writer in this way. Saving a subtree differs slightly from saving
the whole document; read the manual for more information.
If you believe you've found a bug in pugixml, please file an issue via the <a href="http://code.google.com/p/pugixml/issues/entry" target="_top">issue submission form</a>.
Be sure to include the relevant information so that the bug can be reproduced:
the version of pugixml, compiler version and target architecture, the code
that uses pugixml and exhibits the bug, etc. Feature requests and contributions
can be filed as issues, too.
</p>
<a name="email"></a><p>
If filing an issue is not possible due to privacy or other concerns, you
can contact the pugixml author by e-mail directly: <a href="mailto:arseny.kapoulkine@gmail.com" target="_top">arseny.kapoulkine@gmail.com</a>.