Arseny Kapoulkine c55ea3bc1e XPath: Make remove_duplicates generate stable order		2019-02-26 23:57:58 -08:00

Given an unsorted sequence, remove_duplicates would sort it using the pointer value of attributes/nodes and then remove consecutive duplicates.

This was problematic because it meant that the result of XPath queries was dependent on the memory allocation pattern. While it's technically incorrect to rely on the order, this results in easy to miss bugs. This is particularly common when XPath queries use union operators - although we also will call remove_duplicates in other cases.

This change reworks the code to use a hash set instead, using the same hash function we use for compact storage. To make sure it performs well, we allocate enough buckets for count * 1.5 (assuming all elements are unique); since each bucket is a single pointer unlike xpath_node which is two pointers, we need somewhere between size * 0.75 and size * 1.5 temporary storage.

The resulting filtering is stable - we remove elements that we have seen before but we don't change the order - and is actually significantly faster than sorting was.

With a large union operation, before this change it took ~56 ms per 100 query invocations to remove duplicates, and after this change it takes ~20ms.

Fixes #254.
contrib	Visual Studio Natvis visualization (#227 )	2018-08-07 17:03:37 -07:00
docs	docs: Improve null node comparison wording	2019-01-25 16:42:25 -08:00
scripts	Happy New Year!	2019-01-01 23:05:04 +03:00
src	XPath: Make remove_duplicates generate stable order	2019-02-26 23:57:58 -08:00
tests	XPath: Make remove_duplicates generate stable order	2019-02-26 23:57:58 -08:00
.codecov.yml	Add .codecov.yml to disable PR comments	2016-08-08 08:23:42 -07:00
.gitattributes	Add .gitattributes file	2018-07-23 23:13:02 -07:00
.gitignore	Update .gitignore	2017-06-20 21:11:35 -07:00
.travis.yml	Move unreachable line handling to Makefile	2018-12-10 11:12:13 -08:00
appveyor.yml	Add PUGIXML_WCHAR_MODE configuration to MinGW tests	2018-12-09 17:20:02 -08:00
CMakeLists.txt	Refactor CMakeLists.txt support for multiple targets	2019-02-06 08:15:27 -08:00
LICENSE.md	Happy New Year!	2019-01-01 23:05:04 +03:00
Makefile	Move unreachable line handling to Makefile	2018-12-10 11:12:13 -08:00
README.md	Update README.md	2018-04-12 10:07:26 -07:00
readme.txt	Happy New Year!	2019-01-01 23:05:04 +03:00
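
The latest commit message above describes replacing a sort-by-pointer pass with a hash set so that duplicate removal keeps the original element order. Here is a minimal standalone sketch of that idea, not the library's actual code. The names `Handle`, `hash_pointer` and `remove_duplicates_stable` are hypothetical, and the hash function, the power-of-two rounding and the linear probing are assumptions made for illustration. The only details taken from the commit message are the "at least count * 1.5 buckets" sizing and the one-pointer-per-bucket layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a node handle: only the address matters here.
using Handle = const void*;

// Illustrative pointer hash (a MurmurHash3-style 64-bit finalizer).
// Assumption: any reasonable pointer hash works for this sketch.
static std::size_t hash_pointer(Handle p)
{
    std::uint64_t h = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return static_cast<std::size_t>(h);
}

// Removes repeated handles in place, keeping the first occurrence of each
// handle in its original relative order. Returns the new size.
std::size_t remove_duplicates_stable(std::vector<Handle>& items)
{
    const std::size_t count = items.size();

    // Power-of-two bucket count that is at least count * 1.5, so the
    // open-addressing table always has a free slot while probing.
    std::size_t buckets = 1;
    while (buckets < count + count / 2)
        buckets *= 2;

    // One pointer per bucket; nullptr marks an empty bucket.
    std::vector<Handle> table(buckets, nullptr);

    std::size_t write = 0;
    for (std::size_t read = 0; read < count; ++read)
    {
        Handle item = items[read];
        assert(item != nullptr); // nullptr is reserved as the empty marker

        // Linear probing: stop at the item itself or at an empty bucket.
        std::size_t slot = hash_pointer(item) & (buckets - 1);
        while (table[slot] != nullptr && table[slot] != item)
            slot = (slot + 1) & (buckets - 1);

        if (table[slot] == nullptr)
        {
            // First time this handle is seen: record it and keep it.
            table[slot] = item;
            items[write++] = item;
        }
    }

    items.resize(write);
    return write;
}
```

Because elements are only ever dropped and never reordered, the output depends on the input order alone and not on where the objects happen to live in memory, which is the property the commit message is after.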

README.md

pugixml

pugixml is a C++ XML processing library, which consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with Unicode interface variants and conversions between different Unicode encodings (which happen automatically during parsing/saving).

pugixml is used by many projects, both open-source and proprietary, for its performance and easy-to-use interface.

Documentation

Documentation for the current release of pugixml is available online as two separate documents:

Quick-start guide, which aims to provide enough information to start using the library;
Complete reference manual, which describes all features of the library in detail.

You’re advised to start with the quick-start guide. However, many important library features are either mentioned only briefly in it or not described at all, so if you need more information you should read the complete manual.

Example

Here's an example of how code using pugixml looks; it opens an XML file, goes over all Tool nodes and prints tools that have a Timeout attribute greater than 0:

#include "pugixml.hpp"
#include <iostream>

int main()
{
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file("xgconsole.xml");
    if (!result)
        return -1;
        
    for (pugi::xml_node tool: doc.child("Profile").child("Tools").children("Tool"))
    {
        int timeout = tool.attribute("Timeout").as_int();
        
        if (timeout > 0)
            std::cout << "Tool " << tool.attribute("Filename").value() << " has timeout " << timeout << "\n";
    }
}

And the same example using XPath:

#include "pugixml.hpp"
#include <iostream>

int main()
{
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file("xgconsole.xml");
    if (!result)
        return -1;
        
    pugi::xpath_node_set tools_with_timeout = doc.select_nodes("/Profile/Tools/Tool[@Timeout > 0]");
    
    for (pugi::xpath_node node: tools_with_timeout)
    {
        pugi::xml_node tool = node.node();
        std::cout << "Tool " << tool.attribute("Filename").value() <<
            " has timeout " << tool.attribute("Timeout").as_int() << "\n";
    }
}

License

This library is available to anybody free of charge, under the terms of the MIT License (see LICENSE.md).
