Go to file
Arseny Kapoulkine c55ea3bc1e XPath: Make remove_duplicates generate stable order
Given an unsorted sequence, remove_duplicates would sort it using the
pointer value of attributes/nodes and then remove consecutive
duplicates.

This was problematic because it meant that the result of XPath queries
was dependent on the memory allocation pattern. While it's technically
incorrect to rely on the order, this results in easy to miss bugs.

This is particularly common when XPath queries use union operators -
although we also will call remove_duplicates in other cases.

This change reworks the code to use a hash set instead, using the same
hash function we use for compact storage. To make sure it performs well,
we allocate enough buckets for count * 1.5 (assuming all elements are
unique); since each bucket is a single pointer unlike xpath_node which
is two pointers, we need somewhere between size * 0.75 and size * 1.5
temporary storage.

The resulting filtering is stable - we remove elements that we have seen
before but we don't change the order - and is actually significantly
faster than sorting was.

With a large union operation, before this change it took ~56 ms per 100
query invocations to remove duplicates, and after this change it takes
~20ms.

Fixes #254.
2019-02-26 23:57:58 -08:00
contrib Visual Studio Natvis visualization (#227) 2018-08-07 17:03:37 -07:00
docs docs: Improve null node comparison wording 2019-01-25 16:42:25 -08:00
scripts Happy New Year! 2019-01-01 23:05:04 +03:00
src XPath: Make remove_duplicates generate stable order 2019-02-26 23:57:58 -08:00
tests XPath: Make remove_duplicates generate stable order 2019-02-26 23:57:58 -08:00
.codecov.yml Add .codecov.yml to disable PR comments 2016-08-08 08:23:42 -07:00
.gitattributes Add .gitattributes file 2018-07-23 23:13:02 -07:00
.gitignore Update .gitignore 2017-06-20 21:11:35 -07:00
.travis.yml Move unreachable line handling to Makefile 2018-12-10 11:12:13 -08:00
appveyor.yml Add PUGIXML_WCHAR_MODE configuration to MinGW tests 2018-12-09 17:20:02 -08:00
CMakeLists.txt Refactor CMakeLists.txt support for multiple targets 2019-02-06 08:15:27 -08:00
LICENSE.md Happy New Year! 2019-01-01 23:05:04 +03:00
Makefile Move unreachable line handling to Makefile 2018-12-10 11:12:13 -08:00
README.md Update README.md 2018-04-12 10:07:26 -07:00
readme.txt Happy New Year! 2019-01-01 23:05:04 +03:00

pugixml Build Status Build status codecov.io MIT

pugixml is a C++ XML processing library, which consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with Unicode interface variants and conversions between different Unicode encodings (which happen automatically during parsing/saving).

pugixml is used by a lot of projects, both open-source and proprietary, for performance and easy-to-use interface.

Documentation

Documentation for the current release of pugixml is available on-line as two separate documents:

Youre advised to start with the quick-start guide; however, many important library features are either not described in it at all or only mentioned briefly; if you require more information you should read the complete manual.

Example

Here's an example of how code using pugixml looks; it opens an XML file, goes over all Tool nodes and prints tools that have a Timeout attribute greater than 0:

#include "pugixml.hpp"
#include <iostream>

int main()
{
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file("xgconsole.xml");
    if (!result)
        return -1;
        
    for (pugi::xml_node tool: doc.child("Profile").child("Tools").children("Tool"))
    {
        int timeout = tool.attribute("Timeout").as_int();
        
        if (timeout > 0)
            std::cout << "Tool " << tool.attribute("Filename").value() << " has timeout " << timeout << "\n";
    }
}

And the same example using XPath:

#include "pugixml.hpp"
#include <iostream>

int main()
{
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file("xgconsole.xml");
    if (!result)
        return -1;
        
    pugi::xpath_node_set tools_with_timeout = doc.select_nodes("/Profile/Tools/Tool[@Timeout > 0]");
    
    for (pugi::xpath_node node: tools_with_timeout)
    {
        pugi::xml_node tool = node.node();
        std::cout << "Tool " << tool.attribute("Filename").value() <<
            " has timeout " << tool.attribute("Timeout").as_int() << "\n";
    }
}

License

This library is available to anybody free of charge, under the terms of MIT License (see LICENSE.md).