mindspore/third_party/robin_hood
He Wei 41dcac9c49 Replace std::unordered_map/set with robin-hood-hashing
Robin-hood-hashing (https://github.com/martinus/robin-hood-hashing)
is considered faster then std::unordered_map/set,
so we use it to improve mindspore performance.

1. robin_hood head file in `third_party/robin_hood/include`;
2. In `utils/hash_map.h` and `utils/hash_set.h`, we define:
 - mindspore::HashMap as an alias of robin_hood::unordered_map;
 - mindspore::HashSet as an alias of robin_hood::unordered_set;
3. Replace:
 - `#include <unordered_map>` --> `#include "utils/hash_map.h"`;
 - `#include <unordered_set>` --> `#include "utils/hash_set.h"`;
 - `std::unordered_map` --> `mindspore::HashMap`;
 - `std::unordered_set` --> `mindspore::HashSet`;
 - `map.insert(std::pair(key, value))` --> `map.emplace(key, value)`;
 - `[] (const std::pair<K, V> &p) {..} ` --> `[] (const auto &p) {..} `;
4. Fix issues found by switch to robin_hood:
 - AnfNodeConfig hash and equal;
 - Fix a bug in `Slice::operator==()`;
 - Fix a bug in `CNode::HasPrimalAttr()`;
 - Fix map.erase() usage bugs: `map.erase(iter++)` --> `iter = map.erase(iter)`;
 - Fix some iterator invalidated problem;
5. Some std::unordered_map/set can not replace by robin_hood:
 - As parameter of functions that exposed to python by pybind11;
 - Use bad hash that cause robin_hood::map over_flow, such as AbstractBasePtrListHasher;
6. Update cpp unit tests;
7. Add build option '-F' to enable robin_hood, default on.
2021-11-24 10:47:40 +08:00
..
include/robin_hood Replace std::unordered_map/set with robin-hood-hashing 2021-11-24 10:47:40 +08:00
LICENSE Replace std::unordered_map/set with robin-hood-hashing 2021-11-24 10:47:40 +08:00
README.md Replace std::unordered_map/set with robin-hood-hashing 2021-11-24 10:47:40 +08:00

README.md

➵ robin_hood unordered map & set Release GitHub license

Travis CI Build Status Appveyor Build Status Codacy Badge Total alerts Language grade: C/C++ Coverage Status

robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map / std::unordered_set which is both faster and more memory efficient for real-world use cases.

Installation & Usage

Direct Inclusion

  1. Add robin_hood.h to your C++ project.
  2. Use robin_hood::unordered_map instead of std::unordered_map
  3. Use robin_hood::unordered_set instead of std::unordered_set

Conan, the C/C++ Package Manager

  1. Setup your CMakeLists.txt (see Conan documentation on how to use MSBuild, Meson and others) like this:
    project(myproject CXX)
    
    add_executable(${PROJECT_NAME} main.cpp)
    
    include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake) # Include Conan-generated file
    conan_basic_setup(TARGETS) # Introduce Conan-generated targets
    
    target_link_libraries(${PROJECT_NAME} CONAN_PKG::robin-hood-hashing)
    
  2. Create conanfile.txt in your source dir (don't forget to update the version)
    [requires]
    robin-hood-hashing/3.10.0
    
    [generators]
    cmake
    
  3. Install and run Conan, then build your project as always:
    pip install conan
    mkdir build
    cd build
    conan install ../ --build=missing
    cmake ../
    cmake --build .
    
    The robin-hood-hashing package in Conan is kept up to date by Conan contributors. If the version is out of date, please create an issue or pull request on the conan-center-index repository.

Benchmarks

Please see extensive benchmarks at Hashmaps Benchmarks. In short: robin_hood is always among the fastest maps and uses far less memory than std::unordered_map.

Design Choices

  • Two memory layouts. Data is either stored in a flat array, or with node indirection. Access for unordered_flat_map is extremely fast due to no indirection, but references to elements are not stable. It also causes allocation spikes when the map resizes, and will need plenty of memory for large objects. Node based map has stable references & pointers (NOT iterators! Similar to std::unordered_map) and uses const Key in the pair. It is a bit slower due to indirection. The choice is yours; you can either use robin_hood::unordered_flat_map or robin_hood::unordered_node_map directly. If you use robin_hood::unordered_map It tries to choose the layout that seems appropriate for your data.

  • Custom allocator. Node based representation has a custom bulk allocator that tries to make few memory allocations. All allocated memory is reused, so there won't be any allocation spikes. It's very fast as well.

  • Optimized hash. robin_hood::hash has custom implementations for integer types and for std::string that are very fast and falls back to std::hash for everything else.

  • Depends on good Hashing. For a really bad hash the performance will not only degrade like in std::unordered_map, the map will simply fail with an std::overflow_error. In practice, when using the standard robin_hood::hash, I have never seen this happening.

License

Licensed under the MIT License. See the LICENSE file for details.

by martinus