Hash table tradeoffs: CPU, memory, and variability

Factor 1: non-memory-related CPU cost of a key search or an insertion

This is how much CPU a key search (or an insertion) takes on average, excluding the CPU cost of random accesses to the hash table’s memory. This factor is not directly measurable; it is best estimated by benchmarking hash tables with very few entries, so that all random accesses to the hash table’s memory hit the L1 cache.
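
For illustration, here’s a minimal sketch of such an estimate (the table type, the key count, and the iteration count are arbitrary choices for the example, not anything prescribed above):

    // Rough estimate of the non-memory-related CPU cost of a key search:
    // the table is kept tiny so that all of its memory stays in the L1 cache.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>

    int main() {
        std::unordered_map<uint64_t, uint64_t> table;
        for (uint64_t k = 0; k < 100; ++k) table.emplace(k, k); // ~100 entries fit in L1

        volatile uint64_t sink = 0; // prevents the loop from being optimized away
        const int iters = 10000000;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i) sink += table.find(i % 100)->second;
        auto end = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(end - start).count();
        std::printf("%.1f ns per search\n", ns / iters);
    }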

Factor 2: memory

The memory factor consists of several subfactors which are deeply interwoven, so it’s usually not possible to consider them in isolation.

Linear memory footprint by tiers

The O(N) memory footprint of a hash table (where N is the number of entries stored in the hash table) comprises one or several “tiers” with different constant factors and, sometimes, different variability. For example, open addressing hash tables often have a single memory tier: the table itself, which is doubled when the number of entries exceeds a certain limit determined by the maximum load factor.
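
A sketch of the footprint of such a single-tier table (the power-of-two growth and the slot size parameter are illustrative assumptions, not a description of any particular implementation):

    #include <cstddef>

    // Footprint of a single-tier open addressing table: the capacity is the
    // smallest power of two at which the load (numEntries / capacity) does
    // not exceed the maximum load factor.
    size_t openAddressingFootprint(size_t numEntries, size_t slotSize,
                                   double maxLoadFactor) {
        size_t capacity = 1;
        while ((double) numEntries / (double) capacity > maxLoadFactor)
            capacity *= 2;
        // O(N): the constant factor is slotSize divided by the current load factor.
        return capacity * slotSize;
    }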

The number of locations accessed in each memory tier during a key search or an insertion

Extending the example from the previous section: during a key search in an open addressing hash table, we begin by reading some slot in the table. If the slot is empty, the key is absent from the hash table and the search is finished. If the slot is full and the key stored in it is equal to the searched key, the search is successful and we are done, too. Finally, if the slot is full but the key stored in it is not equal to the searched key, we try another slot according to the collision resolution scheme, whether it is linear probing, quadratic probing, cuckoo hashing, etc. The whole process is repeated for the second slot, which may result in accessing the third, fourth, etc. slots with diminishing probability.
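
In code, this search process for linear probing might look like the following sketch (a hypothetical layout with a power-of-two capacity and an occupancy flag per slot; real implementations differ in details):

    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    struct Slot { uint64_t key; uint64_t value; bool full; };

    // Linear probing search in an open addressing table whose power-of-two
    // capacity makes `hash & (capacity - 1)` a cheap slot index. Assumes the
    // table always keeps at least one empty slot (load factor < 1).
    std::optional<uint64_t> find(const std::vector<Slot>& table,
                                 uint64_t key, uint64_t hash) {
        size_t mask = table.size() - 1;
        for (size_t i = hash & mask; ; i = (i + 1) & mask) {
            const Slot& slot = table[i];
            if (!slot.full) return std::nullopt;    // empty slot: the key is absent
            if (slot.key == key) return slot.value; // full slot, equal key: success
            // Full slot, different key: a collision; proceed to the next slot.
        }
    }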

The relative locality of random accesses to the memory tiers

When random accesses to the hash table’s memory exhibit some locality (meaning, in fact, that some of those accesses are not fully random w.r.t. the other accesses), the CPU caches (both the data cache and the TLB) might be utilized better and the “secondary” accesses might complete faster. Locality may exist at several levels (see the sketch after the list):

  • Accesses within a single cache line (64 bytes).
  • Accesses within a single aligned block of 128 bytes: taking advantage of the Spatial Prefetcher (see Intel’s optimization manual, § 2.4.5.4).
  • Accesses within a single page (e. g. 4KB or 2MB) for better TLB utilization.
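
A small sketch of a check for these three levels of locality (the block sizes are the ones from the list; nothing here is specific to a particular hash table):

    #include <cstdint>

    // Do two memory addresses fall into the same aligned block of a given
    // power-of-two size? Relevant sizes: 64 (a cache line), 128 (the pair of
    // cache lines covered by the Spatial Prefetcher), 4096 (a small page).
    bool sameAlignedBlock(uintptr_t a, uintptr_t b, uintptr_t blockSize) {
        return (a / blockSize) == (b / blockSize);
    }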

Other memory-related subfactors

The memory footprint and the number of accessed memory locations subfactors in the hash table’s tradeoff space are similar to space amplification and read amplification, respectively, as used in the analysis of durable index structures.

Factor 3: variability

The efficiency characteristics of all hash table algorithms, in the context of the CPU and memory factors described above, change depending on the specific hash table’s population (the number of stored entries), which translates into some load factor in most hash table algorithms. It may be impossible to clearly define the load factor for some complicated, multi-tier hash table algorithms, but their “relative fullness” and, therefore, their efficiency characteristics vary too.

  • Variability in memory utilization (a sketch illustrating this swing follows this list).
  • Variability in the average number of collision resolution steps during hash table operations. Note that this is not the variance of the discrete probability distribution characterizing the number of accessed locations described above. Rather, it is the swing of the average itself for a given hash table algorithm/data structure in different states. The variability in collision resolution steps translates into variability in the average number of accessed memory locations and in the non-memory-related CPU cost.
  • Quality of the hash function. Open addressing hash tables with “local” collision resolution, namely linear and quadratic probing, are the most susceptible to the quality of the hash function.
  • Entry size affects the variability of memory utilization in hash tables that store entries in separately allocated nodes (like hash tables with collision resolution via separate chaining, or open addressing hash tables with forwarding pointers): the larger the entry size, the less variable the memory utilization of such data structures becomes (because of the diminishing relative weight of the variably sized memory tier, the table). See also the discussion of the tradeoff between inline entries and separately allocated nodes below in this post.
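
As a sketch of the first kind of variability: in a single-tier open addressing table that grows by a factor of X when the maximum load factor L is reached, the load factor (and thus the utilization of the table’s memory) keeps swinging between L/X and L:

    #include <cstdio>

    // Memory utilization swing of a single-tier open addressing table:
    // right after a growth event the load factor is L / X, right before the
    // next one it is L, i.e. the utilization varies by a factor of X.
    int main() {
        const double X = 2.0; // growth factor
        for (double L : {0.5, 0.75, 0.875})
            std::printf("max load %.3f: load factor swings between %.4f and %.3f\n",
                        L, L / X, L);
    }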

External factors

Putting everything together, the efficiency and the behavioral model of a hash table are influenced by the algorithmic factors described above: CPU cost, memory, and variability, as well as by the following factors, which are independent of the hash table algorithm:

  1. The types and the relative frequency of operations on the hash table in the workload, such as read-only (successful key searches), read-only (unsuccessful key searches), 50/50 read/write, etc.
  2. The range of the possible hash table’s sizes.
  3. The distribution of key access frequencies. Most benchmarks assume either a uniform or a Zipf distribution.
  4. The hash function.
  5. The entry size.
  6. The fraction of the CPU caches’ capacity that holds the hash table’s memory lines. This depends on whether the operations on the hash table are in the application’s hot spot.
  7. The sizes of the CPU caches: L1/L2/L3 and TLB on the machine where the application is deployed.
  8. The CPU architecture: the availability of some instructions, for example, _mm_cmpeq_epi8 and _mm_movemask_epi8 used in SwissTable and F14.

Examples of tradeoffs between the factors

Low vs. high maximum load factor

Setting the maximum load factor of an open addressing hash table to a relatively low value (e. g. 0.5) reduces the average number of collision resolution steps as well as its variability: the difference in the average number of collision resolution steps between hash tables with load factors 0.5 and 0.5/X (where X is the growth factor, typically 2) is smaller than the difference between hash tables with load factors L and L/X, where L is any load factor greater than 0.5. This is because the number of collision resolution steps during an operation on a hash table depends on the load factor as a convex function.
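
This convexity can be illustrated with the classical approximations for linear probing (due to Knuth): about (1 + 1/(1 − α))/2 probes per successful search and (1 + 1/(1 − α)²)/2 per unsuccessful search at load factor α. A sketch that evaluates them:

    #include <cstdio>

    // Knuth's classical approximations of the expected number of probed
    // slots in a linear probing table at load factor a. Both are convex in
    // a, so the swing between L / 2 and L grows quickly with L.
    double successful(double a)   { return 0.5 * (1.0 + 1.0 / (1.0 - a)); }
    double unsuccessful(double a) { return 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a))); }

    int main() {
        for (double L : {0.5, 0.875})
            std::printf("L = %.3f: successful %.2f -> %.2f, unsuccessful %.2f -> %.2f\n",
                        L, successful(L / 2), successful(L),
                        unsuccessful(L / 2), unsuccessful(L));
        // L = 0.500: successful 1.17 -> 1.50, unsuccessful 1.39 -> 2.50
        // L = 0.875: successful 1.39 -> 4.50, unsuccessful 2.08 -> 32.50
    }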

Linear vs. quadratic probing vs. double hashing

In open addressing hash tables, quadratic probing reduces the variability of the average number of collision resolution steps (compared to linear probing), but it also slightly increases the non-memory-related CPU cost of operations and reduces the relative locality of the memory accesses. Compared to quadratic probing, double hashing (as well as cuckoo hashing) reduces the variability in the number of collision resolution steps even further, but it incurs significantly higher non-memory-related CPU costs of operations and doesn’t impose any relative locality on the random accesses to the table.
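
The three schemes differ in how the next slot index is derived from the probe number; a sketch, assuming a power-of-two table capacity:

    #include <cstddef>
    #include <cstdint>

    // The i-th probed slot (i = 0, 1, 2, ...) under the three schemes;
    // `mask` is capacity - 1 for a power-of-two capacity.
    size_t linearProbe(uint64_t hash, size_t i, size_t mask) {
        return (hash + i) & mask;               // adjacent slots: best locality
    }
    size_t quadraticProbe(uint64_t hash, size_t i, size_t mask) {
        return (hash + i * (i + 1) / 2) & mask; // triangular numbers: nearby at first
    }
    size_t doubleHashProbe(uint64_t hash, uint64_t hash2, size_t i, size_t mask) {
        return (hash + i * (hash2 | 1)) & mask; // odd second-hash step: no locality
    }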

Quadratic probing vs. Robin Hood hashing

By construction, the variant of Robin Hood hashing which is based on linear probing aims to reduce the variability of the average number of collision resolution steps on successful key searches, at the price of a higher non-memory-related CPU cost of insertions. The tradeoff against quadratic probing is that the non-memory-related CPU cost of successful key searches with Robin Hood hashing is lower than with quadratic probing, and the accessed memory locations (if multiple) are adjacent, as with plain linear probing.
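
A sketch of the insertion displacement logic (the slot layout, with a stored “distance from the ideal slot”, is a common but not the only way to implement it):

    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct RhSlot { uint64_t key; uint64_t value; uint32_t dist; bool full; };

    // Robin Hood insertion over linear probing: if the entry being inserted
    // is already further from its ideal slot than the resident entry, the
    // two are swapped ("take from the rich"), which evens out search lengths
    // at the price of extra CPU work on insertions. Assumes the key is not
    // yet in the table and at least one slot is empty.
    void insert(std::vector<RhSlot>& table, uint64_t key, uint64_t value,
                uint64_t hash) {
        size_t mask = table.size() - 1;
        uint32_t dist = 0;
        for (size_t i = hash & mask; ; i = (i + 1) & mask, ++dist) {
            RhSlot& slot = table[i];
            if (!slot.full) {
                slot = {key, value, dist, true};
                return;
            }
            if (slot.dist < dist) { // the resident entry is "richer": displace it
                std::swap(key, slot.key);
                std::swap(value, slot.value);
                std::swap(dist, slot.dist);
            }
        }
    }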

Inline entries vs. separately allocated nodes

Open addressing hash tables that store the key structures (or the values, or both, that is, the whole entries) inline have one less memory tier than equivalent hash tables with a level of pointer indirection. Usually, the latter variant is chosen when pointer stability of the keys (or the values) in the table is required: pointers to keys or values inside an “inline” table become invalid during a rehash. That requirement is outside of the tradeoff space discussed in this post. However, “inline” and “forwarding” hash tables fare differently on the tradeoff factors as well: “inline” tables have higher variability in memory utilization and less locality of accesses to their sole memory tier (when using a collision resolution scheme that preserves some locality, such as linear or quadratic probing), whereas “forwarding” tables have a higher minimum memory footprint and impose more memory accesses because of the extra memory tier.
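
The difference between the two layouts, sketched as types (the names are illustrative, and emptiness marking is omitted):

    #include <cstdint>
    #include <vector>

    struct Entry { uint64_t key; uint64_t value; };

    // "Inline" table: a single memory tier. Entries move on rehash, so
    // pointers to them are not stable, but a search touches only this tier.
    struct InlineTable {
        std::vector<Entry> slots;  // the entry is stored directly in the slot
    };

    // "Forwarding" table: two memory tiers. The separately allocated nodes
    // never move, so pointers to entries are stable, but every successful
    // search costs an extra random access to a node.
    struct ForwardingTable {
        std::vector<Entry*> slots; // the slot forwards to a heap-allocated node
    };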

Open addressing vs. separate chaining

To isolate this comparison from the “inline vs. forwarding” tradeoff discussed in the previous section, let’s compare hash tables that resolve collisions via separate chaining and store all their entries in separately allocated nodes (rather than the hybrid with the first level of entries inlined) with open addressing hash tables with forwarding pointers, i. e. with separately allocated nodes, too.
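
Sketched as types, the two node-based variants under comparison look like this (illustrative, with occupancy details omitted):

    #include <cstdint>
    #include <vector>

    struct Node { uint64_t key; uint64_t value; };

    // Separate chaining: a collision extends a per-bucket linked list, so
    // each extra collision resolution step follows a node's `next` pointer.
    struct ChainNode { Node entry; ChainNode* next; };
    struct ChainingTable { std::vector<ChainNode*> buckets; };

    // Open addressing with forwarding pointers: collisions are resolved by
    // probing other slots of the table itself; the nodes form no lists.
    struct OpenTable { std::vector<Node*> slots; };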

SwissTable vs. F14

SwissTable and F14 are both open addressing hash tables. Both have a memory tier with 8 bits per slot (tags), of which 7 bits are taken from the hash code of the key stored in the corresponding slot. Both use fancy vector CPU instructions to check 16 slots at a time (14 or 12 slots in F14), and both use quadratic probing for collision resolution between the “buckets” of slots checked at once.
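
A sketch of such a vectorized tag check with the SSE2 intrinsics mentioned above (the surrounding table layout is simplified away):

    #include <cstdint>
    #include <emmintrin.h> // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8, etc.

    // Compare a 7-bit tag against 16 one-byte slot tags at once. Each set
    // bit in the returned mask marks a slot whose tag matches; only those
    // slots' keys need a full comparison, so almost all collisions are
    // filtered out without touching the entries themselves.
    uint32_t matchTags(const uint8_t* tags /* a 16-byte aligned group */,
                       uint8_t tag /* 7 bits of the key's hash code */) {
        __m128i group = _mm_load_si128(reinterpret_cast<const __m128i*>(tags));
        __m128i match = _mm_cmpeq_epi8(group, _mm_set1_epi8((char) tag));
        return (uint32_t) _mm_movemask_epi8(match);
    }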
