Uploaded image for project: 'CFEngine Community'
  1. CFEngine Community
  2. CFE-2301

Hash table hash_map.c has a fixed (8192) number of buckets

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 3.10.7
    • Fix Version/s: 3.12.0
    • Component/s: libutils
    • Labels:
      None

      Description

      Our hash table (found in hash_map.c in libutils) is of fixed size 8192. So basically this is not a proper hash table. Is also has some more disadvantages. This task is about refactoring it.

      • Make our hash table number of buckets auto-expanding (and maybe contracting) according to the current load factor. Careful here:
        • Power-of-two total bucket count is faster, but requires very good hash function
        • Prime number total bucket count is more common and safer choice (avoids collisions even with trivial hash functions), but requires more careful and slower max-bounding of the hash function result (will not work without fixing the secondary issue mentioned later)
      • Change the starting bucket count to a number smaller than 8192, since it is so commonly used this should save memory.
      • Run map_test.c before and after your changes, update it as needed and write whatever new tests you want

      Relevant, but secondary and somewhat more tricky issue:

      • Currently all our hash functions have an argument for "max" bounding, according to the number of buckets. This does not work currently for non-power-of-2 max values, and it's duplicated code all over the place.
        • So max-bounding should be moved to a function of its own, called by hash_map.c to limit the hash_fn() result. And if a non-power-of-2 total bucket number is used, the max-bounding function should work on that.
        • the hash functions used for the various hash tables instantiated are all over the code base and passed to the hash table as hash_fn; However you can the find them via grep'ing for TYPED_MAP_DEFINE, in general the mostly used hash function is StringHash(),

      Any other improvements you think should be done? Either fix or note them down as TODOs.

        Attachments

          Activity

            People

            Assignee:
            vpodzime Vratislav Podzimek
            Reporter:
            a10038 jimis (Dimitrios Apostolou)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: