Here is a fun bug.
In a simulation/test we had 10 nodes added like this:
$hasher->addTargets(['node1:port1', ... 'node10:port10']);
We also set num replicas to 100 but you can reproduce with 64.
Then we tested that 1000 keys were roughly evenly distributed across the nodes (pass), and that they remain evenly distributed if we remove a node (node4:port4 in our case), which also works fine.
We also test that, after one node is removed, each remaining node retains all of the keys it had originally (and gains something close to 1/(N-1) of the 1/N share of keys the failed host had in the original ring).
This part fails: somehow removing a node (or, strictly, building the same ring without node4, but the two should be equivalent) makes a small percentage of keys move to a different node even though they weren't on the failed node.
After some debugging, I think I can explain this.
When using MD5Hasher you chose to keep hex strings as the ring positions rather than converting them back to ints. That in itself is not necessarily wrong, but it falls into a PHP dynamic typing gotcha: the internal positionToTarget lookup ends up looking like this (after sorting):
...
'ff4a2aff' => 'host7:port7',
'ff6c8fa3' => 'host5:port5',
'ff820a72' => 'host5:port5',
'ffc8b06f' => 'host6:port6',
'fff31063' => 'host10:port10',
11023620 => 'host4:port4',
13715227 => 'host7:port7',
13952778 => 'host4:port4',
16385505 => 'host6:port6',
17878142 => 'host2:port2',
...
i.e. using an all-decimal-digit string as an array key in PHP causes PHP to convert it to an int key, which then sorts after all the string keys instead of in its natural hex-digit position. Keys that fall in these ring segments then move unpredictably between ring reconfigurations, because the ring is no longer correctly ordered and the position comparisons do not work as expected.
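The key conversion can be seen in isolation with a minimal sketch. The positions below are made-up 8-character hex strings in the style of the dump above, not values taken from a real ring, and the snippet does not use Flexihash at all:

```php
<?php
// Hypothetical ring positions, written as 8-character hex strings.
$positions = [
    'ff4a2aff' => 'host7:port7', // contains hex letters: stays a string key
    '11023620' => 'host4:port4', // all decimal digits: PHP stores it as int 11023620
    '0a3f9c21' => 'host2:port2', // contains hex letters: stays a string key
    '01234567' => 'host1:port1', // leading zero, so not a canonical int: stays a string key
];

// Inspect the type PHP actually used for each key.
$types = array_map('gettype', array_keys($positions));

foreach (array_keys($positions) as $i => $key) {
    echo var_export($key, true), ' is stored as ', $types[$i], "\n";
}
```

Only the all-digit position without a leading zero changes type, which is why just some segments of the ring are affected.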
Switching to CRC32Hasher with identical params fixes the issue because those are always integer keys.
2 possible fixes:
- You //could// do some magic in Flexihash to make this work correctly - it's not really MD5Hasher's fault PHP does funny things with "int-like" strings. For example, you could prefix string keys in the array so they look like _ffaabb33; the leading underscore would stop PHP mangling them to ints and would keep the sort order correct. You'd obviously also need to prepend _ to the hashed key at lookup time, etc.
- Alternatively, you could decide that HasherInterface requires returned values to be ints no larger than PHP_INT_MAX and fix MD5Hasher to meet this - it just needs hexdec() around the return statement.
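Both options can be sketched as standalone helpers. The function names prefixedPosition and md5IntPosition are hypothetical and not part of Flexihash, and the 8-character MD5 prefix is an assumption based on the key length in the dump above:

```php
<?php
// Option 1 (sketch): prefix every position so PHP never sees an int-like key.
function prefixedPosition(string $hex): string
{
    return '_' . $hex;
}

// Option 2 (sketch): return an integer position instead of a hex string.
// Assumes 64-bit PHP, where an 8-digit hex value always fits in an int.
function md5IntPosition(string $value): int
{
    return hexdec(substr(md5($value), 0, 8));
}

// With the prefix, an all-digit position stays a string and sorts with the rest.
$ring = [];
foreach (['ff4a2aff', '11023620'] as $hex) {
    $ring[prefixedPosition($hex)] = 'target';
}
ksort($ring);
$prefixedKeys = array_keys($ring);

// With integer positions, every key has the same type by construction.
$intPosition = md5IntPosition('some-key');
```

With option 1 the same prefix has to be applied to the hashed resource key at lookup time. With option 2 every hasher returns plain ints, which matches what CRC32Hasher already does.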
Please let us know which you'd prefer; both are simple changes.