Add FAILOVER_PREFER_REPLICA cluster mode #2734
Introduces a new failover mode for RedisCluster that prioritizes reading
from replica nodes while maintaining high availability through automatic
master fallback.
Changes:
- Add REDIS_FAILOVER_PREFER_REPLICA constant (value: 4)
- Implement replica-first logic in cluster_sock_write() with master fallback
- Update validation in setOption() to accept new failover mode
- Add RedisCluster::FAILOVER_PREFER_REPLICA PHP constant
- Regenerate arginfo files for both PHP 8.0+ and legacy versions
- Update cluster.md with usage example
- Add comprehensive test coverage in RedisClusterTest
Behavior:
When enabled, read commands are directed to replica nodes first. If no
replicas are available for a given shard, the command automatically falls
back to the master node, ensuring availability.
Usage:
$cluster->setOption(
    RedisCluster::OPT_SLAVE_FAILOVER,
    RedisCluster::FAILOVER_PREFER_REPLICA
);
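The routing rule described under "Behavior" can be illustrated in plain PHP. This is only a sketch of the policy, not the extension's code: the real logic lives in C inside cluster_sock_write(), and the function name `pickReadNode`, the array shape, and the random choice among replicas are all assumptions made for the example.

```php
<?php
// Hypothetical illustration only: the function name and the array shape
// are assumptions, not part of phpredis.

/**
 * Pick the node a read command should go to under a
 * "replica-first, master-fallback" policy.
 *
 * @param array{master: string, replicas: array<string, bool>} $shard
 *        'replicas' maps "host:port" => whether the node is reachable.
 */
function pickReadNode(array $shard): string
{
    // Keep only the replicas that are currently reachable.
    $reachable = array_keys(array_filter($shard['replicas']));

    if ($reachable !== []) {
        // Spread reads across the reachable replicas (random pick is an assumption).
        return $reachable[array_rand($reachable)];
    }

    // No replica available for this shard: fall back to the master.
    return $shard['master'];
}

$shard = [
    'master'   => '10.0.0.1:7000',
    'replicas' => ['10.0.0.2:7001' => true, '10.0.0.3:7002' => false],
];

echo pickReadNode($shard), PHP_EOL; // one replica is reachable, so it is chosen

$shard['replicas']['10.0.0.2:7001'] = false;
echo pickReadNode($shard), PHP_EOL; // no replica reachable, so the master is chosen
```

The point of the sketch is the ordering: the master is consulted for reads only after every replica of the shard has been ruled out.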
|
The main motivation for this change is to improve availability under failure scenarios. In our workload, if all replicas in a shard fail, it’s preferable to continue serving reads from the master node rather than blocking all read operations entirely. This PR introduces a new failover mode (FAILOVER_PREFER_REPLICA) that implements a “replica-first, master-fallback” strategy. It ensures that read commands are routed to replicas when available, but automatically fall back to the shard’s master if no replicas are reachable, maintaining both redundancy and read availability without manual intervention. |
|
@michael-grunder something I have to change here? Maybe it is a KeyDB configuration issue? |
|
Possibly. The fact that the tests are exclusively failing against I'd try to determine why KeyDB fails by isolating the root cause locally. I will also play around with it. Ideally it's something that could be tweaked in the client so |
|
I replicated the You can use the It's a bit tricky because WAIT takes a number of replicas and a millisecond timeout. I just went with A more robust solution might be to detect the number of replicas at runtime and use that number instead of a hardcoded one. |
Use WAIT command after write operations to ensure keys have propagated to replicas before testing FAILOVER_PREFER_REPLICA mode. This prevents test failures due to replication lag, particularly with KeyDB. The WAIT command waits for at least 1 replica (matching the CI cluster configuration) with a 1 second timeout.
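A rough sketch of the "detect the number of replicas at runtime" idea follows. The helper names `replicaCount` and `waitForReplicas` are hypothetical, and issuing WAIT through `RedisCluster::rawCommand()` is an assumption; the tests in this PR may send it differently. It also assumes that `INFO replication` for the key is answered by the shard's master, which reports its replica count in the `connected_slaves` field.

```php
<?php
// Sketch only: helper names are hypothetical, and rawCommand() is used
// here as an assumption about how WAIT could be sent to the right node.

/** Read the replica count from a parsed "INFO replication" reply. */
function replicaCount(array $info): int
{
    // A master reports this field as connected_slaves; default to 0 if absent.
    return (int) ($info['connected_slaves'] ?? 0);
}

/**
 * Block until the preceding write to $key has reached the shard's
 * replicas, or until $timeoutMs elapses. Returns the number of
 * replicas that acknowledged the write.
 */
function waitForReplicas(RedisCluster $cluster, string $key, int $timeoutMs = 1000): int
{
    // Ask the node that owns $key how many replicas it has.
    $replicas = replicaCount($cluster->info($key, 'replication'));

    // WAIT <numreplicas> <timeout-ms>, routed to the node serving $key.
    return (int) $cluster->rawCommand($key, 'WAIT', $replicas, $timeoutMs);
}
```

In a test this would be called right after the write, for example `$cluster->set('foo', 'bar');` followed by `waitForReplicas($cluster, 'foo');`, before reading the key back in FAILOVER_PREFER_REPLICA mode. Compared with a hardcoded replica count, this avoids tying the test to one particular CI cluster layout.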
|
I've added the WAITs, hopefully this helps. |
|
OK still failing but in an inexplicable place and with the failover tests. The I'll track it down this weekend. Edit: On my machine |
|
@inakisoriamrf For now we should just skip your failover test against It's just a matter of putting this at the start of your test method:
if ($this->is_keydb)
    $this->markTestSkipped(); |
|
@michael-grunder I forgot to mention, it’s already being skipped following your suggestion. |
|
Yep, I saw that, thanks. We'll merge this after we release 6.3.0 GA and get it into 6.3.1. It should have some time to simmer. |