Add FAILOVER_PREFER_REPLICA cluster mode #2734

inakisoriamrf · 2025-10-16T10:05:03Z

Introduces a new failover mode for RedisCluster that prioritizes reading from replica nodes while maintaining high availability through automatic master fallback.

Changes:

Add REDIS_FAILOVER_PREFER_REPLICA constant (value: 4)
Implement replica-first logic in cluster_sock_write() with master fallback
Update validation in setOption() to accept new failover mode
Add RedisCluster::FAILOVER_PREFER_REPLICA PHP constant
Update cluster.md with usage example
Add some test coverage in RedisClusterTest

Behavior:
When enabled, read commands are directed to replica nodes first. If no replicas are available for a given shard, the command automatically falls back to the master node, ensuring availability.

Usage:
$cluster->setOption(
RedisCluster::OPT_SLAVE_FAILOVER,
RedisCluster::FAILOVER_PREFER_REPLICA
);

Introduces a new failover mode for RedisCluster that prioritizes reading from replica nodes while maintaining high availability through automatic master fallback. Changes: - Add REDIS_FAILOVER_PREFER_REPLICA constant (value: 4) - Implement replica-first logic in cluster_sock_write() with master fallback - Update validation in setOption() to accept new failover mode - Add RedisCluster::FAILOVER_PREFER_REPLICA PHP constant - Regenerate arginfo files for both PHP 8.0+ and legacy versions - Update cluster.md with usage example - Add comprehensive test coverage in RedisClusterTest Behavior: When enabled, read commands are directed to replica nodes first. If no replicas are available for a given shard, the command automatically falls back to the master node, ensuring availability. Usage: $cluster->setOption( RedisCluster::OPT_SLAVE_FAILOVER, RedisCluster::FAILOVER_PREFER_REPLICA );

inakisoriamrf · 2025-10-16T10:08:40Z

The main motivation for this change is to improve availability under failure scenarios. In our workload, if all replicas in a shard fail, it’s preferable to continue serving reads from the master node rather than blocking all read operations entirely.

This PR introduces a new failover mode (FAILOVER_PREFER_REPLICA) that implements a “replica-first, master-fallback” strategy. It ensures that read commands are routed to replicas when available, but automatically fall back to the shard’s master if no replicas are reachable, maintaining both redundancy and read availability without manual intervention.

inakisoriamrf · 2025-10-22T13:43:11Z

@michael-grunder something I have to change here? Maybe it is a KeyDB configuration issue?
I see there are a lot of tests Skpped for KeyDB

michael-grunder · 2025-10-22T16:37:23Z

Possibly. The fact that the tests are exclusively failing against KeyDB may indicate the problem is serverside.

I'd try to determine why KeyDB failes by isolatinig the root cause locally. I will also play around with it.

Ideally it's somethinig that could be tweaked in the client so KeyDB gains access to the feature too, but if it's not solveable client-side we could skip KeyDB tests and document the situation.

michael-grunder · 2025-10-22T18:21:42Z

I replicated the KeyDB failures locally and it just had to do with replication delay.

You can use the WAIT command via rawCommand to make sure the key has had time to propagate to the replica before testing failover.

It's a bit tricky because wait takes a number of replicas and a millisecond timeout. I just went with 1 since we set up a cluster with 1 replica per node in CI. It may mean however, that tests fail locally for people who run them against a cluster with many more replicas.

A more robust solution might be to detect the number of replicas at runtime and use that number instead of a hardcoded one.

Use WAIT command after write operations to ensure keys have propagated to replicas before testing FAILOVER_PREFER_REPLICA mode. This prevents test failures due to replication lag, particularly with KeyDB. The WAIT command waits for at least 1 replica (matching the CI cluster configuration) with a 1 second timeout.

inakisoriamrf · 2025-10-24T07:30:45Z

I've added the WAITs, hopefully this helps.

michael-grunder · 2025-10-24T17:11:58Z

OK still failing but in an inexplicable place and with the failover tests.

The WAIT problem was 100% the cause of the previous failures, so I'm not sure what's going on. These tests pass every time for me locally so it must be a classic GitHub 🥔 CI issue.

I'll track it down this weekend.

Edit: On my machine keys is crashing or causing KeyDB to exit 😄

tests/RedisClusterTest.php

michael-grunder · 2025-10-26T18:01:54Z

@inakisoriamrf For now we should just skip your failover test against KeyDB. Once they patch it we can re-enable.

It's just a matter of putting this at the start of your test method:

        if ( ! $this->is_keydb)
            $this->markTestSkipped();

inakisoriamrf · 2025-11-04T14:36:38Z

@michael-grunder I forgot to mention, it’s already being skipped following your suggestion.

michael-grunder · 2025-11-04T17:28:32Z

Yep, I saw that thanks. We'll merge this after we release 6.3.0 GA and get it iinto 6.3.1.

It should have some time to simmer.

michael-grunder self-assigned this Oct 20, 2025

Merge branch 'develop' into feature/failover-prefer-replica

073d144

yatsukhnenko reviewed Oct 24, 2025

View reviewed changes

tests/RedisClusterTest.php Show resolved Hide resolved

Skipping KeyDB because of CI

622c9fc

michael-grunder added this to the 6.3.1 milestone Nov 4, 2025

michael-grunder added improvement cluster labels Nov 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add FAILOVER_PREFER_REPLICA cluster mode #2734

Add FAILOVER_PREFER_REPLICA cluster mode #2734

inakisoriamrf commented Oct 16, 2025

Uh oh!

inakisoriamrf commented Oct 16, 2025

Uh oh!

inakisoriamrf commented Oct 22, 2025

Uh oh!

michael-grunder commented Oct 22, 2025

Uh oh!

michael-grunder commented Oct 22, 2025

Uh oh!

inakisoriamrf commented Oct 24, 2025

Uh oh!

michael-grunder commented Oct 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

michael-grunder commented Oct 26, 2025

Uh oh!

inakisoriamrf commented Nov 4, 2025

Uh oh!

michael-grunder commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Add FAILOVER_PREFER_REPLICA cluster mode #2734

Are you sure you want to change the base?

Add FAILOVER_PREFER_REPLICA cluster mode #2734

Conversation

inakisoriamrf commented Oct 16, 2025

Uh oh!

inakisoriamrf commented Oct 16, 2025

Uh oh!

inakisoriamrf commented Oct 22, 2025

Uh oh!

michael-grunder commented Oct 22, 2025

Uh oh!

michael-grunder commented Oct 22, 2025

Uh oh!

inakisoriamrf commented Oct 24, 2025

Uh oh!

michael-grunder commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

michael-grunder commented Oct 26, 2025

Uh oh!

inakisoriamrf commented Nov 4, 2025

Uh oh!

michael-grunder commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

michael-grunder commented Oct 24, 2025 •

edited

Loading