Allow users to configure the Postgres pool_size and max_overflow property values to improve performance #917
Pull request overview
This PR exposes configuration options for PostgreSQL connection pooling and Neo4j index creation thresholds to improve performance under load. The changes allow users to tune database behavior without modifying defaults.
Key Changes:
- Added `pool_size` and `max_overflow` configuration fields for PostgreSQL connection pooling
- Added `range_index_creation_threshold` and `vector_index_creation_threshold` configuration fields for Neo4j indexing
- Updated sample configuration files to demonstrate the new optional settings
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/memmachine/common/configuration/database_conf.py | Adds optional configuration fields for PostgreSQL pool settings and Neo4j index thresholds |
| src/memmachine/common/resource_manager/database_manager.py | Conditionally applies the new configuration values when creating database engines and graph stores |
| sample_configs/episodic_memory_config.gpu.sample | Demonstrates the new configuration options with example values |
| sample_configs/episodic_memory_config.cpu.sample | Demonstrates the new configuration options with example values |
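The "conditionally applies" behaviour described in the table can be sketched as follows. This is a minimal illustration, not the code from this PR: `PostgresConf` and `engine_kwargs` are hypothetical names, and the only assumption about the engine factory is that it accepts `pool_size` and `max_overflow` keyword arguments, as SQLAlchemy's `create_engine` does.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PostgresConf:
    """Hypothetical stand-in for the PostgreSQL section of the configuration."""

    host: str = "localhost"
    port: int = 5432
    pool_size: Optional[int] = None      # None means "keep the library default"
    max_overflow: Optional[int] = None   # None means "keep the library default"


def engine_kwargs(conf: PostgresConf) -> Dict[str, Any]:
    """Collect only the pool options the user explicitly set."""
    kwargs: Dict[str, Any] = {}
    # Compare against None so that an explicit 0 is still forwarded.
    if conf.pool_size is not None:
        kwargs["pool_size"] = conf.pool_size
    if conf.max_overflow is not None:
        kwargs["max_overflow"] = conf.max_overflow
    return kwargs


# The result would then be forwarded to the engine factory, for example:
#     create_engine(url, **engine_kwargs(conf))
```

Leaving unset fields out of the keyword arguments keeps the library defaults in effect, which matches the stated goal of letting users tune behaviour without modifying defaults.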
56a28cd to 8add149
…erty values to improve performance Allows users to configure the Neo4j range_index_creation_threshold and vector_index_creation_threshold property values to improve performance Signed-off-by: Steve Scargall <[email protected]>
8add149 to cda6b58
Purpose of the change
This is Part 2 of n of my performance investigation into MemMachine.
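To help users tune the Postgres and Neo4j databases to handle more load when using our Docker Compose environment, this PR exposes the Postgres `pool_size` and `max_overflow` settings and the Neo4j `range_index_creation_threshold` and `vector_index_creation_threshold` settings as configuration options.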
Description
The out-of-the-box settings for Postgres and Neo4j are not well suited to MemMachine, especially when we enable FastAPI workers to handle more inbound traffic (see #903). The Postgres connection pool size is set to 5, which is too small once we increase the number of workers and inbound connections.
Similarly, in Neo4j, indexing occurs only every 10,000 nodes. Until the indexes are created, lookups are O(n) because Neo4j performs a full scan for each new insertion, which causes significant ingestion delays.
PostgreSQL Connection Pool Properties
- `pool_size` sets the fixed number of persistent database connections maintained in the pool for reuse, minimizing the expensive overhead of creating new connections for each FastAPI request.
- `max_overflow` defines the maximum number of temporary extra connections allowed beyond `pool_size` during traffic spikes, enabling the pool to expand dynamically without rejecting requests.
Why Configure for Load Testing
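During load tests, high concurrency across multiple FastAPI worker processes can exhaust a fixed-size pool. Workers then queue for a free connection (SQLAlchemy's QueuePool waits up to `pool_timeout`, 30 seconds by default, before raising an error), which stalls requests and reduces parallelism.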
Adjustable `pool_size` and `max_overflow` ensure workers acquire connections promptly under load, improving throughput and preventing bottlenecks without over-provisioning persistent resources.
This setup scales efficiently for production ASGI apps like FastAPI, balancing performance with resource limits as observed in SQLAlchemy QueuePool behavior.
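When choosing values, it helps to check the combined connection budget against the server limit. The sketch below assumes each worker process creates its own engine and therefore its own pool, and uses 100 as the server limit, which is PostgreSQL's usual default for `max_connections`. The function names are illustrative only.

```python
def peak_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections opened by all worker processes combined.

    Assumes one pool per worker process, so the bound scales with workers.
    """
    return workers * (pool_size + max_overflow)


def fits_server_limit(
    workers: int,
    pool_size: int,
    max_overflow: int,
    max_connections: int = 100,  # assumed server limit; check your Postgres config
) -> bool:
    """Return True if the worst case stays within the server's connection limit."""
    return peak_connections(workers, pool_size, max_overflow) <= max_connections
```

For example, 4 workers with `pool_size=5` and `max_overflow=10` can open at most 60 connections, which fits under a limit of 100, while 8 workers with `pool_size=10` and `max_overflow=10` can reach 160 and would need a higher server limit or smaller pools.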
Neo4j Index Threshold Properties
- `range_index_creation_threshold` determines the minimum number of distinct values required in a property before Neo4j automatically creates a range index during planning, optimizing queries on discrete numeric or temporal data.
- `vector_index_creation_threshold` sets the minimum number of distinct vector embeddings needed for automatic vector index creation, enabling efficient similarity searches in graph embeddings workloads.
Why Configure for Load Testing
Under high-concurrency load tests with FastAPI workers generating dynamic Cypher queries, unset thresholds can delay index creation or force full scans, stalling processes and limiting query parallelism.
Tuning these thresholds ensures automatic indexes form proactively for common access patterns, reducing planning latency and allowing workers to retrieve results faster without manual intervention.
This configuration boosts throughput in production Neo4j deployments by balancing automatic optimization with resource constraints, similar to connection pool tuning.
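One plausible reading of a creation threshold is a simple count check, sketched below. This is an assumption for illustration, not the logic in this PR: `should_create_index` is a hypothetical helper, and the fallback of 10,000 is taken from the figure quoted in the description.

```python
from typing import Optional

# Fallback taken from the 10,000-node figure quoted in the PR description.
DEFAULT_THRESHOLD = 10_000


def should_create_index(count: int, threshold: Optional[int] = None) -> bool:
    """Return True once enough items exist to trigger index creation.

    A threshold of None means the user did not configure one, so the
    fallback applies.
    """
    effective = DEFAULT_THRESHOLD if threshold is None else threshold
    return count >= effective
```

Under this reading, lowering the threshold makes the index appear earlier during ingestion, at the cost of building it while the data set is still small.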
Fixes/Closes
Fixes #906
Type of change
How Has This Been Tested?
See #837 and #906 for details.