feat: Offline Store historical features retrieval based on datetime range in Ray #5738
Conversation
jyejare left a comment:
Looking good initially; I have some doubts. Tests also need to be added.
    return pa.Table.from_pandas(df).schema
    ...
    def _compute_non_entity_dates_ray(
I think we should make a common utility function for this, so that it can be used in all stores without repeating the code.
wdyt?
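For reference, a minimal sketch of what such a shared helper could look like, assuming a pandas source. The function name, signature, and placement are hypothetical, not the actual Feast implementation:

```python
from datetime import datetime
from typing import List, Optional

import pandas as pd


def compute_entity_df_from_source(
    source_df: pd.DataFrame,
    join_keys: List[str],
    timestamp_field: str,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> pd.DataFrame:
    """Derive an entity dataframe from a source when entity_df=None.

    Filters the source to the [start_date, end_date] window and returns
    the distinct (join_keys, event_timestamp) combinations, so every
    offline store can reuse the same logic instead of duplicating it.
    """
    mask = pd.Series(True, index=source_df.index)
    if start_date is not None:
        mask &= source_df[timestamp_field] >= start_date
    if end_date is not None:
        mask &= source_df[timestamp_field] <= end_date
    window = source_df.loc[mask, join_keys + [timestamp_field]]
    return window.drop_duplicates().rename(
        columns={timestamp_field: "event_timestamp"}
    )
```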
    return _filter_range
    ...
    def _make_select_distinct_keys(join_keys: List[str]):
I think we should not drop rows with duplicate IDs, because there can be multiple transactions per ID, and we need to choose the row based on timestamp while joining the columns from another table/view. I think this is the same case as in your Spark PR.
Please check the Postgres implementation to understand the case.
Or am I misreading this?
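To illustrate the concern, here is a small pandas sketch (not Feast code, with made-up data) showing why the point-in-time join needs every transaction row rather than one row per distinct ID:

```python
import pandas as pd

entity_df = pd.DataFrame({
    "driver_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2026-01-10", "2026-01-15"]),
})
feature_df = pd.DataFrame({
    "driver_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2026-01-09", "2026-01-14"]),
    "trips_today": [3, 7],
})

# merge_asof picks, per entity event, the latest feature row at or before
# that event's timestamp; deduplicating on driver_id first would collapse
# both transactions into a single row and lose one of the joins.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_df.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
)
print(joined)  # two rows: trips_today=3, then trips_today=7
```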
Testing the case after discussion
Previously, when entity_df=None was passed to get_historical_features(), the Ray offline store would extract only distinct entity keys and assign a single fixed timestamp (end_date) to all entities. This broke point-in-time joins in cases where multiple transactions exist per entity ID within the date-time range.
It now extracts distinct (entity_keys, event_timestamp) combinations, aligning with the Postgres-based offline store's behaviour.
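A small pandas sketch of the behavioural change described above (column names are illustrative):

```python
import pandas as pd

source = pd.DataFrame({
    "driver_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(
        ["2026-01-10", "2026-01-15", "2026-01-12"]
    ),
})
end_date = pd.Timestamp("2026-01-31")

# Old behaviour: one row per distinct key, all pinned to end_date, so a
# point-in-time join could only ever surface one transaction per entity.
old = source[["driver_id"]].drop_duplicates().assign(event_timestamp=end_date)

# New behaviour: distinct (entity_keys, event_timestamp) pairs are kept, so
# every transaction in the window gets its own point-in-time join row.
new = source[["driver_id", "event_timestamp"]].drop_duplicates()

print(old)  # 2 rows (one per driver), both stamped 2026-01-31
print(new)  # 3 rows, original timestamps preserved
```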
…ange in Ray Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
… joins Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
Signed-off-by: Aniket Paluskar <[email protected]>
Force-pushed from 408f40c to f589956.
# [0.59.0](v0.58.0...v0.59.0) (2026-01-16)

### Bug Fixes

* Add get_table_query_string_with_alias() for PostgreSQL subquery aliasing ([#5811](#5811)) ([11122ce](11122ce))
* Add hybrid online store to ONLINE_STORE_CLASS_FOR_TYPE mapping ([#5810](#5810)) ([678589b](678589b))
* Add possibility to overwrite send_receive_timeout for clickhouse offline store ([#5792](#5792)) ([59dbb33](59dbb33))
* Denial by default to all resources when no permissions set ([#5663](#5663)) ([1524f1c](1524f1c))
* Make operator include full OIDC secret in repo config ([#5676](#5676)) ([#5809](#5809)) ([a536bc2](a536bc2))
* Populate Postgres `registry.path` during `feast init` ([#5785](#5785)) ([f293ae8](f293ae8))
* **redis:** Preserve millisecond timestamp precision for Redis online store ([#5807](#5807)) ([9e3f213](9e3f213))
* Search API to return all matching tags in matched_tags field ([#5843](#5843)) ([de37f66](de37f66))
* Spark Materialization Engine Cannot Infer Schema ([#5806](#5806)) ([58d0325](58d0325)), closes [#5594](#5594)
* Support arro3 table schema with newer deltalake packages ([#5799](#5799)) ([103c5e9](103c5e9))
* Timestamp formatting and lakehouse-type connector for trino_offline_store ([#5846](#5846)) ([c2ea7e9](c2ea7e9))
* Update model_validator to use instance method signature (Pydantic v2.12 deprecation) ([#5825](#5825)) ([3c10b6e](3c10b6e))

### Features

* Add dbt integration for importing models as FeatureViews ([#5827](#5827)) ([b997361](b997361)), closes [#3335](#3335)
* Add GCS registry store in Go feature server ([#5818](#5818)) ([1dc2be5](1dc2be5))
* Add progress bar to CLI for feast apply ([#5867](#5867)) ([ab3562b](ab3562b))
* Add RBAC blog post to website ([#5861](#5861)) ([b1844a3](b1844a3))
* Add skip_feature_view_validation parameter to FeatureStore.apply() and plan() ([#5859](#5859)) ([5482a0e](5482a0e))
* Added batching to feature server /push to offline store ([#5683](#5683)) ([#5729](#5729)) ([ce35ce6](ce35ce6))
* Enable static artifacts for feature server that can be used in Feature Transformations ([#5787](#5787)) ([edefc3f](edefc3f))
* Improve lambda materialization engine ([#5829](#5829)) ([f6116f9](f6116f9))
* Offline Store historical features retrieval based on datetime range in Ray ([#5738](#5738)) ([e484c12](e484c12))
* Read, Save docs and chat fixes ([#5865](#5865)) ([2081b55](2081b55))
* Resolve pyarrow >21 installation with ibis-framework ([#5847](#5847)) ([8b9bb50](8b9bb50))
* Support staging for spark materialization ([#5671](#5671)) ([#5797](#5797)) ([5b787af](5b787af))
What this PR does / why we need it:
Adds support for entity_df=None in RayOfflineStore.get_historical_features with start_date/end_date.
- Derives the entity set by reading distinct join keys from each FeatureView source within the time window, applies field mappings and join_key_map, filters by timestamp, and unions the aligned schemas.
- Adds a stable event_timestamp = end_date for PIT joins.
Signature change: get_historical_features now accepts entity_df: Optional[Union[pd.DataFrame, str]] and **kwargs.
- Why: to match the base interface and support date-only retrieval. A usage sketch follows.
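A hypothetical usage sketch of the date-only retrieval path. It assumes FeatureStore.get_historical_features forwards the start_date/end_date keyword arguments through to RayOfflineStore via the new **kwargs, which this excerpt does not confirm; the feature reference is an example:

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# With entity_df=None, the Ray offline store derives the entity set from
# the FeatureView sources inside the [start_date, end_date] window.
job = store.get_historical_features(
    entity_df=None,
    features=["driver_hourly_stats:conv_rate"],  # example feature reference
    start_date=datetime(2026, 1, 1),
    end_date=datetime(2026, 1, 31),
)
df = job.to_df()
```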
Which issue(s) this PR fixes:
RHOAIENG-38643
Misc