Conversation

@aniketpalu
Contributor

What this PR does / why we need it:

  • Add support for entity_df=None in RayOfflineStore.get_historical_features with start_date/end_date.
    -- Derives entity set by reading distinct join keys from each FeatureView source within the time window, applies field mappings and join_key_map, filters by timestamp, and unions aligned schemas.
    -- Adds stable event_timestamp = end_date for PIT joins.

  • Signature change: get_historical_features accepts entity_df: Optional[Union[pd.DataFrame, str]] and **kwargs.
    -- Why: Match base interface and support date-only retrieval.
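
The derivation described above can be sketched roughly as follows. This is a minimal pandas sketch, not the actual Ray store internals: `build_entity_df` and all parameter names are hypothetical, and it assumes the sources share a join-key schema after field mappings are applied.

```python
from datetime import datetime
from typing import List

import pandas as pd


def build_entity_df(
    sources: List[pd.DataFrame],   # one frame per FeatureView source
    join_keys: List[str],          # assumed shared across sources after field mapping
    timestamp_cols: List[str],     # event-timestamp column per source
    start_date: datetime,
    end_date: datetime,
) -> pd.DataFrame:
    """Hypothetical sketch: derive an entity_df when the caller passes entity_df=None."""
    parts = []
    for df, ts_col in zip(sources, timestamp_cols):
        # Filter each source to the requested time window.
        window = df[(df[ts_col] >= start_date) & (df[ts_col] <= end_date)]
        # Keep only distinct join-key combinations.
        parts.append(window[join_keys].drop_duplicates())
    # Union the aligned schemas and attach a stable event timestamp.
    entity_df = pd.concat(parts, ignore_index=True).drop_duplicates()
    entity_df["event_timestamp"] = pd.Timestamp(end_date)
    return entity_df
```

Note that this reflects the PR as originally described (a single fixed `event_timestamp = end_date`); the extraction is refined later in this thread.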

Which issue(s) this PR fixes:

RHOAIENG-38643

Misc

@aniketpalu aniketpalu requested a review from a team as a code owner November 25, 2025 19:18
Collaborator

@jyejare jyejare left a comment

Looking good initially, but I have some doubts.

Tests also need to be added.

return pa.Table.from_pandas(df).schema


def _compute_non_entity_dates_ray(
Collaborator

I think we should make this a common utility function, so that it can be used in all stores without repeating the code.

wdyt?

return _filter_range


def _make_select_distinct_keys(join_keys: List[str]):
Collaborator

@jyejare jyejare Dec 2, 2025

I think we should not drop rows with duplicate IDs, because there could be multiple transactions per ID, and we need to choose the row based on timestamp while joining the columns from another table/view. I think this is the same case as in your Spark PR.

Please check the Postgres implementation to understand the case.

Or am I misreading this?
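
To illustrate the concern with a toy example (hypothetical data, not from the PR): collapsing to distinct IDs drops the per-transaction rows that a point-in-time join needs, while keeping distinct (entity_key, event_timestamp) pairs preserves them.

```python
import pandas as pd

# Toy data: two transactions for driver 1 at different times, one for driver 2.
tx = pd.DataFrame({
    "driver_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(
        ["2025-11-01 09:00", "2025-11-01 17:00", "2025-11-02 10:00"]
    ),
})

# Distinct-IDs-only (the behaviour under discussion): one row per entity,
# so the 09:00 and 17:00 transactions collapse into a single join row.
ids_only = tx[["driver_id"]].drop_duplicates()

# Keeping (entity_key, event_timestamp) pairs preserves both transactions,
# letting the point-in-time join pick feature values as of each timestamp.
keyed_by_time = tx[["driver_id", "event_timestamp"]].drop_duplicates()

assert len(ids_only) == 2
assert len(keyed_by_time) == 3
```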

Contributor Author

Testing the case after discussion

Contributor Author

@aniketpalu aniketpalu Dec 26, 2025

Previously, when entity_df=None was passed to get_historical_features(), the Ray offline store extracted only distinct entity keys and assigned a single fixed timestamp (end_date) to all entities. This broke point-in-time joins in cases where multiple transactions exist per entity ID within the date-time range.

It now extracts distinct (entity_keys, event_timestamp) combinations, aligning with the Postgres-based offline store's behaviour.
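
As a rough illustration of why the (entity_keys, event_timestamp) extraction matters for the point-in-time join, here is a minimal pandas sketch (all data and column names hypothetical; real offline stores implement this join natively, but `pd.merge_asof` captures the semantics):

```python
import pandas as pd

# Hypothetical feature rows for a view being joined.
features = pd.DataFrame({
    "driver_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2025-11-01 08:00", "2025-11-01 12:00"]),
    "rating": [4.2, 4.8],
})

# Entity rows as extracted after the fix: distinct (keys, timestamp) pairs,
# one per transaction, rather than one per entity with a fixed end_date.
entity_df = pd.DataFrame({
    "driver_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2025-11-01 09:00", "2025-11-01 17:00"]),
})

# Point-in-time join: each entity row picks the latest feature row at or
# before its own timestamp (merge_asof requires sorted timestamps).
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
)
# The 09:00 row sees rating 4.2; the 17:00 row sees rating 4.8. With a
# single fixed timestamp, both rows would have collapsed into one.
```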

@ntkathole ntkathole merged commit e484c12 into feast-dev:master Dec 30, 2025
15 of 17 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Jan 16, 2026
# [0.59.0](v0.58.0...v0.59.0) (2026-01-16)

### Bug Fixes

* Add get_table_query_string_with_alias() for PostgreSQL subquery aliasing ([#5811](#5811)) ([11122ce](11122ce))
* Add hybrid online store to ONLINE_STORE_CLASS_FOR_TYPE mapping ([#5810](#5810)) ([678589b](678589b))
* Add possibility to overwrite send_receive_timeout for clickhouse offline store ([#5792](#5792)) ([59dbb33](59dbb33))
* Denial by default to all resources when no permissions set  ([#5663](#5663)) ([1524f1c](1524f1c))
* Make operator include full OIDC secret in repo config ([#5676](#5676)) ([#5809](#5809)) ([a536bc2](a536bc2))
* Populate Postgres `registry.path` during `feast init` ([#5785](#5785)) ([f293ae8](f293ae8))
* **redis:** Preserve millisecond timestamp precision for Redis online store ([#5807](#5807)) ([9e3f213](9e3f213))
* Search API to return all matching tags in matched_tags field ([#5843](#5843)) ([de37f66](de37f66))
* Spark Materialization Engine Cannot Infer Schema ([#5806](#5806)) ([58d0325](58d0325)), closes [#5594](#5594)
* Support arro3 table schema with newer deltalake packages ([#5799](#5799)) ([103c5e9](103c5e9))
* Timestamp formatting and lakehouse-type connector for trino_offline_store. ([#5846](#5846)) ([c2ea7e9](c2ea7e9))
* Update model_validator to use instance method signature (Pydantic v2.12 deprecation) ([#5825](#5825)) ([3c10b6e](3c10b6e))

### Features

* Add dbt integration for importing models as FeatureViews ([#5827](#5827)) ([b997361](b997361)), closes [#3335](#3335)
* Add GCS registry store in Go feature server ([#5818](#5818)) ([1dc2be5](1dc2be5))
* Add progress bar to CLI from feast apply ([#5867](#5867)) ([ab3562b](ab3562b))
* Add RBAC blog post to website ([#5861](#5861)) ([b1844a3](b1844a3))
* Add skip_feature_view_validation parameter to FeatureStore.apply() and plan() ([#5859](#5859)) ([5482a0e](5482a0e))
* Added batching to feature server /push to offline store ([#5683](#5683)) ([#5729](#5729)) ([ce35ce6](ce35ce6))
* Enable static artifacts for feature server that can be used in Feature Transformations ([#5787](#5787)) ([edefc3f](edefc3f))
* Improve lambda materialization engine ([#5829](#5829)) ([f6116f9](f6116f9))
* Offline Store historical features retrieval based on datetime range in Ray ([#5738](#5738)) ([e484c12](e484c12))
* Read, Save docs and chat fixes ([#5865](#5865)) ([2081b55](2081b55))
* Resolve pyarrow >21 installation with ibis-framework ([#5847](#5847)) ([8b9bb50](8b9bb50))
* Support staging for spark materialization ([#5671](#5671)) ([#5797](#5797)) ([5b787af](5b787af))