Python SDK get_historical_features does not use field mappings. #2248

@michelle-rascati-sp

Expected Behavior

When setting a field mapping for offline data sources such as {"column_name": "feature_name"}, I would expect to call get_historical_features(features=["feature_name"]) and get back a dataframe with this feature_name as a column.

Current Behavior

  • File data source: works as expected.
  • Bigquery data source: google.api_core.exceptions.BadRequest: 400 Unrecognized name: feature_name at [581:13]
  • Redshift data source: Redshift SQL Query failed to finish. Details: ... 'Error': 'ERROR: column "feature_name" does not exist

Steps to reproduce

Within the fraud detection tutorial, update the fraud_features.py to use a field mapping in the user_transaction_count_7d feature view:

driver_stats_fv = FeatureView(
    name="user_transaction_count_7d",
    entities=["user_id"],
    ttl=timedelta(weeks=1),
    batch_source=BigQuerySource(
        table_ref=f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}.user_count_transactions_7d",
        event_timestamp_column="feature_timestamp",
        # Maps the source column name to the feature name exposed by Feast.
        field_mapping={"transaction_count_7d": "transaction_count_7d_fm"},
    ),
)

Calling get_historical_features with the mapped feature name then fails with an error saying the column doesn't exist:

training_data = store.get_historical_features(
    entity_df=f"""
    select 
        src_account as user_id,
        timestamp,
        is_fraud
    from
        feast-oss.fraud_tutorial.transactions
    where
        timestamp between timestamp('{two_days_ago.isoformat()}') 
        and timestamp('{now.isoformat()}')""",
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    full_feature_names=True
).to_df()

training_data.head()

> BadRequest: 400 Unrecognized name: transaction_count_7d_fm; Did you mean transaction_count_7d? at [77:13]

Note that the materialize step handles the field mapping correctly, and get_online_features works as expected:

feature_vector = store.get_online_features(
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    entity_rows=entity_rows
).to_dict()

> {'credit_score': [480], 'account_age_days': [655], 'user_has_2fa_installed': [1], 'transaction_count_7d_fm': [6], 'user_has_fraudulent_transactions_7d': [0.0]}

Specifications

  • Version: 0.17
  • Platform: Any
  • Subsystem:

Possible Solution

Update data sources to query from the column name and return the mapped feature name.
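
To illustrate the idea, here is a minimal sketch of how an offline store could build its select list so that the query reads the source column and returns it under the mapped feature name. The helper build_select_columns is hypothetical and is not part of Feast; it only assumes that field_mapping has the form {source_column: feature_name}, as in the example above.

```python
from typing import Dict, List


def build_select_columns(feature_names: List[str], field_mapping: Dict[str, str]) -> str:
    """Hypothetical helper: build a SQL select list for the requested features.

    Mapped features are read from their source column and aliased to the
    feature name; unmapped features are selected as-is.
    """
    # field_mapping is {source_column: feature_name}; invert it so we can
    # look up the source column from the requested feature name.
    reverse_mapping = {feature: column for column, feature in field_mapping.items()}

    parts = []
    for feature in feature_names:
        column = reverse_mapping.get(feature, feature)
        if column == feature:
            parts.append(column)
        else:
            parts.append(f"{column} AS {feature}")
    return ", ".join(parts)


select_list = build_select_columns(
    ["transaction_count_7d_fm"],
    {"transaction_count_7d": "transaction_count_7d_fm"},
)
print(select_list)  # transaction_count_7d AS transaction_count_7d_fm
```

With a select list built this way, the BigQuery and Redshift queries would no longer reference transaction_count_7d_fm as a physical column, which is what triggers the errors reported above.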
