Closed
Description
Expected Behavior
When setting a field mapping for offline data sources such as {"column_name": "feature_name"}, I would expect to call get_historical_features(features=["feature_name"]) and get back a dataframe with this feature_name as a column.
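As a minimal illustration of the expected behavior, the sketch below applies a field mapping to plain Python rows. It is not Feast's implementation: `apply_field_mapping` and the sample rows are hypothetical, and only the mapping `{"column_name": "feature_name"}` comes from the description above.

```python
def apply_field_mapping(rows, field_mapping):
    """Rename keys in each row; field_mapping maps source column -> feature name."""
    return [
        # Columns without a mapping entry keep their original name.
        {field_mapping.get(column, column): value for column, value in row.items()}
        for row in rows
    ]


# Hypothetical source data: one row whose feature lives in "column_name".
source_rows = [{"user_id": 1, "column_name": 6}]
mapped_rows = apply_field_mapping(source_rows, {"column_name": "feature_name"})
print(mapped_rows)
```

With this behavior, a caller asking for `feature_name` never needs to know the name of the underlying source column.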
Current Behavior
- File data source: works as expected.
- BigQuery data source:
google.api_core.exceptions.BadRequest: 400 Unrecognized name: feature_name at [581:13]
- Redshift data source:
Redshift SQL Query failed to finish. Details: ... 'Error': 'ERROR: column "feature_name" does not exist
Steps to reproduce
Within the fraud detection tutorial, update the fraud_features.py to use a field mapping in the user_transaction_count_7d feature view:
driver_stats_fv = FeatureView(
    name="user_transaction_count_7d",
    entities=["user_id"],
    ttl=timedelta(weeks=1),
    batch_source=BigQuerySource(
        table_ref=f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}.user_count_transactions_7d",
        event_timestamp_column="feature_timestamp",
        # Maps the source column to the feature name exposed by the feature view.
        field_mapping={"transaction_count_7d": "transaction_count_7d_fm"}))
When calling get_historical_features, you get an error saying that the mapped column doesn't exist.
training_data = store.get_historical_features(
    entity_df=f"""
        select
            src_account as user_id,
            timestamp,
            is_fraud
        from
            feast-oss.fraud_tutorial.transactions
        where
            timestamp between timestamp('{two_days_ago.isoformat()}')
            and timestamp('{now.isoformat()}')""",
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    full_feature_names=True
).to_df()
training_data.head()
> BadRequest: 400 Unrecognized name: transaction_count_7d_fm; Did you mean transaction_count_7d? at [77:13]
Note, the materialize step handles the field mapping appropriately, and get_online_features works as expected.
feature_vector = store.get_online_features(
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    entity_rows=entity_rows
).to_dict()
> {'credit_score': [480], 'account_age_days': [655], 'user_has_2fa_installed': [1], 'transaction_count_7d_fm': [6], 'user_has_fraudulent_transactions_7d': [0.0]}
Specifications
- Version: 0.17
- Platform: Any
- Subsystem:
Possible Solution
Update the BigQuery and Redshift offline data sources to query the source column name and return it under the mapped feature name.
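One way this could look is sketched below, under stated assumptions: `build_select_list` is a hypothetical helper, not part of Feast, and the real fix would live in the offline stores' query templates. The idea is to invert the field mapping, select the source column, and alias it to the feature name.

```python
def build_select_list(requested_features, field_mapping):
    """Build a SQL select list that reads source columns and aliases them.

    field_mapping maps source column -> feature name, as in the issue.
    Identifier quoting is omitted to keep the sketch short.
    """
    # Invert the mapping so a feature name can be traced back to its column.
    feature_to_column = {feature: column for column, feature in field_mapping.items()}
    expressions = []
    for feature in requested_features:
        column = feature_to_column.get(feature, feature)
        if column == feature:
            # No mapping applies: select the column under its own name.
            expressions.append(column)
        else:
            expressions.append(f"{column} AS {feature}")
    return ", ".join(expressions)


select_list = build_select_list(
    ["transaction_count_7d_fm"],
    {"transaction_count_7d": "transaction_count_7d_fm"},
)
print(f"SELECT {select_list} FROM user_count_transactions_7d")
```

For the feature view in the reproduction steps, this produces a query that reads transaction_count_7d from the table and exposes it as transaction_count_7d_fm, which is the name get_historical_features is asked for.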