feat: Add dbt integration for importing models as FeatureViews #5827
Pull request overview
This PR adds dbt integration to Feast, enabling users to automatically import dbt models as Feast FeatureViews. The integration includes CLI commands for discovering and importing dbt models, with support for BigQuery, Snowflake, and File data sources.
Changes:
- Adds `feast dbt list` and `feast dbt import` CLI commands for dbt model discovery and import
- Implements comprehensive dbt type mapping (38 data types including ARRAY and NUMBER with precision)
- Provides code generation capability to output Python files instead of applying directly to registry
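
To make the type-mapping bullet concrete, here is a minimal sketch of how a dbt column type string might be normalized and mapped to a Feast type name. It is not the code in `mapper.py`: the function name `map_dbt_type`, the small lookup table, the rule that a NUMBER with scale 0 becomes `Int64`, and the `String` fallback are all assumptions made for illustration. Feast type names are kept as plain strings so the snippet runs without Feast installed.

```python
import re

# Hypothetical subset of a dbt -> Feast type table (Feast names as strings).
_BASE_TYPES = {
    "STRING": "String",
    "VARCHAR": "String",
    "TEXT": "String",
    "INT64": "Int64",
    "BIGINT": "Int64",
    "INTEGER": "Int64",
    "FLOAT64": "Float64",
    "DOUBLE": "Float64",
    "BOOL": "Bool",
    "BOOLEAN": "Bool",
    "TIMESTAMP": "UnixTimestamp",
    "BYTES": "Bytes",
}

_ARRAY_RE = re.compile(r"ARRAY<(.+)>")
_NUMBER_RE = re.compile(r"(?:NUMBER|NUMERIC|DECIMAL)\((\d+)\s*,\s*(\d+)\)")


def map_dbt_type(dbt_type: str) -> str:
    """Return a Feast type name for a dbt column type string (sketch only)."""
    normalized = dbt_type.strip().upper()

    # ARRAY<INT64> -> Array(Int64); the element type is mapped recursively.
    array_match = _ARRAY_RE.fullmatch(normalized)
    if array_match:
        return f"Array({map_dbt_type(array_match.group(1))})"

    # NUMBER(p, s): assumed rule, scale 0 is an integer, anything else a float.
    number_match = _NUMBER_RE.fullmatch(normalized)
    if number_match:
        scale = int(number_match.group(2))
        return "Int64" if scale == 0 else "Float64"

    # Assumed fallback: unknown types are treated as strings.
    return _BASE_TYPES.get(normalized, "String")


if __name__ == "__main__":
    for raw in ["array<string>", "NUMBER(38,0)", "NUMBER(10, 2)", "bytes"]:
        print(raw, "->", map_dbt_type(raw))
```

A real mapper would also need to decide what to do with nested arrays and warehouse-specific aliases, which this sketch ignores.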
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| setup.py | Adds dbt-artifacts-parser dependency for dbt integration |
| sdk/python/feast/dbt/parser.py | Implements dbt manifest.json parsing with typed support for dbt versions 0.19-1.11+ |
| sdk/python/feast/dbt/mapper.py | Maps dbt data types to Feast types and creates Feast objects from dbt models |
| sdk/python/feast/dbt/codegen.py | Generates Python code for Feast objects using Jinja2 templates |
| sdk/python/feast/cli/dbt_import.py | Implements CLI commands for listing and importing dbt models |
| sdk/python/feast/cli/cli.py | Integrates dbt command group into main CLI |
| sdk/python/tests/unit/dbt/test_parser.py | Unit tests for dbt manifest parser |
| sdk/python/tests/unit/dbt/test_mapper.py | Unit tests for dbt-to-Feast type mapping and object creation |
e581fec to
3fb62f3
Compare
…-dev#3335)

This PR implements the dbt-Feast integration feature requested in feast-dev#3335, enabling users to import dbt models as Feast FeatureViews.

## New CLI Commands
- `feast dbt list` - List dbt models available for import
- `feast dbt import` - Import dbt models as Feast objects

## Features
- Parse dbt manifest.json files to extract model metadata
- Map dbt types to Feast types (38 types supported)
- Generate Entity, DataSource, and FeatureView objects
- Support for BigQuery, Snowflake, and File data sources
- Tag-based filtering (--tag) to select specific models
- Code generation (--output) to create Python files
- Dry-run mode to preview changes before applying

## Usage Examples
```bash
# List models with 'feast' tag
feast dbt list -m target/manifest.json --tag feast

# Import models to registry
feast dbt import -m target/manifest.json -e driver_id --tag feast

# Generate Python file instead
feast dbt import -m target/manifest.json -e driver_id --output features.py
```

Closes feast-dev#3335

Signed-off-by: yassinnouh21 <[email protected]>
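
For readers unfamiliar with `manifest.json`, the following sketch shows roughly what "parse the manifest and filter by tag" amounts to, using only the standard library. The PR itself relies on dbt-artifacts-parser, so this is not its implementation; `list_models` and the sample manifest are hypothetical, and only the `nodes`, `resource_type`, `tags`, and `columns` keys are taken from the dbt artifact layout.

```python
import json
from typing import Any, Dict, List, Optional


def list_models(manifest: Dict[str, Any], tag: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return model nodes from a dbt manifest, optionally filtered by tag."""
    models = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # Skip tests, seeds, snapshots, and other non-model nodes.
        if node.get("resource_type") != "model":
            continue
        if tag is not None and tag not in node.get("tags", []):
            continue
        columns = {
            name: col.get("data_type")
            for name, col in node.get("columns", {}).items()
        }
        models.append({"unique_id": unique_id, "name": node.get("name"), "columns": columns})
    return models


# Hypothetical, heavily trimmed manifest used only to exercise the function.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.demo.driver_stats": {
      "resource_type": "model",
      "name": "driver_stats",
      "tags": ["feast"],
      "columns": {
        "driver_id": {"name": "driver_id", "data_type": "INT64"},
        "avg_trips": {"name": "avg_trips", "data_type": "FLOAT64"}
      }
    },
    "model.demo.staging_orders": {
      "resource_type": "model",
      "name": "staging_orders",
      "tags": [],
      "columns": {}
    },
    "test.demo.not_null_driver_id": {
      "resource_type": "test",
      "name": "not_null_driver_id",
      "tags": ["feast"],
      "columns": {}
    }
  }
}
""")

if __name__ == "__main__":
    for model in list_models(SAMPLE_MANIFEST, tag="feast"):
        print(model["name"], model["columns"])
```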
Signed-off-by: yassinnouh21 <[email protected]>
- Add dbt-artifacts-parser as optional dependency (feast[dbt])
- Update parser to use typed parsing with fallback to raw dict
- Provides better support for manifest versions v1-v12

Signed-off-by: yassinnouh21 <[email protected]>
When parsing minimal/incomplete manifests (e.g., in unit tests), dbt-artifacts-parser may fail validation. This change adds a graceful fallback to use raw dict parsing when typed parsing fails. Also updated test fixture with dbt_schema_version field. Signed-off-by: yassinnouh21 <[email protected]>
Since dbt-artifacts-parser is an optional dependency, unit tests should be skipped in CI when it's not installed. Signed-off-by: yassinnouh21 <[email protected]>
Removed manual/fallback dict parsing code. The parser now exclusively uses dbt-artifacts-parser typed objects. Updated test fixtures to create complete manifests that dbt-artifacts-parser can parse. Signed-off-by: yassinnouh21 <[email protected]>
Install dbt-artifacts-parser in CI so dbt unit tests run instead of being skipped. Signed-off-by: yassinnouh21 <[email protected]>
- mapper.py: Fix Array element type check to use set membership instead of incorrect isinstance() comparison
- codegen.py: Add safe getattr() with fallback for Array.base_type access

Signed-off-by: yassinnouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
3fb62f3 to
01730a8
Compare
Signed-off-by: yassinnouh21 <[email protected]>
@franciscojavierarceo it passed the CI and I reviewed your comments

Can you add documentation for this? 🙏
2261481 to
9929527
Compare
- Add dbt-artifacts-parser to pyproject.toml under feast[dbt] and feast[ci] extras
- Remove separate install step from unit_tests.yml workflow
- Update all requirements lock files

Addresses review feedback from @ntkathole.

Signed-off-by: YassinNouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
9929527 to
fb40e93
Compare
Add comprehensive documentation for the new dbt integration feature:
- Quick start guide with step-by-step instructions
- CLI reference for `feast dbt list` and `feast dbt import`
- Type mapping table for dbt to Feast types
- Data source configuration examples (BigQuery, Snowflake, File)
- Best practices for tagging, documentation, and CI/CD
- Troubleshooting section

Addresses review feedback from @franciscojavierarceo.

Signed-off-by: YassinNouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
@franciscojavierarceo @ntkathole Thanks for the reviews! I've addressed all the feedback:
All CI checks are passing and there are no merge conflicts. The PR should be ready for another review when you have a chance. Let me know if there's anything else needed! 🙏
```bash
feast dbt import target/manifest.json \
  --entity-column driver_id \
```
what happens when a user has multiple dbt models with multiple entities? ideally we can go from some sort of metadata tag to autogenerating the Entity.
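
One way to picture the metadata-driven approach suggested here: each dbt model declares its entity columns in its `meta` block, and the importer reads them per model instead of taking a single `--entity-column` for the whole run. The sketch below assumes a made-up convention, `meta.feast.entity`, which is not defined by dbt or by this PR; `entities_from_meta` is likewise a hypothetical helper.

```python
from typing import Any, Dict, List


def entities_from_meta(model: Dict[str, Any]) -> List[str]:
    """Read entity column names from a model's meta block.

    Assumes a hypothetical convention: meta = {"feast": {"entity": ...}},
    where the value is either one column name or a list of names.
    """
    feast_meta = model.get("meta", {}).get("feast", {})
    entity = feast_meta.get("entity", [])
    return [entity] if isinstance(entity, str) else list(entity)


if __name__ == "__main__":
    # Two illustrative models with different entities.
    models = [
        {"name": "driver_stats", "meta": {"feast": {"entity": "driver_id"}}},
        {"name": "trip_stats", "meta": {"feast": {"entity": ["driver_id", "rider_id"]}}},
    ]
    for model in models:
        print(model["name"], entities_from_meta(model))
```

With something like this, a model without the metadata could fall back to the CLI option or be skipped with a warning; which behavior is right is a design question for the PR, not something this sketch settles.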
        help="Preview what would be created without applying changes",
    )
    @click.option(
        "--exclude-columns",
probably we should allow users to use either exclude or include because sometimes users can have really big models
would make sense to raise if users tried to use both
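
The include/exclude behavior discussed in these two comments could look roughly like the sketch below. `select_columns` is a hypothetical helper, not a function from the PR, and it raises a plain `ValueError`; in the actual click-based CLI the equivalent check would more likely surface as a usage error.

```python
from typing import Iterable, List, Optional


def select_columns(
    columns: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Filter columns by an include list or an exclude list, never both."""
    if include and exclude:
        raise ValueError("Use either include or exclude columns, not both")
    if include:
        wanted = set(include)
        # Keep the model's own column order rather than the order of `include`.
        return [c for c in columns if c in wanted]
    dropped = set(exclude or [])
    return [c for c in columns if c not in dropped]


if __name__ == "__main__":
    cols = ["driver_id", "avg_trips", "rating", "_loaded_at"]
    print(select_columns(cols, include=["rating", "avg_trips"]))
    print(select_columns(cols, exclude=["_loaded_at"]))
```

For very wide models an include list keeps the command short, which is the case the first comment is worried about.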
        "-o",
        type=click.Path(),
        default=None,
        help="Output Python file path (e.g., features.py). Generates code instead of applying to registry.",
In the general case of multiple tables -> multiple feature views, we should probably generate one python file per dbt model
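
A sketch of the "one Python file per dbt model" idea, assuming the code for each model has already been rendered to a string. The helper name `write_one_file_per_model` and the `<model>_features.py` naming scheme are assumptions for illustration; the PR's `codegen.py` writes a single file via `--output`.

```python
from pathlib import Path
from typing import Dict, List


def write_one_file_per_model(rendered: Dict[str, str], output_dir: Path) -> List[Path]:
    """Write each model's generated source to its own file and return the paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort by model name so the output is deterministic across runs.
    for model_name, source in sorted(rendered.items()):
        path = output_dir / f"{model_name}_features.py"
        path.write_text(source, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        paths = write_one_file_per_model(
            {
                "driver_stats": "# generated from dbt model driver_stats\n",
                "trip_stats": "# generated from dbt model trip_stats\n",
            },
            Path(tmp) / "feature_repo",
        )
        print([p.name for p in paths])
```

Shared objects such as entities would still need a home (for example a common module), which is the main complication of splitting output this way.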
    click.echo(f"{Fore.CYAN}Parsing dbt manifest: {manifest_path}{Style.RESET_ALL}")

    try:
        parser = DbtManifestParser(manifest_path)
it makes sense to work from the manifest, but letting users interactively test a single model could be useful too, so that they can iterate quickly and run feast apply until they get it to work (good devx)
    cli_check_repo(repo, fs_yaml_file)
    store = create_feature_store(ctx)

    store.apply(all_objects)
i feel the user should generate the code first and then separately run feast apply manually
@YassinNouh21 this is one change I do want to include here. Can we add a parameter instead that says "apply" to the CLI then? by default we shouldn't apply in the dbt code gen.
It should be:
feast dbt-import --apply=True
Or something like that.
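
The opt-in behavior requested above can be sketched as follows: code generation is the default, and applying to the registry only happens when the user asks for it. The PR's CLI is built with click; this sketch uses the standard library's `argparse` so it stays self-contained, models `--apply` as a boolean flag rather than `--apply=True`, and uses a hypothetical `plan` function that only describes the steps instead of touching a feature store.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser; option names mirror the discussion, not the final CLI."""
    parser = argparse.ArgumentParser(prog="feast-dbt-import-sketch")
    parser.add_argument("-m", "--manifest-path", required=True)
    parser.add_argument("-o", "--output", default="features.py")
    # Off by default: generating code must never modify the registry on its own.
    parser.add_argument("--apply", action="store_true",
                        help="Also apply the generated objects to the registry")
    return parser


def plan(args: argparse.Namespace) -> List[str]:
    """Describe what the command would do for the given arguments."""
    steps = [f"write {args.output}"]
    if args.apply:
        steps.append("apply to registry")
    return steps


if __name__ == "__main__":
    parser = build_parser()
    print(plan(parser.parse_args(["-m", "target/manifest.json"])))
    print(plan(parser.parse_args(["-m", "target/manifest.json", "--apply"])))
```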
    Float64: "Float64",
    Bool: "Bool",
    UnixTimestamp: "UnixTimestamp",
    Bytes: "Bytes",
i guess Image and PdfBytes wouldn't be sensible here
looks good to me
Add prominent warning callout highlighting that the dbt integration is an alpha feature with current limitations. This sets proper expectations for users regarding:
- Supported data sources (BigQuery, Snowflake, File only)
- Single entity per model constraint
- Potential for breaking changes in future releases

Addresses feedback from PR feast-dev#5827 review comments.

Signed-off-by: yassinnouh21 <[email protected]>
Ensure dbt-artifacts-parser is installed in CI environments by adding it to the CI_REQUIRED list in setup.py. This matches the dependency already present in pyproject.toml and ensures CI tests for dbt integration have access to the required parser library. Addresses feedback from PR feast-dev#5827 review comments. Signed-off-by: yassinnouh21 <[email protected]>
2807a7a to
b2901f4
Compare
Add logging and defensive attribute access for Array.base_type in code generation to prevent potential AttributeError.

While Array.__init__ always sets base_type, defensive programming with warnings provides:
- Protection against edge cases or future Array implementation changes
- Clear visibility when fallback occurs via logger.warning
- Consistent error handling across both usage sites

Changes:
- Add logging module and logger instance
- Update _get_feast_type_name() to use getattr with warning
- Update import tracking logic to use getattr with warning
- Add concise comments with examples (e.g., Array(String) -> base_type = String)

Addresses code review feedback from PR feast-dev#5827.

Signed-off-by: yassinnouh21 <[email protected]>
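
The pattern this commit describes, `getattr` with a default plus a logged warning, can be illustrated in isolation. The `Array` class below is a stand-in written for this sketch rather than `feast.types.Array`, `type_name_sketch` is not the PR's `_get_feast_type_name()`, and falling back to `String` is an assumed choice.

```python
import logging

logger = logging.getLogger(__name__)


class Array:
    """Stand-in for a Feast array type, used only in this sketch."""

    def __init__(self, base_type: str) -> None:
        self.base_type = base_type  # e.g. Array("String") -> base_type == "String"


def type_name_sketch(feast_type: object) -> str:
    """Return a printable type name, tolerating an Array without base_type."""
    if isinstance(feast_type, Array):
        base = getattr(feast_type, "base_type", None)
        if base is None:
            # Should not happen with a normal Array; warn so the fallback is visible.
            logger.warning("Array has no base_type; falling back to String")
            base = "String"
        return f"Array({base})"
    return str(feast_type)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(type_name_sketch(Array("Int64")))
    broken = Array("Int64")
    del broken.base_type  # simulate the edge case the commit guards against
    print(type_name_sketch(broken))
```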
Re: codegen.py:152 ImageBytes/PdfBytes comment

You're right - ImageBytes and PdfBytes wouldn't make sense here. While these types exist in Feast, dbt manifests only expose the generic BYTES type.

Example:

    -- dbt model
    SELECT
        user_id,
        profile_photo, -- BigQuery type: BYTES
        resume_pdf -- BigQuery type: BYTES
    FROM users

The dbt manifest only shows BYTES for both columns, with no semantic information to tell images or PDFs apart.
Add clarifying comment in type_map explaining why ImageBytes and PdfBytes are not included in the dbt type mapping. While these types exist in Feast, dbt manifests only expose generic BYTES type without semantic information to distinguish between regular bytes, images, or PDFs.

Example: A dbt model with image and PDF columns both appear as 'BYTES' in the manifest, making ImageBytes/PdfBytes types unmappable from dbt artifacts.

Addresses feedback from PR feast-dev#5827 review (franciscojavierarceo).

Signed-off-by: yassinnouh21 <[email protected]>
What this PR does / why we need it
Adds CLI commands to import dbt models as Feast FeatureViews, enabling automatic generation of Feast objects from dbt manifest.json files.
- `feast dbt list` - Discover dbt models available for import
- `feast dbt import` - Create Feast FeatureViews from dbt models
- `--output` option to generate Python files instead of applying to registry
Which issue(s) this PR fixes
Closes #3335
Does this PR introduce a user-facing change
Yes. Users can now import dbt models as Feast FeatureViews using:
Test plan
Signed-off-by: yassinnouh21 [email protected]