@YassinNouh21
Contributor

What this PR does / why we need it

Adds CLI commands to import dbt models as Feast FeatureViews, enabling automatic generation of Feast objects from dbt manifest.json files.

  • feast dbt list - Discover dbt models available for import
  • feast dbt import - Create Feast FeatureViews from dbt models
  • --output option to generate Python files instead of applying to registry
  • Supports BigQuery, Snowflake, and File data sources
  • Maps 38 dbt data types to Feast types (including ARRAY and NUMBER with precision)
  • Preserves dbt metadata (tags, descriptions) in generated Feast objects
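
The type-mapping bullet above can be pictured with a small sketch. This is not the PR's `mapper.py`. The function name `map_dbt_type`, the lookup table, the NUMBER rule (scale 0 becomes `Int64`, anything else `Float64`), and the `String` fallback are all assumptions made for illustration. Feast type names are returned as plain strings so the snippet runs without Feast installed.

```python
import re

# Hypothetical subset of a dbt-to-Feast lookup table. The PR reports 38
# supported types; this table is illustrative only.
_SCALAR_TYPES = {
    "STRING": "String",
    "VARCHAR": "String",
    "INT64": "Int64",
    "BIGINT": "Int64",
    "FLOAT64": "Float64",
    "BOOL": "Bool",
    "BOOLEAN": "Bool",
    "BYTES": "Bytes",
    "TIMESTAMP": "UnixTimestamp",
}

# NUMBER(p), NUMBER(p, s) and the NUMERIC/DECIMAL spellings.
_NUMBER_RE = re.compile(
    r"^(?:NUMBER|NUMERIC|DECIMAL)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)$"
)
# ARRAY<element_type>
_ARRAY_RE = re.compile(r"^ARRAY\s*<\s*(.+?)\s*>$")


def map_dbt_type(dbt_type: str) -> str:
    """Return a Feast type name (as a string) for a dbt column data_type."""
    normalized = dbt_type.strip().upper()

    array_match = _ARRAY_RE.match(normalized)
    if array_match:
        # Map the element type recursively and wrap it.
        return f"Array({map_dbt_type(array_match.group(1))})"

    number_match = _NUMBER_RE.match(normalized)
    if number_match:
        # Assumed rule: a missing or zero scale means an integer column.
        scale = int(number_match.group(2) or 0)
        return "Int64" if scale == 0 else "Float64"

    # Assumed fallback: unknown types are treated as strings.
    return _SCALAR_TYPES.get(normalized, "String")
```

For example, `map_dbt_type("array<string>")` yields `"Array(String)"` and `map_dbt_type("NUMBER(38, 0)")` yields `"Int64"` under these assumed rules.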

Which issue(s) this PR fixes

Closes #3335

Does this PR introduce a user-facing change

Yes. Users can now import dbt models as Feast FeatureViews using:

```bash
# List models with 'feast' tag
feast dbt list -m target/manifest.json --tag feast

# Import models to registry
feast dbt import -m target/manifest.json -e driver_id --tag feast

# Generate Python file instead
feast dbt import -m target/manifest.json -e driver_id --output features.py
```

Test plan

  • Unit tests for parser, mapper, and codegen modules
  • Tested with real dbt projects at 3 complexity levels
  • Verified generated Python files are syntactically correct and importable

Signed-off-by: yassinnouh21 [email protected]

@YassinNouh21 YassinNouh21 requested a review from a team as a code owner January 10, 2026 12:34
Contributor

Copilot AI left a comment

Pull request overview

This PR adds dbt integration to Feast, enabling users to automatically import dbt models as Feast FeatureViews. The integration includes CLI commands for discovering and importing dbt models, with support for BigQuery, Snowflake, and File data sources.

Changes:

  • Adds feast dbt list and feast dbt import CLI commands for dbt model discovery and import
  • Implements comprehensive dbt type mapping (38 data types including ARRAY and NUMBER with precision)
  • Provides code generation capability to output Python files instead of applying directly to registry

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| setup.py | Adds dbt-artifacts-parser dependency for dbt integration |
| sdk/python/feast/dbt/parser.py | Implements dbt manifest.json parsing with typed support for dbt versions 0.19-1.11+ |
| sdk/python/feast/dbt/mapper.py | Maps dbt data types to Feast types and creates Feast objects from dbt models |
| sdk/python/feast/dbt/codegen.py | Generates Python code for Feast objects using Jinja2 templates |
| sdk/python/feast/cli/dbt_import.py | Implements CLI commands for listing and importing dbt models |
| sdk/python/feast/cli/cli.py | Integrates dbt command group into main CLI |
| sdk/python/tests/unit/dbt/test_parser.py | Unit tests for dbt manifest parser |
| sdk/python/tests/unit/dbt/test_mapper.py | Unit tests for dbt-to-Feast type mapping and object creation |

@YassinNouh21 YassinNouh21 force-pushed the feat/dbt-feast-integration-3335-clean branch 3 times, most recently from e581fec to 3fb62f3 Compare January 10, 2026 14:27
…-dev#3335)

This PR implements the dbt-Feast integration feature requested in feast-dev#3335,
enabling users to import dbt models as Feast FeatureViews.

## New CLI Commands

- `feast dbt list` - List dbt models available for import
- `feast dbt import` - Import dbt models as Feast objects

## Features

- Parse dbt manifest.json files to extract model metadata
- Map dbt types to Feast types (38 types supported)
- Generate Entity, DataSource, and FeatureView objects
- Support for BigQuery, Snowflake, and File data sources
- Tag-based filtering (--tag) to select specific models
- Code generation (--output) to create Python files
- Dry-run mode to preview changes before applying
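
As a rough illustration of the manifest parsing and tag-based filtering listed above, the sketch below reads the `nodes` mapping of a manifest dictionary directly. The PR itself parses manifests through dbt-artifacts-parser, so this is a simplified stand-in. The helper name `select_models` and the inline `SAMPLE_MANIFEST` are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def load_manifest(path: str) -> Dict[str, Any]:
    """Read a dbt manifest.json file into a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def select_models(
    manifest: Dict[str, Any], tag: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Return model nodes, optionally restricted to those carrying `tag`."""
    selected = []
    for node in manifest.get("nodes", {}).values():
        # Manifests also contain tests, seeds, and snapshots; keep models only.
        if node.get("resource_type") != "model":
            continue
        if tag is not None and tag not in node.get("tags", []):
            continue
        selected.append(node)
    return selected


# Hypothetical, heavily trimmed manifest used only to exercise the helper.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.demo.driver_stats": {
            "resource_type": "model",
            "name": "driver_stats",
            "tags": ["feast"],
        },
        "model.demo.staging_trips": {
            "resource_type": "model",
            "name": "staging_trips",
            "tags": [],
        },
        "test.demo.not_null_driver_id": {
            "resource_type": "test",
            "name": "not_null_driver_id",
            "tags": ["feast"],
        },
    }
}

feast_models = select_models(SAMPLE_MANIFEST, tag="feast")
```

Here only `driver_stats` is selected: the staging model lacks the tag, and the tagged test node is not a model.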

## Usage Examples

```bash
# List models with 'feast' tag
feast dbt list -m target/manifest.json --tag feast

# Import models to registry
feast dbt import -m target/manifest.json -e driver_id --tag feast

# Generate Python file instead
feast dbt import -m target/manifest.json -e driver_id --output features.py
```

Closes feast-dev#3335

Signed-off-by: yassinnouh21 <[email protected]>
- Add dbt-artifacts-parser as optional dependency (feast[dbt])
- Update parser to use typed parsing with fallback to raw dict
- Provides better support for manifest versions v1-v12

Signed-off-by: yassinnouh21 <[email protected]>
When parsing minimal/incomplete manifests (e.g., in unit tests),
dbt-artifacts-parser may fail validation. This change adds a graceful
fallback to use raw dict parsing when typed parsing fails.

Also updated test fixture with dbt_schema_version field.

Signed-off-by: yassinnouh21 <[email protected]>
Since dbt-artifacts-parser is an optional dependency, unit tests
should be skipped in CI when it's not installed.

Signed-off-by: yassinnouh21 <[email protected]>
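
One way to skip tests when an optional dependency is missing, as the commit above describes, is sketched here with the standard library's `unittest`. The PR's actual test files may use a different mechanism (with pytest, `pytest.importorskip` serves the same purpose). The class and test names are hypothetical.

```python
import importlib.util
import unittest

# True only when the optional dependency can be imported in this environment.
DBT_PARSER_AVAILABLE = importlib.util.find_spec("dbt_artifacts_parser") is not None


@unittest.skipUnless(DBT_PARSER_AVAILABLE, "dbt-artifacts-parser is not installed")
class TestDbtParserImport(unittest.TestCase):
    def test_parser_package_is_importable(self):
        # Runs only when the guard above found the package.
        import dbt_artifacts_parser  # noqa: F401

        self.assertTrue(DBT_PARSER_AVAILABLE)
```

When the package is absent, the whole class is reported as skipped rather than failed.
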
Removed manual/fallback dict parsing code. The parser now exclusively
uses dbt-artifacts-parser typed objects. Updated test fixtures to
create complete manifests that dbt-artifacts-parser can parse.

Signed-off-by: yassinnouh21 <[email protected]>
Install dbt-artifacts-parser in CI so dbt unit tests run instead
of being skipped.

Signed-off-by: yassinnouh21 <[email protected]>
- mapper.py: Fix Array element type check to use set membership instead
  of incorrect isinstance() comparison
- codegen.py: Add safe getattr() with fallback for Array.base_type access

Signed-off-by: yassinnouh21 <[email protected]>
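
The set-membership fix mentioned above can be illustrated with stand-in types. This is one plausible reading of the change, not the PR's code: `PrimitiveType` is a hypothetical enum standing in for Feast's primitive type values, and the set of allowed element types is an assumption.

```python
from enum import Enum


class PrimitiveType(Enum):
    """Hypothetical stand-in for Feast's primitive type values."""

    INVALID = 0
    STRING = 1
    INT64 = 2
    FLOAT64 = 3
    BOOL = 4


# Assumed set of types that may appear as Array elements.
ARRAY_ELEMENT_TYPES = {
    PrimitiveType.STRING,
    PrimitiveType.INT64,
    PrimitiveType.FLOAT64,
    PrimitiveType.BOOL,
}


def is_valid_array_element(element_type: PrimitiveType) -> bool:
    """Check an element type by membership rather than isinstance().

    isinstance(value, PrimitiveType.STRING) would raise TypeError, because
    an enum member is a value, not a class.
    """
    return element_type in ARRAY_ELEMENT_TYPES
```

Membership works because the type markers are values that can be compared and hashed, whereas `isinstance()` expects a class as its second argument.
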
@YassinNouh21 YassinNouh21 force-pushed the feat/dbt-feast-integration-3335-clean branch from 3fb62f3 to 01730a8 Compare January 10, 2026 14:28
@YassinNouh21
Contributor Author

@franciscojavierarceo it passed the CI and I reviewed your comments

@franciscojavierarceo
Member

Can you add documentation for this? 🙏

@YassinNouh21 YassinNouh21 force-pushed the feat/dbt-feast-integration-3335-clean branch from 2261481 to 9929527 Compare January 11, 2026 11:38
- Add dbt-artifacts-parser to pyproject.toml under feast[dbt] and feast[ci] extras
- Remove separate install step from unit_tests.yml workflow
- Update all requirements lock files

Addresses review feedback from @ntkathole.

Signed-off-by: YassinNouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
@YassinNouh21 YassinNouh21 force-pushed the feat/dbt-feast-integration-3335-clean branch from 9929527 to fb40e93 Compare January 11, 2026 11:53
Add comprehensive documentation for the new dbt integration feature:
- Quick start guide with step-by-step instructions
- CLI reference for `feast dbt list` and `feast dbt import`
- Type mapping table for dbt to Feast types
- Data source configuration examples (BigQuery, Snowflake, File)
- Best practices for tagging, documentation, and CI/CD
- Troubleshooting section

Addresses review feedback from @franciscojavierarceo.

Signed-off-by: YassinNouh21 <[email protected]>
Signed-off-by: yassinnouh21 <[email protected]>
@YassinNouh21
Contributor Author

@franciscojavierarceo @ntkathole

Thanks for the reviews! I've addressed all the feedback:

  • Added comprehensive documentation for the dbt integration (commit 53932ff)
  • Added dbt-artifacts-parser to both setup.py and pyproject.toml under feast[ci] (commit fb40e93)

All CI checks are passing and there are no merge conflicts. The PR should be ready for another review when you have a chance.

Let me know if there's anything else needed! 🙏


```bash
feast dbt import target/manifest.json \
--entity-column driver_id \
```

what happens when a user has multiple dbt models with multiple entities? ideally we can go from some sort of metadata tag to autogenerating the Entity.
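
A sketch of what the metadata-driven approach suggested here might look like: entity columns are discovered from each model's column-level `meta` in the manifest instead of a single CLI argument. The `feast_entity` key is a hypothetical convention invented for this example; neither dbt nor Feast defines it, and the PR does not implement it.

```python
from typing import Any, Dict, List

# Hypothetical meta key; not part of dbt, Feast, or this PR.
ENTITY_META_KEY = "feast_entity"


def entity_columns(model: Dict[str, Any]) -> List[str]:
    """Return the names of columns flagged as entities in a model node."""
    return [
        name
        for name, column in model.get("columns", {}).items()
        if column.get("meta", {}).get(ENTITY_META_KEY) is True
    ]


# Trimmed, made-up model node with two entity columns.
sample_model = {
    "name": "trip_stats",
    "columns": {
        "driver_id": {"data_type": "INT64", "meta": {"feast_entity": True}},
        "rider_id": {"data_type": "INT64", "meta": {"feast_entity": True}},
        "trip_count": {"data_type": "INT64", "meta": {}},
    },
}
```

With per-column flags like this, each model could declare its own entities, which would lift the single `-e` value shared across all imported models.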

help="Preview what would be created without applying changes",
)
@click.option(
"--exclude-columns",

probably we should allow users to use either exclude or include because sometimes users can have really big models

would make sense to raise if users tried to use both
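
A minimal sketch of the include/exclude behaviour discussed in these two comments, written as a plain function so it runs without click. Only `--exclude-columns` appears in the diff; the `include` parameter and the function name `select_columns` are hypothetical. Inside a click command, the same check could raise `click.UsageError` instead of `ValueError`.

```python
from typing import Iterable, List, Optional


def select_columns(
    all_columns: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Apply an include list or an exclude list, but never both."""
    if include and exclude:
        raise ValueError("Use either include or exclude columns, not both")

    columns = list(all_columns)
    if include:
        wanted = set(include)
        # Preserve the model's original column order.
        return [name for name in columns if name in wanted]
    if exclude:
        unwanted = set(exclude)
        return [name for name in columns if name not in unwanted]
    return columns
```

An include list is convenient for very wide models, where naming the few wanted columns is shorter than excluding the rest.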

"-o",
type=click.Path(),
default=None,
help="Output Python file path (e.g., features.py). Generates code instead of applying to registry.",

In the general case of multiple tables -> multiple feature views, we should probably generate one python file per dbt model
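
The one-file-per-model layout proposed here could look roughly like the sketch below. It assumes the code for each model has already been rendered to a string. The function name, the `<model>_features.py` naming scheme, and the directory argument are hypothetical and not part of the PR.

```python
from pathlib import Path
from typing import Dict, List


def write_one_file_per_model(rendered: Dict[str, str], output_dir: Path) -> List[Path]:
    """Write each model's generated source to its own Python file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a stable, reproducible write order.
    for model_name, source in sorted(rendered.items()):
        path = output_dir / f"{model_name}_features.py"
        path.write_text(source, encoding="utf-8")
        written.append(path)
    return written
```

Splitting the output this way keeps regenerated diffs local to the model that changed.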

click.echo(f"{Fore.CYAN}Parsing dbt manifest: {manifest_path}{Style.RESET_ALL}")

try:
parser = DbtManifestParser(manifest_path)

it makes sense to work from the manifest, but letting users interactively test a single model could be useful too, so they can quickly iterate and run feast apply until they get it to work (good devx)

cli_check_repo(repo, fs_yaml_file)
store = create_feature_store(ctx)

store.apply(all_objects)

i feel the user should generate the code first and then separately run feast apply manually

@YassinNouh21 this is one change I do want to include here. Can we add a parameter instead that says "apply" to the CLI then? by default we shouldn't apply in the dbt code gen.

It should be:

feast dbt-import --apply=True

Or something like that.
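
The opt-in behaviour requested here can be sketched with a boolean flag that defaults to off. The PR's CLI is built with click; `argparse` stands in below only to keep the snippet dependency-free. The `--apply` flag name follows the comment above, while `build_parser` and `plan_actions` are hypothetical helpers.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for an import command with opt-in apply."""
    parser = argparse.ArgumentParser(prog="feast dbt import")
    parser.add_argument("-m", dest="manifest", required=True, help="Path to manifest.json")
    parser.add_argument("--output", default=None, help="Write generated code to this file")
    parser.add_argument(
        "--apply",
        action="store_true",  # False unless the flag is passed explicitly
        help="Apply generated objects to the registry",
    )
    return parser


def plan_actions(args: argparse.Namespace) -> List[str]:
    """Describe what the command would do for the parsed arguments."""
    actions = []
    if args.output:
        actions.append(f"write {args.output}")
    if args.apply:
        actions.append("apply to registry")
    return actions or ["preview only"]
```

With this shape, running the command without `--apply` never touches the registry, which matches the default the reviewer asks for.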

Float64: "Float64",
Bool: "Bool",
UnixTimestamp: "UnixTimestamp",
Bytes: "Bytes",

i guess Image and PdfBytes wouldn't be sensible here

@HaoXuAI
Collaborator

HaoXuAI commented Jan 14, 2026

looks good to me

YassinNouh21 added a commit to YassinNouh21/feast that referenced this pull request Jan 14, 2026
Add prominent warning callout highlighting that the dbt integration is
an alpha feature with current limitations. This sets proper expectations
for users regarding:
- Supported data sources (BigQuery, Snowflake, File only)
- Single entity per model constraint
- Potential for breaking changes in future releases

Addresses feedback from PR feast-dev#5827 review comments.
YassinNouh21 added a commit to YassinNouh21/feast that referenced this pull request Jan 14, 2026
Ensure dbt-artifacts-parser is installed in CI environments by adding
it to the CI_REQUIRED list in setup.py. This matches the dependency
already present in pyproject.toml and ensures CI tests for dbt
integration have access to the required parser library.

Addresses feedback from PR feast-dev#5827 review comments.
@YassinNouh21 YassinNouh21 force-pushed the feat/dbt-feast-integration-3335-clean branch from 2807a7a to b2901f4 Compare January 14, 2026 08:57
Add logging and defensive attribute access for Array.base_type in code
generation to prevent potential AttributeError. While Array.__init__
always sets base_type, defensive programming with warnings provides:
- Protection against edge cases or future Array implementation changes
- Clear visibility when fallback occurs via logger.warning
- Consistent error handling across both usage sites

Changes:
- Add logging module and logger instance
- Update _get_feast_type_name() to use getattr with warning
- Update import tracking logic to use getattr with warning
- Add concise comments with examples (e.g., Array(String) -> base_type = String)

Addresses code review feedback from PR feast-dev#5827.

Signed-off-by: yassinnouh21 <[email protected]>
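
The defensive access described in this commit might look roughly like the sketch below. The `Array` class is a minimal stand-in rather than Feast's own, and the helper name `array_type_name` and the `String` fallback are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


class Array:
    """Minimal stand-in for an array type that records its element type."""

    def __init__(self, base_type: str) -> None:
        self.base_type = base_type  # e.g. Array("String") -> base_type = "String"


def array_type_name(array_type: object, default: str = "String") -> str:
    """Render an array type name, falling back if base_type is missing."""
    base_type = getattr(array_type, "base_type", None)
    if base_type is None:
        # Make the fallback visible instead of failing with AttributeError.
        logger.warning(
            "Array type %r has no base_type; falling back to %s", array_type, default
        )
        base_type = default
    return f"Array({base_type})"
```

The warning keeps the fallback from passing silently if the array implementation ever changes.
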
@YassinNouh21
Contributor Author

You're right: ImageBytes and PdfBytes wouldn't make sense here. While these types exist in Feast, dbt manifests expose only a generic BYTES type, with no semantic information about whether the bytes represent images, PDFs, or other binary data.

Example:

```sql
-- dbt model
SELECT
    user_id,
    profile_photo,  -- BigQuery type: BYTES
    resume_pdf      -- BigQuery type: BYTES
FROM users
```

The dbt manifest shows only "data_type": "BYTES" for both columns, so there is no way to distinguish images from PDFs. The mapping includes only types that can actually appear in dbt artifacts.


Re: codegen.py:152 ImageBytes/PdfBytes comment

Add clarifying comment in type_map explaining why ImageBytes and
PdfBytes are not included in the dbt type mapping. While these types
exist in Feast, dbt manifests only expose generic BYTES type without
semantic information to distinguish between regular bytes, images, or
PDFs.

Example: A dbt model with image and PDF columns both appear as
'BYTES' in the manifest, making ImageBytes/PdfBytes types unmappable
from dbt artifacts.

Addresses feedback from PR feast-dev#5827 review (franciscojavierarceo).

Signed-off-by: yassinnouh21 <[email protected]>

Development

Successfully merging this pull request may close these issues.

Feast <> dbt integration

4 participants