Add RAG enrichment and ingestion. #33413
Merged
damccorm merged 18 commits into apache:master on Jan 14, 2025
R: @damccorm
damccorm (Contributor) reviewed on Dec 20, 2024 and left a comment:
This is awesome! Comments are all minor, I'm really excited with how this turned out!
Two review threads on sdks/python/apache_beam/ml/rag/enrichment/bigquery_vector_search.py (both outdated, resolved)
damccorm reviewed on Dec 26, 2024:
sdks/python/apache_beam/ml/rag/enrichment/bigquery_vector_search.py (outdated, resolved)
damccorm reviewed on Dec 26, 2024:
sdks/python/apache_beam/ml/rag/enrichment/bigquery_vector_search_it_test.py (resolved)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #33413      +/-   ##
============================================
+ Coverage     59.02%   59.05%   +0.02%
  Complexity     3185     3185
============================================
  Files          1146     1149       +3
  Lines        176085   176219     +134
  Branches       3368     3368
============================================
+ Hits         103942   104073     +131
- Misses        68787    68790       +3
  Partials       3356     3356
damccorm (Contributor) approved these changes on Jan 14, 2025 and left a comment:
Just one more nit, otherwise LGTM
VardhanThigle pushed a commit to VardhanThigle/beam that referenced this pull request on Mar 21, 2025:
* Add base VectorDatabaseTransform.
* Add BigQueryVectorWriterConfig.
* Add BigQueryVectorWriterConfig tests.
* Allow overriding joinfn and custom types from EnrichmentSourceHandler.
* Add BigQueryVectorSearchEnrichmentHandler.
* Fix streaming test.
* Add licence.
* Fix project in vector search test and pydocs.
* Fix bigquery streaming test.
* Resolve open comments. Also fix batching logic when metadata restrictions are applied.
* Resolve comments.
* Fix bigquery ingestion default schema to work with Avro file loads.
* Assert that embedding set in vector write
* Call out RAG changes in CHANGES.md.
* Add PR links to CHANGES.md.
---------
Co-authored-by: Claude <[email protected]>
Add ingestion and enrichment components for RAG pipelines.