[IcebergIO] Support column pruning #34856
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #34856 +/- ##
============================================
+ Coverage 54.53% 56.48% +1.95%
- Complexity 1479 3301 +1822
============================================
Files 1010 1182 +172
Lines 160455 181555 +21100
Branches 1079 3409 +2330
============================================
+ Hits 87500 102553 +15053
- Misses 70857 75738 +4881
- Partials 2098 3264 +1166
assign set of reviewers
Assigning reviewers. If you would like to opt out of this review, comment R: @m-trieu for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
| </td> | ||
| <td> | ||
| <code>list[<span style="color: green;">str</span>]</code> | ||
| </td> |
do we need to specify the required Beam SDK version?
Hmmm, not sure how to do this considering the file is auto-generated. Maybe we can include it in the schema field's description, but that doesn't seem very clean.
Won't it work, even on SDKs that don't understand it yet?
Only on Dataflow Runner V2. It'll fail if a user tries any other runner with an old SDK.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergScanConfig.java
Please update CHANGES.md
Thanks Kenn
Ahh forgot to update CHANGES. I'll open another PR to do that
Part of #34789
Allows users to pass a list of field names to either keep or drop when reading from an Iceberg table.
For example, say we have a table with columns `colA`, `colB`, `colC`, `colD`, `colE`. Either of the following will produce the same output:
- `keep: ["colA", "colE"]`
- `drop: ["colB", "colC", "colD"]`

`keep` and `drop` are mutually exclusive, and an error will be thrown if both are specified.
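
To make the keep/drop semantics above concrete, here is a minimal, standalone sketch in plain Java. It is not the code in this PR and does not use any Beam or Iceberg classes: `ColumnPruningSketch` and `project` are hypothetical names, and preserving the table's column order and treating a null or empty list as "not set" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of IcebergIO.
final class ColumnPruningSketch {

  /**
   * Returns the columns that would be read, given either a keep list or a drop list.
   * Assumption: the result preserves the table's column order.
   */
  static List<String> project(List<String> tableColumns, List<String> keep, List<String> drop) {
    boolean hasKeep = keep != null && !keep.isEmpty();
    boolean hasDrop = drop != null && !drop.isEmpty();

    // keep and drop are mutually exclusive.
    if (hasKeep && hasDrop) {
      throw new IllegalArgumentException("Only one of 'keep' or 'drop' may be set, not both.");
    }

    List<String> result = new ArrayList<>();
    if (!hasKeep && !hasDrop) {
      // No pruning requested: read every column.
      result.addAll(tableColumns);
      return result;
    }

    Set<String> names = new HashSet<>(hasKeep ? keep : drop);
    for (String column : tableColumns) {
      // keep mode: include listed columns; drop mode: include unlisted columns.
      if (names.contains(column) == hasKeep) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = List.of("colA", "colB", "colC", "colD", "colE");
    System.out.println(project(table, List.of("colA", "colE"), null)); // prints [colA, colE]
    System.out.println(project(table, null, List.of("colB", "colC", "colD"))); // prints [colA, colE]
  }
}
```

Both calls in `main` select the same two columns, mirroring the example in the description. The sketch does not validate that the listed names exist in the table; how unknown names are handled in the actual connector is not stated here.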