-
Notifications
You must be signed in to change notification settings - Fork 3.5k
fix(pptx): handle picture shapes with external image references #2914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pptx): handle picture shapes with external image references #2914
Conversation
|
✅ DCO Check Passed Thanks @emerose, all your commits are properly signed off. 🎉 |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updatesWonderful, this rule succeeded.When test data is updated, we require two reviewers
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Thanks @emerose for catching this issue and suggesting this fix. |
|
@ceberam unfortunately, the source file that caused the problem was an internal / private company document that I can't share directly. they were created by a partner, but I don't think that there was anything particularly unusual about them. let me know if there are other ways I can help |
When processing PowerPoint files containing picture shapes that reference
external images (rather than embedded images), the python-pptx library
raises a ValueError("no embedded image") when accessing the `image`
property.
Previously, this caused the entire document conversion to fail because:
1. The `hasattr(shape, "image")` check at line 690 would trigger the
property getter, which raises ValueError (hasattr only catches
AttributeError, not ValueError)
2. The exception handler in `_handle_pictures()` only caught
UnidentifiedImageError and OSError, not ValueError
This fix:
- Removes the unnecessary hasattr check since we already verify the
shape type is MSO_SHAPE_TYPE.PICTURE
- Adds ValueError to the exception handler in `_handle_pictures()` so
that picture shapes with external references are gracefully skipped
with a warning instead of crashing the pipeline
Co-Authored-By: Claude Opus 4.5 <[email protected]>
7045926 to
2c78604
Compare
|
@emerose I've just added a commit with a modification of the test file powerpoint_with_image.pptx that includes a image referenced to an external file. I checked that our latest release fails indeed, and your fix works as expected. I'll approve this PR. Maybe some thoughts for further improvement:
|
I, Sam Quigley <[email protected]>, hereby add my Signed-off-by to this commit: e69779e Signed-off-by: Sam Quigley <[email protected]>
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
2c78604 to
0c025d5
Compare
|
@ceberam @emerose I understand this avoids a failure in the processing, it is actually not fetching the referenced image, right? In case of loading referenced/remote images we should add the explicit option as for html, see https://github.com/docling-project/docling/blob/main/docling/datamodel/backend_options.py#L31. |
Correct, that was the intent of this PR
That’s what I meant with my earlier comment on this conversation |
When processing PowerPoint files containing picture shapes that reference external images (rather than embedded images), the python-pptx library raises a ValueError("no embedded image") when accessing the
imageproperty.Previously, this caused the entire document conversion to fail because:
The
hasattr(shape, "image")check at line 690 would trigger the property getter, which raises ValueError (hasattr only catches AttributeError, not ValueError)The exception handler in
_handle_pictures()only caught UnidentifiedImageError and OSError, not ValueErrorThis fix:
_handle_pictures()so that picture shapes with external references are gracefully skipped with a warning instead of crashing the pipeline