feat: support json_serializer parameter in create_engine()#823
feat: support json_serializer parameter in create_engine()#823waiho-gumloop wants to merge 1 commit intogoogleapis:mainfrom
Conversation
SpannerDialect now accepts `json_serializer` and `json_deserializer`
kwargs, matching the standard SQLAlchemy convention used by PostgreSQL
and other dialects. Previously, passing `json_serializer` to
`create_engine()` raised `TypeError` because the dialect's `__init__`
did not declare these parameters.
The implementation uses a serialize-then-wrap strategy: the user's
`json_serializer` function pre-serializes values (handling custom
types like `datetime`), then the result is parsed back into a
`JsonObject` via `from_str()`. This preserves the existing Spanner
client pipeline (`_helpers.py` expects `JsonObject` instances) while
allowing custom type handling — without subclassing or modifying
`JsonObject` itself.
Example usage:
engine = create_engine(
"spanner:///...",
json_serializer=lambda obj: json.dumps(obj, cls=MyEncoder),
)
Made-with: Cursor
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces support for Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds support for custom json_serializer and json_deserializer in create_engine, aligning the Spanner dialect with other SQLAlchemy dialects. The implementation uses a 'serialize-then-wrap' strategy which is well-documented. The accompanying tests are comprehensive.
However, I've identified a significant issue with how JSON null values are handled, both with the new custom serializer and in the existing default behavior. The current implementation incorrectly maps Python None to SQL NULL instead of JSON null when JSON(none_as_null=False) is used, which is the default. This prevents storing JSON null values. I've left detailed comments on the implementation and the corresponding test case with suggestions on how to address this to ensure compliance with SQLAlchemy's documented behavior.
| def _factory(value): | ||
| json_str = json_serializer(value) | ||
| return JsonObject.from_str(json_str) |
There was a problem hiding this comment.
This implementation incorrectly handles JSON null values. When SQLAlchemy processes a JSON type with none_as_null=False (the default), a Python None value should be stored as a JSON null.
However, in this _factory:
json_serializer(None)returns the string'null'.JsonObject.from_str('null')creates aJsonObject(None)becausejson.loads('null')isNone.JsonObject(None).serialize()returnsNone, which the DBAPI driver interprets as a SQLNULL.
This prevents storing JSON null values and contradicts the expected behavior of none_as_null=False.
To fix this, you should handle None values (and 'null' strings from the serializer) specially to produce an object that serializes to the string 'null'. This likely requires a private JsonObject subclass that overrides the serialize() method for this purpose.
| def test_callable_handles_null(self): | ||
| factory = _make_json_serializer(lambda v: json.dumps(v)) | ||
| obj = factory(None) | ||
| assert isinstance(obj, JsonObject) | ||
| assert obj.serialize() is None |
There was a problem hiding this comment.
This test seems to confirm an incorrect behavior. factory(None) should produce a JsonObject that represents a JSON null, which should serialize to the string 'null', not Python None (which represents a SQL NULL).
As implemented, factory(None) calls json.dumps(None) which is 'null', then JsonObject.from_str('null') which is JsonObject(None), and JsonObject(None).serialize() returns None. This means a Python None is always converted to a SQL NULL, which prevents storing JSON null values when using JSON(none_as_null=False).
This test should be updated to assert the correct behavior, which is serialization to 'null'. For example: assert obj.serialize() == 'null'.
Summary
Fixes #822
SpannerDialectnow acceptsjson_serializerandjson_deserializerkwargs fromcreate_engine(), matching the standard SQLAlchemy convention used by PostgreSQL, MySQL, SQLite, and other dialects. Previously, passing these raisedTypeError.Approach: serialize-then-wrap
The Spanner pipeline wraps values in
JsonObjectfirst, then serializes later in_helpers._make_param_value_pbviaobj.serialize(). This differs from other dialects where_json_serializerreplacesjson.dumpsdirectly.To bridge this gap without modifying
JsonObjector the coregoogle-cloud-spannerlibrary:json_serializer(value)produces a JSON string with custom types already handledJsonObject.from_str()parses it back into aJsonObjectcontaining only native Python types_helpers.pylater callsobj.serialize(), standardjson.dumpsworks — no custom types remainThis is a one-extra-
json.loadsround-trip per JSON value, which is negligible for typical payloads.Backward compatible: When no
json_serializeris provided, the dialect behaves identically to before (_json_serializer = JsonObject).Example
Changes
sqlalchemy_spanner.py: Added__init__toSpannerDialectacceptingjson_serializer/json_deserializer, plus_make_json_serializer()factorytest/unit/test_json_serializer.py: 21 unit + integration tests covering the factory, dialect wiring, SQLAlchemy bind_processor pipeline, null handling, and Spanner_helpers.pycompatibilityTest plan
_make_json_serializerfactory (JsonObject passthrough, callable wrapping, arrays, nulls, nested datetimes)get_cls_kwargsintrospection, class attribute isolation)JSON.bind_processor→JsonObject→serialize()pipeline_helpers.pycompatibility test (isinstancecheck + bareserialize()call)Made with Cursor