Skip to content

feat: support json_serializer parameter in create_engine()#823

Open
waiho-gumloop wants to merge 1 commit intogoogleapis:mainfrom
waiho-gumloop:feat/support-json-serializer-kwarg
Open

feat: support json_serializer parameter in create_engine()#823
waiho-gumloop wants to merge 1 commit intogoogleapis:mainfrom
waiho-gumloop:feat/support-json-serializer-kwarg

Conversation

@waiho-gumloop
Copy link

Summary

Fixes #822

SpannerDialect now accepts json_serializer and json_deserializer kwargs from create_engine(), matching the standard SQLAlchemy convention used by PostgreSQL, MySQL, SQLite, and other dialects. Previously, passing these raised TypeError.

Approach: serialize-then-wrap

The Spanner pipeline wraps values in JsonObject first, then serializes later in _helpers._make_param_value_pb via obj.serialize(). This differs from other dialects where _json_serializer replaces json.dumps directly.

To bridge this gap without modifying JsonObject or the core google-cloud-spanner library:

  1. The user's json_serializer(value) produces a JSON string with custom types already handled
  2. JsonObject.from_str() parses it back into a JsonObject containing only native Python types
  3. When _helpers.py later calls obj.serialize(), standard json.dumps works — no custom types remain

This is a one-extra-json.loads round-trip per JSON value, which is negligible for typical payloads.

Backward compatible: When no json_serializer is provided, the dialect behaves identically to before (_json_serializer = JsonObject).

Example

from sqlalchemy import create_engine
import json, datetime

def my_serializer(obj):
    return json.dumps(obj, default=lambda o: o.isoformat() if hasattr(o, 'isoformat') else str(o))

engine = create_engine(
    "spanner:///projects/p/instances/i/databases/d",
    json_serializer=my_serializer,
)

Changes

  • sqlalchemy_spanner.py: Added __init__ to SpannerDialect accepting json_serializer / json_deserializer, plus _make_json_serializer() factory
  • test/unit/test_json_serializer.py: 21 unit + integration tests covering the factory, dialect wiring, SQLAlchemy bind_processor pipeline, null handling, and Spanner _helpers.py compatibility

Test plan

  • Unit tests for _make_json_serializer factory (JsonObject passthrough, callable wrapping, arrays, nulls, nested datetimes)
  • Dialect integration tests (kwarg acceptance, get_cls_kwargs introspection, class attribute isolation)
  • End-to-end tests simulating the full JSON.bind_processorJsonObjectserialize() pipeline
  • Spanner _helpers.py compatibility test (isinstance check + bare serialize() call)

Made with Cursor

SpannerDialect now accepts `json_serializer` and `json_deserializer`
kwargs, matching the standard SQLAlchemy convention used by PostgreSQL
and other dialects. Previously, passing `json_serializer` to
`create_engine()` raised `TypeError` because the dialect's `__init__`
did not declare these parameters.

The implementation uses a serialize-then-wrap strategy: the user's
`json_serializer` function pre-serializes values (handling custom
types like `datetime`), then the result is parsed back into a
`JsonObject` via `from_str()`. This preserves the existing Spanner
client pipeline (`_helpers.py` expects `JsonObject` instances) while
allowing custom type handling — without subclassing or modifying
`JsonObject` itself.

Example usage:

    engine = create_engine(
        "spanner:///...",
        json_serializer=lambda obj: json.dumps(obj, cls=MyEncoder),
    )

Made-with: Cursor
@waiho-gumloop waiho-gumloop requested a review from a team as a code owner February 28, 2026 00:41
@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner-sqlalchemy API. label Feb 28, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for json_serializer and json_deserializer parameters in the create_engine() function for the SpannerDialect, bringing it in line with standard SQLAlchemy behavior. It addresses the challenge of integrating custom JSON serialization logic with Spanner's existing JsonObject wrapping mechanism through a "serialize-then-wrap" approach, ensuring custom types are handled correctly while maintaining compatibility.

Highlights

  • JSON Serializer Support: Enabled json_serializer and json_deserializer parameters in SpannerDialect's create_engine() method, allowing users to provide custom JSON serialization and deserialization functions.
  • "Serialize-then-wrap" Strategy: Implemented a unique approach where user-provided serializers first convert objects to JSON strings, which are then re-parsed into JsonObject instances, ensuring compatibility with Spanner's internal data handling without modifying core libraries.
  • Backward Compatibility: The dialect retains its previous behavior when no custom json_serializer is provided, ensuring existing applications are unaffected.
  • Comprehensive Testing: Added a new dedicated test file with 21 unit and integration tests covering the new factory, dialect wiring, SQLAlchemy bind processor pipeline, null handling, and compatibility with Spanner's _helpers.py.
Changelog
  • google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py
    • Introduced _make_json_serializer function to adapt user-provided JSON serializers to Spanner's JsonObject pipeline.
    • Modified SpannerDialect.__init__ to accept and apply json_serializer and json_deserializer arguments.
  • test/unit/test_json_serializer.py
    • Added new file test/unit/test_json_serializer.py containing extensive tests for the _make_json_serializer factory, SpannerDialect's JSON serialization capabilities, and end-to-end scenarios.
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for custom json_serializer and json_deserializer in create_engine, aligning the Spanner dialect with other SQLAlchemy dialects. The implementation uses a 'serialize-then-wrap' strategy which is well-documented. The accompanying tests are comprehensive.

However, I've identified a significant issue with how JSON null values are handled, both with the new custom serializer and in the existing default behavior. The current implementation incorrectly maps Python None to SQL NULL instead of JSON null when JSON(none_as_null=False) is used, which is the default. This prevents storing JSON null values. I've left detailed comments on the implementation and the corresponding test case with suggestions on how to address this to ensure compliance with SQLAlchemy's documented behavior.

Comment on lines +850 to +852
def _factory(value):
json_str = json_serializer(value)
return JsonObject.from_str(json_str)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This implementation incorrectly handles JSON null values. When SQLAlchemy processes a JSON type with none_as_null=False (the default), a Python None value should be stored as a JSON null.

However, in this _factory:

  1. json_serializer(None) returns the string 'null'.
  2. JsonObject.from_str('null') creates a JsonObject(None) because json.loads('null') is None.
  3. JsonObject(None).serialize() returns None, which the DBAPI driver interprets as a SQL NULL.

This prevents storing JSON null values and contradicts the expected behavior of none_as_null=False.

To fix this, you should handle None values (and 'null' strings from the serializer) specially to produce an object that serializes to the string 'null'. This likely requires a private JsonObject subclass that overrides the serialize() method for this purpose.

Comment on lines +85 to +89
def test_callable_handles_null(self):
factory = _make_json_serializer(lambda v: json.dumps(v))
obj = factory(None)
assert isinstance(obj, JsonObject)
assert obj.serialize() is None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test seems to confirm an incorrect behavior. factory(None) should produce a JsonObject that represents a JSON null, which should serialize to the string 'null', not Python None (which represents a SQL NULL).

As implemented, factory(None) calls json.dumps(None) which is 'null', then JsonObject.from_str('null') which is JsonObject(None), and JsonObject(None).serialize() returns None. This means a Python None is always converted to a SQL NULL, which prevents storing JSON null values when using JSON(none_as_null=False).

This test should be updated to assert the correct behavior, which is serialization to 'null'. For example: assert obj.serialize() == 'null'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the googleapis/python-spanner-sqlalchemy API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SpannerDialect does not accept json_serializer / json_deserializer kwargs from create_engine()

1 participant