Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 30, 2026

Only interface, without actual GC

split from #6849

Summary by CodeRabbit

  • New Features
    • Full garbage collection API: enable/disable, manual collection, and generational behavior
    • GC monitoring: per-generation stats, collection counts, debug flags, and inspectable tracked objects
    • GC controls: configurable thresholds, freeze/unfreeze permanent generation, and GC callbacks/garbage lists
  • Behavioral
    • Objects can report GC tracking and expose their referents for inspection

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Implements a generational GC subsystem and exposes a Python-compatible gc module API, adds VM context fields for gc state, extends PyObject inspection for GC, and updates Cargo features and spellcheck tokens.

Changes

Cohort / File(s) Summary
Spellcheck Dictionary
.cspell.dict/cpython.txt
Added tokens saveall, uncollectable.
GC Module (stdlib)
crates/stdlib/src/gc.rs
New public gc module API: debug flags, enable/disable, collect, thresholds/counts/stats, object introspection, freeze/unfreeze, callbacks invocation, and argument structs.
VM Feature & Exposure
crates/vm/Cargo.toml, crates/vm/src/lib.rs
Added gc feature and made it default; exported pub mod gc_state.
GC Core State
crates/vm/src/gc_state.rs
Introduced GcState, GcGeneration, GcStats, debug flags, tracking registries, freeze/unfreeze, threshold/count/stats accessors, collection stubs, and global singleton accessor.
Object Model Changes
crates/vm/src/object/core.rs
Made gc_finalized public; added is_gc_tracked() and gc_get_referents() on PyObject for tracking and referent enumeration.
VM Context
crates/vm/src/vm/context.rs
Added gc_callbacks: PyListRef and gc_garbage: PyListRef to Context and initialized them at genesis.

Sequence Diagram

sequenceDiagram
    actor Python
    participant GC as "gc module"
    participant GCState as "GcState (global)"
    participant Callbacks as "callbacks list"
    participant VM as "VM Context"

    Python->>GC: collect(generation=None)
    GC->>GCState: collect(generation)
    activate GCState
    GCState->>Callbacks: invoke_callbacks("start", generation, 0, 0)
    Callbacks->>VM: execute registered callbacks
    VM-->>Callbacks: callback results
    GCState->>GCState: perform collection (counts)
    GCState->>Callbacks: invoke_callbacks("stop", generation, collected, uncollectable)
    Callbacks->>VM: execute registered callbacks
    VM-->>Callbacks: callback results
    deactivate GCState
    GCState-->>GC: return (collected, uncollectable)
    GC-->>Python: return counts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • impl gc finialized #6689: Related changes to PyObject finalization and gc_bits handling which intersects with making gc_finalized public and GC tracking logic.

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I nibble code where cycles hide,
I count three generations wide,
I hop through lists of callbacks bright,
I freeze, unfreeze in moonlit night,
Hooray — the GC burrows right!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'gc module internal structure and API' accurately describes the main purpose of this pull request, which adds the internal GC state structure and comprehensive API surface for the gc module.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 30, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin gc-internal

@youknowone
Copy link
Member Author

cc @discord9

@youknowone youknowone force-pushed the gc-internal branch 3 times, most recently from 8052f5f to 5a8b225 Compare January 31, 2026 08:58
Add gc_state module with GcState, GcGeneration, GcDebugFlags, GcStats.
Replace gc module stubs with working API backed by gc_state.
Add gc_callbacks and gc_garbage to Context.
Add is_gc_tracked, gc_finalized, gc_get_referents to PyObject.
Collection is stubbed (returns 0) — actual algorithm to follow.
@youknowone youknowone marked this pull request as ready for review January 31, 2026 13:12
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/gc.rs`:
- Around line 189-196: get_referrers currently unconditionally returns an empty
list but the review warns against raising NotImplementedError; to match CPython
behavior, either implement basic GC-tracking or explicitly preserve the
empty-list semantics and document the limitation. Update the
get_referrers(FuncArgs, VirtualMachine) implementation to return
vm.ctx.new_list(vec![]) for non-discoverable referrers, add/adjust the doc
comment inside the function to state that only GC-tracked references are
discoverable and stack/C-extension references are not, and do not raise
NotImplementedError; alternatively, if you choose to implement minimal tracking,
modify the scanning logic in get_referrers to consult your GC-tracking
structures and collect referrers before falling back to vm.ctx.new_list(vec![]).

In `@crates/vm/src/gc_state.rs`:
- Around line 231-287: untrack_object currently never removes a deallocated
frozen object from permanent_objects, leaving stale pointers that break unfreeze
and get_freeze_count; update unsafe fn untrack_object to also acquire a write
lock on self.permanent_objects and remove the GcObjectPtr (gc_ptr) just like
tracked_objects and finalized_objects are handled, ensuring the permanent set
does not retain freed pointers (refer to GcObjectPtr, untrack_object,
permanent_objects, unfreeze, and get_freeze_count).

In `@crates/vm/src/object/core.rs`:
- Around line 902-919: is_gc_tracked currently returns capability
(vtable.trace.is_some() || dict.is_some()) instead of actual membership; add a
public query on GcState (e.g., GcState::is_tracked(&self, obj_id: ObjectId) or
similar) that checks the tracked_objects set used by
track_object/untrack_object, then change PyObject::is_gc_tracked to call that
GcState query rather than inspecting vtable.trace or dict; ensure you use the
object's unique identifier or reference the same key used in tracked_objects so
is_gc_tracked reflects real GC membership while gc_get_referents can remain
unchanged.
🧹 Nitpick comments (1)
crates/vm/src/gc_state.rs (1)

289-305: Finalize tracking has two sources of truth.

GcState::{is_finalized, mark_finalized} uses a HashSet, but PyObject::gc_finalized relies on per-object bits and stdlib gc.is_finalized uses that path. Consider consolidating on one mechanism to avoid divergence (e.g., expose set_gc_finalized as pub(crate) and set bits here, or drop the set).

Comment on lines +189 to 196
/// Return the list of objects that directly refer to any of the arguments.
#[pyfunction]
fn get_referrers(_args: FuncArgs, vm: &VirtualMachine) -> PyResult {
Err(vm.new_not_implemented_error(""))
fn get_referrers(args: FuncArgs, vm: &VirtualMachine) -> PyListRef {
// This is expensive: we need to scan all tracked objects
// For now, return an empty list (would need full object tracking to implement)
let _ = args;
vm.ctx.new_list(vec![])
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

CPython gc.get_referrers documentation behavior error handling

💡 Result:

gc.get_referrers(*objs) (CPython) is explicitly a debugging API with several important behavioral limitations:

  • What it returns: a list of objects that directly refer to any of the given objs, but it only finds referrers among GC-tracked containers (objects participating in cyclic-GC via tp_traverse). If an extension type holds references but does not support GC, it will not be reported. [1]
  • “Stale” referrers can appear: objects that have been dereferenced but are still part of uncollected cycles may show up; if you want “currently live” results, run gc.collect() first. [1]
  • Returned objects may be temporarily invalid: the docs warn that some returned referrers may still be under construction and in a temporarily invalid state—avoid doing anything other than inspection/debugging. [1]
  • It can miss real referrers: not all references in the runtime are represented as GC-tracked container links. For example, references held only on the VM evaluation stack (and some other internal/C-held references) may not be reported, so gc.get_referrers(x) can be empty even when x is obviously still reachable. [2]

Error handling / exceptions (what the docs imply)

  • The function fires an auditing event gc.get_referrers with argument objs; if you have auditing hooks / security policy enabled, that hook may raise and thus make get_referrers() fail. [1]
  • Otherwise, the standard docs do not specify special “semantic” errors (it’s a normal builtin; you can still get ordinary failures like MemoryError if allocation of the result list fails, or TypeError for incorrect calling convention). The main “gotcha” called out is invalid/unsafe returned objects, not a recoverable exception path. [1]

Sources: Python gc module docs (get_referrers) [1], explanation of missed referrers (VM stack / C-held refs) [2].

[1] (docs.python.org)
[2] (stackoverflow.com)

Citations:


🏁 Script executed:

# First, let's check the actual file to confirm the current state
cat -n crates/stdlib/src/gc.rs | head -220 | tail -40

Repository: RustPython/RustPython

Length of output: 1633


get_referrers returns an empty list, but raising NotImplementedError contradicts CPython behavior.

CPython's gc.get_referrers() does not raise NotImplementedError—it legitimately returns an empty list when no referrers are found (e.g., references on the VM stack or in non-GC-tracked C objects are not discoverable). The function is documented as a debugging API with inherent limitations, not as a "not implemented" operation.

The underlying concern is valid: the current implementation makes no attempt to find referrers and unconditionally returns an empty list. However, the suggested fix conflicts with CPython compatibility. Consider either:

  • Implementing at least basic tracking of GC-managed object references, or
  • Accepting the empty list as a known limitation while improving documentation
🤖 Prompt for AI Agents
In `@crates/stdlib/src/gc.rs` around lines 189 - 196, get_referrers currently
unconditionally returns an empty list but the review warns against raising
NotImplementedError; to match CPython behavior, either implement basic
GC-tracking or explicitly preserve the empty-list semantics and document the
limitation. Update the get_referrers(FuncArgs, VirtualMachine) implementation to
return vm.ctx.new_list(vec![]) for non-discoverable referrers, add/adjust the
doc comment inside the function to state that only GC-tracked references are
discoverable and stack/C-extension references are not, and do not raise
NotImplementedError; alternatively, if you choose to implement minimal tracking,
modify the scanning logic in get_referrers to consult your GC-tracking
structures and collect referrers before falling back to vm.ctx.new_list(vec![]).

Comment on lines +902 to +919
/// Check if this object is tracked by the garbage collector.
/// Returns true if the object has a trace function or has an instance dict.
pub fn is_gc_tracked(&self) -> bool {
if self.0.vtable.trace.is_some() {
return true;
}
self.0.dict.is_some()
}

/// Get the referents (objects directly referenced) of this object.
/// Uses the full traverse including dict and slots.
pub fn gc_get_referents(&self) -> Vec<PyObjectRef> {
let mut result = Vec::new();
self.0.traverse(&mut |child: &PyObject| {
result.push(child.to_owned());
});
result
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for GC tracking-related code
echo "=== Searching for TRACKED bit and tracking functions ==="
rg -n "GcBits::TRACKED|track_object|untrack_object|GcState" -g '*.rs' --max-count 30

Repository: RustPython/RustPython

Length of output: 1074


🏁 Script executed:

#!/bin/bash
# Examine the is_gc_tracked implementation and surrounding context
echo "=== Looking at core.rs to understand current implementation ==="
rg -B 5 -A 5 "fn is_gc_tracked" crates/vm/src/object/core.rs

Repository: RustPython/RustPython

Length of output: 485


🏁 Script executed:

#!/bin/bash
# Search for how objects are added/removed from GC tracking
echo "=== Looking for gc collection and tracking mechanisms ==="
fd -e rs | xargs rg -l "gc_collect|gc_track|GcBits" | head -20

Repository: RustPython/RustPython

Length of output: 175


🌐 Web query:

CPython gc.is_tracked implementation and behavior

💡 Result:

What gc.is_tracked(obj) reports (behavior)

  • gc.is_tracked(obj) returns whether this object instance is currently in CPython’s cyclic-GC tracking lists (not whether it is “GC-capable” in principle). Its result can change after later garbage collections due to container “untracking” optimizations. [1][2]
  • Objects that cannot participate in reference cycles (most “atomic” objects like int, str, etc.) are not tracked. Containers and user-defined objects typically are tracked, but some built-in containers may be untracked when they are proven to contain only non-tracked contents. [1][2]
  • This behavior is version- and type-optimization-dependent; e.g., dictionaries/tuples may toggle between tracked and untracked depending on their contents and what GC cycle has run. [1][2]

How CPython implements it (core idea)

At the C level, Python exposes PyObject_GC_IsTracked(op) (Stable ABI since 3.9), documented as “analogous to gc.is_tracked()”: it returns 1 only if the object both supports the GC protocol and is currently tracked. [3]

Internally, CPython’s fast check for “tracked” is essentially:

  • tracked ⇔ the object’s GC header indicates it’s linked into a GC list
  • concretely: _PyObject_GC_IS_TRACKED(op) returns true when the per-object GC header field _gc_next is non-zero. [4]
  • and when an object is untracked, CPython unlinks it from the GC list and sets gc->_gc_next = 0. [5]

Practical implications / “gotchas”

  • gc.is_tracked(x) == False does not necessarily mean “cannot ever be tracked”: many container objects can be tracked/untracked over time depending on contents and GC passes. [2][4]
  • If you’re writing/inspecting C extensions: “GC-capable” is PyObject_IS_GC(obj), while “currently tracked” is PyObject_GC_IsTracked(obj) / _PyObject_GC_IS_TRACKED(obj). [3][4]

Sources: Python gc docs and GC design notes [1][2], CPython C-API GC support docs [3], internal header/coverage view showing _PyObject_GC_IS_TRACKED [4], and internal untrack implementation showing _gc_next = 0 [5].

[1] https://docs.python.org/3.12/library/gc.html
[2] https://daobook.github.io/devguide/garbage_collector.html
[3] https://docs.python.org/3/c-api/gcsupport.html
[4] https://cheimes.fedorapeople.org/cpython/lcov-report/Include/internal/pycore_gc.h.gcov.html
[5] https://android.googlesource.com/platform/prebuilts/python/linux-x86/+/5e62e45e850184b4a88ada6467f73f39f8b44609/include/python3.9/internal/pycore_object.h


🏁 Script executed:

#!/bin/bash
# Look at track_object and untrack_object implementations
sed -n '236,280p' crates/vm/src/gc_state.rs

Repository: RustPython/RustPython

Length of output: 1872


🏁 Script executed:

#!/bin/bash
# Look for what data structures hold tracked objects
sed -n '99,160p' crates/vm/src/gc_state.rs

Repository: RustPython/RustPython

Length of output: 2550


🏁 Script executed:

#!/bin/bash
# Search for how is_gc_tracked is currently used in the codebase
rg "is_gc_tracked" -g '*.rs' -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 751


🏁 Script executed:

#!/bin/bash
# Check the gc.rs stdlib module to see if it implements gc.is_tracked()
head -100 crates/stdlib/src/gc.rs

Repository: RustPython/RustPython

Length of output: 3103


🏁 Script executed:

#!/bin/bash
# Look for how GcState is accessed from PyObject methods
rg "gc_state()" crates/vm/src/object/core.rs -B 3 -A 3

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check if there are any methods to query if an object is in tracked_objects
rg "tracked_objects" crates/vm/src/gc_state.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 1090


🏁 Script executed:

#!/bin/bash
# Look at the full GcState struct definition and public methods
sed -n '99,250p' crates/vm/src/gc_state.rs

Repository: RustPython/RustPython

Length of output: 5303


🏁 Script executed:

#!/bin/bash
# Check if there's a way to query tracking state
rg "pub.*fn.*is_tracked|is_in_gc" crates/vm/src/gc_state.rs

Repository: RustPython/RustPython

Length of output: 47


is_gc_tracked() incorrectly reports GC-capability rather than actual tracking state.

The method returns true based on trace.is_some() or dict.is_some(), which indicates the object can be tracked, not whether it currently is tracked. GcState maintains actual tracking membership in tracked_objects (populated/cleared by track_object/untrack_object), but this state is private and inaccessible from is_gc_tracked(). This diverges from CPython's gc.is_tracked(), which reports actual GC list membership.

To fix: expose a public query method on GcState to check tracking membership, and have is_gc_tracked() consult it instead of relying on static vtable/dict presence.

🤖 Prompt for AI Agents
In `@crates/vm/src/object/core.rs` around lines 902 - 919, is_gc_tracked currently
returns capability (vtable.trace.is_some() || dict.is_some()) instead of actual
membership; add a public query on GcState (e.g., GcState::is_tracked(&self,
obj_id: ObjectId) or similar) that checks the tracked_objects set used by
track_object/untrack_object, then change PyObject::is_gc_tracked to call that
GcState query rather than inspecting vtable.trace or dict; ensure you use the
object's unique identifier or reference the same key used in tracked_objects so
is_gc_tracked reflects real GC membership while gc_get_referents can remain
unchanged.

Copy link
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@github-actions
Copy link
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/asyncio
[ ] test: cpython/Lib/test/test_asyncio (TODO: 14)

dependencies:

  • asyncio (native: _overlapped, _pyrepl.console, _remote_debugging, _winapi, asyncio.tools, collections.abc, concurrent.futures, errno, itertools, math, msvcrt, sys, time)
    • ast (native: _ast, sys)
    • collections (native: _weakref, itertools, sys)
    • dataclasses (native: itertools, sys)
    • enum (native: builtins, sys)
    • io (native: _io, _thread, errno, sys)
    • logging (native: atexit, errno, logging.handlers, sys, time)
    • os (native: os.path, sys)
    • site (native: _io, builtins, errno, sys)
    • socket (native: _socket, sys)
    • subprocess (native: builtins, errno, sys, time)
    • tempfile (native: _thread, errno, sys)
    • traceback (native: collections.abc, itertools, sys)
    • _colorize, argparse, contextlib, contextvars, functools, heapq, inspect, linecache, reprlib, selectors, signal, stat, struct, threading, types, warnings, weakref

dependent tests: (8 tests)

  • asyncio: test_asyncio test_builtin test_contextlib_async test_inspect test_logging test_os test_sys_settrace test_unittest

[ ] test: cpython/Lib/test/test_dict.py (TODO: 3)
[ ] test: cpython/Lib/test/test_dictcomps.py (TODO: 1)
[ ] test: cpython/Lib/test/test_dictviews.py (TODO: 2)
[ ] test: cpython/Lib/test/test_userdict.py

dependencies:

dependent tests: (no tests depend on dict)

[ ] test: cpython/Lib/test/test_generators.py (TODO: 5)
[ ] test: cpython/Lib/test/test_genexps.py
[ ] test: cpython/Lib/test/test_generator_stop.py
[ ] test: cpython/Lib/test/test_yield_from.py (TODO: 10)

dependencies:

dependent tests: (no tests depend on generator)

[ ] lib: cpython/Lib/subprocess.py
[ ] test: cpython/Lib/test/test_subprocess.py (TODO: 9)

dependencies:

  • subprocess

dependent tests: (51 tests)

  • subprocess: test_android test_asyncio test_audit test_bz2 test_c_locale_coercion test_cmd_line test_cmd_line_script test_ctypes test_dtrace test_faulthandler test_file_eintr test_gzip test_inspect test_json test_msvcrt test_ntpath test_os test_platform test_plistlib test_poll test_py_compile test_regrtest test_repl test_runpy test_script_helper test_shutil test_signal test_site test_sqlite3 test_subprocess test_support test_sys test_sysconfig test_tempfile test_threading test_unittest test_urllib2 test_utf8_mode test_venv test_wait3 test_webbrowser test_zipfile
    • asyncio: test_asyncio test_builtin test_contextlib_async test_logging test_sys_settrace test_unittest
    • ctypes.util: test_ctypes
    • ensurepip: test_ensurepip
    • multiprocessing.util: test_concurrent_futures

[x] lib: cpython/Lib/weakref.py
[x] lib: cpython/Lib/_weakrefset.py
[x] test: cpython/Lib/test/test_weakref.py (TODO: 23)

dependencies:

  • weakref

dependent tests: (146 tests)

  • weakref: test_array test_ast test_asyncio test_code test_concurrent_futures test_context test_contextlib test_copy test_deque test_descr test_dict test_exceptions test_file test_fileio test_functools test_generators test_genericalias test_importlib test_inspect test_io test_ipaddress test_itertools test_logging test_memoryview test_mmap test_ordered_dict test_pickle test_picklebuffer test_queue test_re test_scope test_set test_slice test_socket test_sqlite3 test_ssl test_struct test_tempfile test_thread test_threading test_threading_local test_types test_typing test_unittest test_uuid test_weakref test_weakset test_xml_etree
    • asyncio: test_asyncio test_builtin test_contextlib_async test_os test_sys_settrace test_unittest
    • concurrent.futures.process: test_concurrent_futures
    • copy: test_bytes test_codecs test_collections test_csv test_defaultdict test_dictviews test_email test_enum test_fractions test_http_cookies test_opcache test_optparse test_platform test_plistlib test_posix test_site test_statistics test_sysconfig test_tomllib test_xml_dom_minicompat test_zlib
      • dataclasses: test__colorize test_patma test_pprint test_regrtest test_zoneinfo
      • email.generator: test_email
      • http.cookiejar: test_http_cookiejar
      • http.server: test_httpservers test_robotparser test_urllib2_localnet
      • mailbox: test_mailbox
      • smtplib: test_smtplib test_smtpnet
      • tarfile: test_shutil test_tarfile
    • inspect: test_abc test_argparse test_asyncgen test_decimal test_grammar test_ntpath test_operator test_posixpath test_signal test_traceback test_yield_from test_zipimport
      • bdb: test_bdb
      • importlib.metadata: test_importlib
      • rlcompleter: test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • multiprocessing: test_concurrent_futures test_fcntl test_multiprocessing_main_handling
    • symtable: test_symtable
    • tempfile: test_bz2 test_cmd_line test_compile test_compileall test_dis test_doctest test_ensurepip test_faulthandler test_filecmp test_fileinput test_importlib test_linecache test_pathlib test_pkg test_pkgutil test_py_compile test_runpy test_selectors test_string_literals test_subprocess test_support test_tabnanny test_threadedtempfile test_urllib test_urllib2 test_urllib_response test_venv test_zipapp test_zipfile test_zipfile64 test_zstd
      • urllib.request: test_urllib2net test_urllibnet

[ ] test: cpython/Lib/test/test_weakset.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on weakset)

[ ] lib: cpython/Lib/zoneinfo
[ ] test: cpython/Lib/test/test_zoneinfo (TODO: 3)

dependencies:

  • zoneinfo
    • datetime (native: _thread, math, sys, time)
    • sysconfig (native: os.path, sys)
    • collections, os
    • bisect, calendar, functools, re, struct, weakref

dependent tests: (3 tests)

  • zoneinfo: test_email test_plistlib test_zoneinfo

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@crates/vm/src/gc_state.rs`:
- Around line 236-252: In track_object, a poisoned write lock on
generation_objects[0] can cause the object to be inserted into tracked_objects
but not generation_objects (and counts not incremented), creating inconsistency
with get_objects or generation counts; fix by making the operation atomic:
either return a Result from track_object (propagate the lock error from
generation_objects[0].write()) or acquire/check both locks first and only insert
into tracked_objects if generation_objects[0].write() succeeded (i.e., perform
the generation_objects[0].write() and insert into gen0 and update
generations[0].count/alloc_count before touching tracked_objects), referencing
track_object, generation_objects, tracked_objects, generations, alloc_count and
get_objects so the change is applied where those symbols are used.
- Around line 391-429: The freeze() and unfreeze() functions drain sets into
local vectors then attempt to write the destination lock, which can fail and
silently drop objects; change them to handle lock errors defensively: after
draining a generation (generation_objects.write() in freeze) or draining
permanent (permanent_objects.write() in unfreeze), check the result of acquiring
the destination lock (permanent_objects.write() in freeze,
generation_objects[2].write() in unfreeze) and if it Err, re-acquire the
original source lock (or re-insert the drained pointers back into the original
generation/permanent) or else log a fatal error/panic so objects are not lost;
reference freeze, unfreeze, generation_objects, permanent_objects,
permanent.count, generations[*].count and ensure counts are only updated after
successful insertion into the destination set.

Comment on lines +236 to +252
pub unsafe fn track_object(&self, obj: NonNull<PyObject>) {
let gc_ptr = GcObjectPtr(obj);

// Add to generation 0 tracking first (for correct gc_refs algorithm)
// Only increment count if we successfully add to the set
if let Ok(mut gen0) = self.generation_objects[0].write()
&& gen0.insert(gc_ptr)
{
self.generations[0].count.fetch_add(1, Ordering::SeqCst);
self.alloc_count.fetch_add(1, Ordering::SeqCst);
}

// Also add to global tracking (for get_objects, etc.)
if let Ok(mut tracked) = self.tracked_objects.write() {
tracked.insert(gc_ptr);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential inconsistency if generation lock is poisoned.

If generation_objects[0].write() returns Err (poisoned lock), the object is skipped from generation tracking but still added to tracked_objects. This creates an inconsistent state where get_objects(None) would return the object but get_objects(Some(0)) would not, and the generation count would be wrong.

Consider either:

  1. Propagating the error to the caller, or
  2. Ensuring both operations succeed or neither does (transactional semantics).

Since this is interface-only and actual GC implementation will follow, this may be acceptable for now if documented.

🤖 Prompt for AI Agents
In `@crates/vm/src/gc_state.rs` around lines 236 - 252, In track_object, a
poisoned write lock on generation_objects[0] can cause the object to be inserted
into tracked_objects but not generation_objects (and counts not incremented),
creating inconsistency with get_objects or generation counts; fix by making the
operation atomic: either return a Result from track_object (propagate the lock
error from generation_objects[0].write()) or acquire/check both locks first and
only insert into tracked_objects if generation_objects[0].write() succeeded
(i.e., perform the generation_objects[0].write() and insert into gen0 and update
generations[0].count/alloc_count before touching tracked_objects), referencing
track_object, generation_objects, tracked_objects, generations, alloc_count and
get_objects so the change is applied where those symbols are used.

Comment on lines +391 to +429
pub fn freeze(&self) {
// Move all objects from gen0-2 to permanent
let mut objects_to_freeze: Vec<GcObjectPtr> = Vec::new();

for (gen_idx, generation) in self.generation_objects.iter().enumerate() {
if let Ok(mut gen_set) = generation.write() {
objects_to_freeze.extend(gen_set.drain());
self.generations[gen_idx].count.store(0, Ordering::SeqCst);
}
}

// Add to permanent set
if let Ok(mut permanent) = self.permanent_objects.write() {
let count = objects_to_freeze.len();
for ptr in objects_to_freeze {
permanent.insert(ptr);
}
self.permanent.count.fetch_add(count, Ordering::SeqCst);
}
}

/// Unfreeze all objects (move from permanent to gen2)
pub fn unfreeze(&self) {
let mut objects_to_unfreeze: Vec<GcObjectPtr> = Vec::new();

if let Ok(mut permanent) = self.permanent_objects.write() {
objects_to_unfreeze.extend(permanent.drain());
self.permanent.count.store(0, Ordering::SeqCst);
}

// Add to generation 2
if let Ok(mut gen2) = self.generation_objects[2].write() {
let count = objects_to_unfreeze.len();
for ptr in objects_to_unfreeze {
gen2.insert(ptr);
}
self.generations[2].count.fetch_add(count, Ordering::SeqCst);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Objects can be lost if lock acquisition fails after draining.

In freeze(), if generation_objects are successfully drained but permanent_objects.write() returns Err (poisoned), the objects in objects_to_freeze are dropped without being added to any tracking set. Similarly in unfreeze().

Consider logging a warning or panic on lock failure here, since losing GC-tracked objects is a serious error condition.

Potential defensive handling
         // Add to permanent set
-        if let Ok(mut permanent) = self.permanent_objects.write() {
+        let mut permanent = self.permanent_objects.write()
+            .expect("permanent_objects lock poisoned during freeze");
             let count = objects_to_freeze.len();
             for ptr in objects_to_freeze {
                 permanent.insert(ptr);
             }
             self.permanent.count.fetch_add(count, Ordering::SeqCst);
-        }
🤖 Prompt for AI Agents
In `@crates/vm/src/gc_state.rs` around lines 391 - 429, The freeze() and
unfreeze() functions drain sets into local vectors then attempt to write the
destination lock, which can fail and silently drop objects; change them to
handle lock errors defensively: after draining a generation
(generation_objects.write() in freeze) or draining permanent
(permanent_objects.write() in unfreeze), check the result of acquiring the
destination lock (permanent_objects.write() in freeze,
generation_objects[2].write() in unfreeze) and if it Err, re-acquire the
original source lock (or re-insert the drained pointers back into the original
generation/permanent) or else log a fatal error/panic so objects are not lost;
reference freeze, unfreeze, generation_objects, permanent_objects,
permanent.count, generations[*].count and ensure counts are only updated after
successful insertion into the destination set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants