Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 19, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved standard library path resolution, particularly on Windows with enhanced lib directory detection for both text-file and symlink-based paths
  • Refactor

    • Centralized standard library directory configuration through interpreter state management

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 19, 2026

📝 Walkthrough

Walkthrough

Introduces a new stdlib_dir field to the path configuration system, computed via platform-specific logic in getpath.rs, exposed through the sys module as _stdlib_dir, and wired during interpreter initialization to store the standard library location derived from either frozen or dynamic paths.

Changes

Cohort / File(s) Summary
Path configuration structure
crates/vm/src/vm/setting.rs
Added optional stdlib_dir field to Paths struct to store the standard library directory path
stdlib directory computation
crates/vm/src/getpath.rs
Introduced private calculate_stdlib_dir() helper with platform-specific logic; added Step 9 to init_path_config() to assign computed stdlib_dir with existence validation
Python API exposure
crates/vm/src/stdlib/sys.rs
Added _stdlib_dir attribute to sys module that returns vm.state.config.paths.stdlib_dir as a PyObjectRef
Interpreter initialization
src/interpreter.rs
Replaced vm.sys_module hack with PyRc-based state access; centralized stdlib_dir assignment for both frozen and dynamic stdlib path flows
Windows build configuration
crates/pylib/build.rs
Enhanced Lib path resolution to handle both symlinks and text files (git with/without symlink support); added canonicalization and \?\ prefix stripping

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • coolreader18
  • arihant2math

Poem

🐰 A rabbit hops through stdlib ways,
Building paths through winding maze,
From frozen lib to dynamic ground,
stdlib_dir must now be found! 📚✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add sys._stdlib_dir' directly reflects the main change: adding a new Python-accessible attribute _stdlib_dir to the sys module, which is the primary public-facing feature of this changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin stdlib_dir

@youknowone youknowone marked this pull request as ready for review January 19, 2026 14:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/pylib/build.rs`:
- Around line 14-31: The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.

Comment on lines +14 to +31
if cfg!(windows) {
// On Windows, the Lib entry can be either:
// 1. A text file containing the relative path (git without symlink support)
// 2. A proper symlink (git with symlink support)
// We handle both cases to resolve to the actual Lib directory.
let lib_path = if let Ok(real_path) = std::fs::read_to_string("Lib") {
// Case 1: Text file containing relative path
std::path::PathBuf::from(real_path.trim())
} else {
// Case 2: Symlink or directory - canonicalize directly
std::path::PathBuf::from("Lib")
};

if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
// Strip the extended path prefix (\\?\) that canonicalize adds on Windows
let path_str = canonicalized_path.to_str().unwrap();
let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str);
println!("cargo:rustc-env=win_lib_path={path_str}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid panic on non‑UTF‑8 Windows paths.
to_str().unwrap() can panic on valid Windows paths containing non‑UTF‑8 sequences, breaking the build. Prefer a lossy conversion and consider warning when canonicalization fails so missing win_lib_path isn’t silent.

🔧 Proposed safer emission
-        if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
-            // Strip the extended path prefix (\\?\) that canonicalize adds on Windows
-            let path_str = canonicalized_path.to_str().unwrap();
-            let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(path_str);
-            println!("cargo:rustc-env=win_lib_path={path_str}");
-        }
+        if let Ok(canonicalized_path) = std::fs::canonicalize(&lib_path) {
+            // Strip the extended path prefix (\\?\) that canonicalize adds on Windows
+            let path_str = canonicalized_path.to_string_lossy();
+            let path_str = path_str.strip_prefix(r"\\?\").unwrap_or(&path_str);
+            println!("cargo:rustc-env=win_lib_path={path_str}");
+        } else {
+            println!("cargo:warning=failed to canonicalize Lib path: {:?}", lib_path);
+        }
🤖 Prompt for AI Agents
In `@crates/pylib/build.rs` around lines 14 - 31, The build script currently calls
canonicalize(...).to_str().unwrap(), which can panic on non‑UTF‑8 Windows paths;
change the emission to use a lossy conversion from the canonicalized Path (e.g.,
Path::to_string_lossy on the value returned by std::fs::canonicalize(&lib_path))
and print that into the cargo:rustc-env=win_lib_path value, and if
std::fs::canonicalize(&lib_path) returns an Err emit a visible build warning via
println!("cargo:warning=...") instead of silently skipping so missing
win_lib_path is obvious; update the code paths referencing canonicalized_path
and to_str().unwrap() accordingly.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/codegen/src/compile.rs`:
- Around line 6983-6986: The DictMerge path currently uses the first mapping as
the accumulator which can mutate a caller-provided **kwargs dict; change
compile.rs so that when merging multiple mappings (the loop around emit!(self,
Instruction::DictMerge { index: 1 })), you first allocate a fresh empty
accumulator and merge each mapping into that fresh map instead of using the
first input as the target; specifically, before the for-loop that emits
Instruction::DictMerge, emit an instruction to create a new empty dict (so the
accumulator is new) and then merge all incoming `**` groups into that fresh
accumulator (update the code that emits Instruction::DictMerge and the
surrounding logic in the same block to use the new accumulator).
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

962-962: Inconsistent downcast pattern with DictUpdate.

DictUpdate (line 934) uses a safe downcast with .expect():

let dict = dict.downcast_ref::<PyDict>().expect("exact dict expected");

But DictMerge uses an unsafe downcast without a SAFETY comment explaining the invariant. Consider using the same safe pattern as DictUpdate for consistency and defensive coding, or at minimum add a SAFETY comment like the other unsafe downcasts in this file (e.g., ListAppend, ListExtend).

♻️ Suggested fix for consistency
-                let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() };
+                let dict: &Py<PyDict> = unsafe {
+                    // SAFETY: compiler guarantees correct type
+                    dict_ref.downcast_unchecked_ref()
+                };

Or use the safe pattern:

-                let dict: &Py<PyDict> = unsafe { dict_ref.downcast_unchecked_ref() };
+                let dict = dict_ref.downcast_ref::<PyDict>().expect("exact dict expected");

Comment on lines 6983 to 6986
// Merge all dicts: first dict is accumulator, merge rest into it
for _ in 1..size {
emit!(self, Instruction::DictMerge { index: 1 });
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mutating user-provided **kwargs when merging.
DictMerge uses the first mapping as the accumulator. If the first group is **kwargs, this mutates the caller’s dict when multiple ** groups exist (observable vs CPython, which builds a fresh dict). Please ensure the accumulator is always a new map before merging any ** inputs.

🔧 Possible fix (merge into a fresh accumulator)
-        let mut size = 0;
-        let groupby = keywords.iter().chunk_by(|e| e.arg.is_none());
-        for (is_unpacking, sub_keywords) in &groupby {
-            if is_unpacking {
-                for keyword in sub_keywords {
-                    self.compile_expression(&keyword.value)?;
-                    size += 1;
-                }
-            } else {
-                let mut sub_size = 0;
-                for keyword in sub_keywords {
-                    if let Some(name) = &keyword.arg {
-                        self.emit_load_const(ConstantData::Str {
-                            value: name.as_str().into(),
-                        });
-                        self.compile_expression(&keyword.value)?;
-                        sub_size += 1;
-                    }
-                }
-                emit!(self, Instruction::BuildMap { size: sub_size });
-                size += 1;
-            }
-        }
-        if size > 1 {
-            // Merge all dicts: first dict is accumulator, merge rest into it
-            for _ in 1..size {
-                emit!(self, Instruction::DictMerge { index: 1 });
-            }
-        }
+        let mut have_accumulator = false;
+        let groupby = keywords.iter().chunk_by(|e| e.arg.is_none());
+        for (is_unpacking, sub_keywords) in &groupby {
+            if !have_accumulator {
+                emit!(self, Instruction::BuildMap { size: 0 }); // fresh accumulator
+                have_accumulator = true;
+            }
+            if is_unpacking {
+                for keyword in sub_keywords {
+                    self.compile_expression(&keyword.value)?;
+                    emit!(self, Instruction::DictMerge { index: 1 });
+                }
+            } else {
+                let mut sub_size = 0;
+                for keyword in sub_keywords {
+                    if let Some(name) = &keyword.arg {
+                        self.emit_load_const(ConstantData::Str {
+                            value: name.as_str().into(),
+                        });
+                        self.compile_expression(&keyword.value)?;
+                        sub_size += 1;
+                    }
+                }
+                emit!(self, Instruction::BuildMap { size: sub_size });
+                emit!(self, Instruction::DictMerge { index: 1 });
+            }
+        }
🤖 Prompt for AI Agents
In `@crates/codegen/src/compile.rs` around lines 6983 - 6986, The DictMerge path
currently uses the first mapping as the accumulator which can mutate a
caller-provided **kwargs dict; change compile.rs so that when merging multiple
mappings (the loop around emit!(self, Instruction::DictMerge { index: 1 })), you
first allocate a fresh empty accumulator and merge each mapping into that fresh
map instead of using the first input as the target; specifically, before the
for-loop that emits Instruction::DictMerge, emit an instruction to create a new
empty dict (so the accumulator is new) and then merge all incoming `**` groups
into that fresh accumulator (update the code that emits Instruction::DictMerge
and the surrounding logic in the same block to use the new accumulator).

@youknowone youknowone merged commit b76906a into RustPython:main Jan 19, 2026
13 checks passed
@youknowone youknowone deleted the stdlib_dir branch January 19, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant