Conversation
📝 WalkthroughWalkthroughWindows-specific codecs and extensive Windows stdlib changes were added (code-page encode/decode, codec delegation), large rewrites to nt.rs (symlink, path normalization, reparse-point and attribute APIs), a tiny exceptions newline-trim tweak, and tighter env-var validation. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python caller
participant VM as VirtualMachine/_codecs
participant Sub as _codecs_windows (submodule)
participant Win as Windows API
Py->>VM: call codec function (e.g., code_page_encode)
alt on non-Windows
VM->>VM: delegate_pycodecs! -> route to registered PyCodecs
VM-->>Py: return result
else on Windows
VM->>Sub: call Windows-specific wrapper
Sub->>Win: WideCharToMultiByte / MultiByteToWideChar
Win-->>Sub: bytes/string + status
Sub-->>VM: result
VM-->>Py: return result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/nt.rs (1)
2262-2294:⚠️ Potential issue | 🟠 MajorChange ValueError to OSError for non-symlink reparse tags.
readlinkshould raiseOSError(withEINVALerrno), notValueError, to match POSIX semantics when the reparse point is not a symlink or junction.The proposed fix references
ERROR_NOT_A_REPARSE_POINT, which does not exist inwindows_sys. Instead, uselibc::EINVAL:🛠️ Corrected error mapping
- } else { - return Err(vm.new_value_error("not a symbolic link".to_owned())); - }; + } else { + let err = io::Error::from_raw_os_error(libc::EINVAL); + return Err(OSErrorBuilder::with_filename(&err, path, vm)); + };
🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/codecs.rs`:
- Around line 770-865: The decode currently falls back to non-strict decoding
whenever the first MultiByteToWideChar call returns 0; change code in the
code_page_decode path to check args.errors (the _errors variable) when
strict_flags uses MB_ERR_INVALID_CHARS: if size == 0 and _errors == "strict"
then return vm.new_unicode_decode_error(...) (use the same error formatting as
other branches) instead of falling back to call MultiByteToWideChar with flags
0; otherwise (when _errors != "strict") keep the existing fallback logic.
Reference symbols: args.errors / _errors, strict_flags, MB_ERR_INVALID_CHARS,
MultiByteToWideChar, vm.new_unicode_decode_error, and the code_page_decode
return paths.
In `@crates/vm/src/stdlib/nt.rs`:
- Around line 573-666: The disk_device check in _test_file_type_by_name
incorrectly omits network filesystems so regular files on UNC/share paths aren't
treated as disk devices; update the matches!(...) in _test_file_type_by_name to
include FILE_DEVICE_NETWORK_FILE_SYSTEM (import it from
windows_sys::Win32::System::Ioctl) alongside FILE_DEVICE_DISK,
FILE_DEVICE_VIRTUAL_DISK, and FILE_DEVICE_CD_ROM so _test_info(...) correctly
treats network-mounted regular files as disk devices and allows the rest of the
logic (including the handle-based fallback) to run.
- Around line 1287-1333: In _getfinalpathname, the CreateFileW call opens the
file with access 0 and share mode 0 which causes sharing violations; change the
access parameter to FILE_READ_ATTRIBUTES and the sharemode parameter to
FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE (matching other stdlib
patterns) when calling CreateFileW so the handle is opened permissively for
metadata reads; update the CreateFileW invocation in the _getfinalpathname
function accordingly and keep the rest of the logic
(GetFinalPathNameByHandleW/CloseHandle) unchanged.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/nt.rs (1)
1585-1750: Add tests for the new Windows normpath implementation.This is a sizeable port; consider tests for UNC/device paths, drive-relative forms, dot/dotdot handling, trailing separators, and bytes input to guard parity.
| if args.code_page < 0 { | ||
| return Err(vm.new_value_error("invalid code page number".to_owned())); | ||
| } | ||
| let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); | ||
| let code_page = args.code_page as u32; | ||
| let data = args.data.borrow_buf(); | ||
| let len = data.len(); | ||
|
|
||
| if data.is_empty() { | ||
| return Ok((String::new(), 0)); | ||
| } | ||
|
|
||
| // Some code pages don't support MB_ERR_INVALID_CHARS | ||
| let strict_flags = if code_page == 65000 | ||
| || code_page == 42 | ||
| || (50220..=50222).contains(&code_page) | ||
| || code_page == 50225 | ||
| || code_page == 50227 | ||
| || code_page == 50229 | ||
| || (57002..=57011).contains(&code_page) | ||
| { | ||
| 0 | ||
| } else { | ||
| MB_ERR_INVALID_CHARS | ||
| }; | ||
|
|
||
| let size = unsafe { | ||
| MultiByteToWideChar( | ||
| code_page, | ||
| strict_flags, | ||
| data.as_ptr().cast(), | ||
| len as i32, | ||
| std::ptr::null_mut(), | ||
| 0, | ||
| ) | ||
| }; | ||
|
|
||
| if size == 0 { | ||
| let size = unsafe { | ||
| MultiByteToWideChar( | ||
| code_page, | ||
| 0, | ||
| data.as_ptr().cast(), | ||
| len as i32, | ||
| std::ptr::null_mut(), | ||
| 0, | ||
| ) | ||
| }; | ||
| if size == 0 { | ||
| let err = std::io::Error::last_os_error(); | ||
| return Err(vm.new_os_error(format!("code_page_decode failed: {err}"))); | ||
| } | ||
|
|
||
| let mut buffer = vec![0u16; size as usize]; | ||
| let result = unsafe { | ||
| MultiByteToWideChar( | ||
| code_page, | ||
| 0, | ||
| data.as_ptr().cast(), | ||
| len as i32, | ||
| buffer.as_mut_ptr(), | ||
| size, | ||
| ) | ||
| }; | ||
| if result == 0 { | ||
| let err = std::io::Error::last_os_error(); | ||
| return Err(vm.new_os_error(format!("code_page_decode failed: {err}"))); | ||
| } | ||
| buffer.truncate(result as usize); | ||
| let s = String::from_utf16(&buffer).map_err(|e| { | ||
| vm.new_unicode_decode_error(format!("code_page_decode failed: {e}")) | ||
| })?; | ||
| return Ok((s, len)); | ||
| } | ||
|
|
||
| let mut buffer = vec![0u16; size as usize]; | ||
| let result = unsafe { | ||
| MultiByteToWideChar( | ||
| code_page, | ||
| strict_flags, | ||
| data.as_ptr().cast(), | ||
| len as i32, | ||
| buffer.as_mut_ptr(), | ||
| size, | ||
| ) | ||
| }; | ||
| if result == 0 { | ||
| let err = std::io::Error::last_os_error(); | ||
| return Err(vm.new_os_error(format!("code_page_decode failed: {err}"))); | ||
| } | ||
| buffer.truncate(result as usize); | ||
| let s = String::from_utf16(&buffer) | ||
| .map_err(|e| vm.new_unicode_decode_error(format!("code_page_decode failed: {e}")))?; | ||
|
|
||
| Ok((s, len)) | ||
| } |
There was a problem hiding this comment.
errors="strict" isn’t enforced on decode failures.
When the strict MultiByteToWideChar pass returns 0, the code always falls back to replacement decoding, so invalid bytes won’t raise a UnicodeDecodeError. Gate the fallback on errors != "strict" (or raise a decode error) when MB_ERR_INVALID_CHARS is supported.
🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict");
+ let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict");
let code_page = args.code_page as u32;
let data = args.data.borrow_buf();
let len = data.len();
@@
let size = unsafe {
MultiByteToWideChar(
code_page,
strict_flags,
@@
)
};
if size == 0 {
+ if errors == "strict" && strict_flags != 0 {
+ return Err(vm.new_unicode_decode_error(format!(
+ "'cp{code_page}' codec can't decode bytes: invalid character"
+ )));
+ }
let size = unsafe {
MultiByteToWideChar(
code_page,
0,CPython code_page_decode strict behavior MultiByteToWideChar MB_ERR_INVALID_CHARS UnicodeDecodeError
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/codecs.rs` around lines 770 - 865, The decode currently
falls back to non-strict decoding whenever the first MultiByteToWideChar call
returns 0; change code in the code_page_decode path to check args.errors (the
_errors variable) when strict_flags uses MB_ERR_INVALID_CHARS: if size == 0 and
_errors == "strict" then return vm.new_unicode_decode_error(...) (use the same
error formatting as other branches) instead of falling back to call
MultiByteToWideChar with flags 0; otherwise (when _errors != "strict") keep the
existing fallback logic. Reference symbols: args.errors / _errors, strict_flags,
MB_ERR_INVALID_CHARS, MultiByteToWideChar, vm.new_unicode_decode_error, and the
code_page_decode return paths.
| /// _testFileTypeByName - test file type by path name | ||
| fn _test_file_type_by_name(path: &std::path::Path, tested_type: u32) -> bool { | ||
| use crate::common::fileutils::windows::{ | ||
| FILE_INFO_BY_NAME_CLASS, get_file_information_by_name, | ||
| }; | ||
| use crate::common::windows::ToWideString; | ||
| use windows_sys::Win32::Foundation::{CloseHandle, INVALID_HANDLE_VALUE}; | ||
| use windows_sys::Win32::Storage::FileSystem::{ | ||
| CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, FILE_FLAG_OPEN_REPARSE_POINT, | ||
| FILE_READ_ATTRIBUTES, FILE_SHARE_DELETE, FILE_SHARE_READ, FILE_SHARE_WRITE, | ||
| OPEN_EXISTING, | ||
| CreateFileW, FILE_ATTRIBUTE_REPARSE_POINT, FILE_FLAG_BACKUP_SEMANTICS, | ||
| FILE_FLAG_OPEN_REPARSE_POINT, FILE_READ_ATTRIBUTES, OPEN_EXISTING, | ||
| }; | ||
|
|
||
| // For islink/isjunction, use symlink_metadata to check reparse points | ||
| if (tested_type == PY_IFLNK || tested_type == PY_IFMNT) | ||
| && let Ok(meta) = path.symlink_metadata() | ||
| { | ||
| use std::os::windows::fs::MetadataExt; | ||
| let attrs = meta.file_attributes(); | ||
| use windows_sys::Win32::Storage::FileSystem::FILE_ATTRIBUTE_REPARSE_POINT; | ||
| if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) == 0 { | ||
| return false; | ||
| use windows_sys::Win32::Storage::FileSystem::{FILE_DEVICE_CD_ROM, FILE_DEVICE_DISK}; | ||
| use windows_sys::Win32::System::Ioctl::FILE_DEVICE_VIRTUAL_DISK; | ||
|
|
||
| match get_file_information_by_name( | ||
| path.as_os_str(), | ||
| FILE_INFO_BY_NAME_CLASS::FileStatBasicByNameInfo, | ||
| ) { | ||
| Ok(info) => { | ||
| let disk_device = matches!( | ||
| info.DeviceType, | ||
| FILE_DEVICE_DISK | FILE_DEVICE_VIRTUAL_DISK | FILE_DEVICE_CD_ROM | ||
| ); | ||
| let result = _test_info( | ||
| info.FileAttributes, | ||
| info.ReparseTag, | ||
| disk_device, | ||
| tested_type, | ||
| ); | ||
| if !result | ||
| || (tested_type != PY_IFREG && tested_type != PY_IFDIR) | ||
| || (info.FileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) == 0 | ||
| { | ||
| return result; | ||
| } | ||
| } | ||
| Err(err) => { | ||
| if let Some(code) = err.raw_os_error() | ||
| && file_info_error_is_trustworthy(code as u32) | ||
| { | ||
| return false; | ||
| } | ||
| } | ||
| // Need to check reparse tag, fall through to CreateFileW | ||
| } | ||
|
|
||
| let wide_path = path.to_wide_with_nul(); | ||
|
|
||
| // For symlinks/junctions, add FILE_FLAG_OPEN_REPARSE_POINT to not follow | ||
| let mut flags = FILE_FLAG_BACKUP_SEMANTICS; | ||
| if tested_type != PY_IFREG && tested_type != PY_IFDIR { | ||
| flags |= FILE_FLAG_OPEN_REPARSE_POINT; | ||
| } | ||
|
|
||
| // Use sharing flags to avoid access denied errors | ||
| let handle = unsafe { | ||
| CreateFileW( | ||
| wide_path.as_ptr(), | ||
| FILE_READ_ATTRIBUTES, | ||
| FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, | ||
| 0, | ||
| core::ptr::null(), | ||
| OPEN_EXISTING, | ||
| flags, | ||
| std::ptr::null_mut(), | ||
| ) | ||
| }; | ||
|
|
||
| if handle == INVALID_HANDLE_VALUE { | ||
| // Fallback: try using Rust's metadata for isdir/isfile | ||
| if tested_type == PY_IFDIR { | ||
| return path.metadata().is_ok_and(|m| m.is_dir()); | ||
| } else if tested_type == PY_IFREG { | ||
| return path.metadata().is_ok_and(|m| m.is_file()); | ||
| } | ||
| // For symlinks/junctions, try without FILE_FLAG_BACKUP_SEMANTICS | ||
| let handle = unsafe { | ||
| CreateFileW( | ||
| wide_path.as_ptr(), | ||
| FILE_READ_ATTRIBUTES, | ||
| FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, | ||
| core::ptr::null(), | ||
| OPEN_EXISTING, | ||
| FILE_FLAG_OPEN_REPARSE_POINT, | ||
| std::ptr::null_mut(), | ||
| ) | ||
| }; | ||
| if handle == INVALID_HANDLE_VALUE { | ||
| return false; | ||
| } | ||
| let result = _test_file_type_by_handle(handle, tested_type, true); | ||
| if handle != INVALID_HANDLE_VALUE { | ||
| let result = _test_file_type_by_handle(handle, tested_type, false); | ||
| unsafe { CloseHandle(handle) }; | ||
| return result; | ||
| } | ||
|
|
||
| let result = _test_file_type_by_handle(handle, tested_type, true); | ||
| unsafe { CloseHandle(handle) }; | ||
| result | ||
| match unsafe { windows_sys::Win32::Foundation::GetLastError() } { | ||
| windows_sys::Win32::Foundation::ERROR_ACCESS_DENIED | ||
| | windows_sys::Win32::Foundation::ERROR_SHARING_VIOLATION | ||
| | windows_sys::Win32::Foundation::ERROR_CANT_ACCESS_FILE | ||
| | windows_sys::Win32::Foundation::ERROR_INVALID_PARAMETER => { | ||
| let stat = if tested_type == PY_IFREG || tested_type == PY_IFDIR { | ||
| crate::windows::win32_xstat(path.as_os_str(), true) | ||
| } else { | ||
| crate::windows::win32_xstat(path.as_os_str(), false) | ||
| }; | ||
| if let Ok(st) = stat { | ||
| let disk_device = (st.st_mode & libc::S_IFREG as u16) != 0; | ||
| return _test_info( | ||
| st.st_file_attributes, | ||
| st.st_reparse_tag, | ||
| disk_device, | ||
| tested_type, | ||
| ); | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
|
|
||
| false | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and locate the relevant code
git ls-files "crates/vm/src/stdlib/nt.rs"Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
# Check if FILE_DEVICE_NETWORK_FILE_SYSTEM constant exists in the codebase
rg "FILE_DEVICE_NETWORK_FILE_SYSTEM" --type rustRepository: RustPython/RustPython
Length of output: 153
🏁 Script executed:
# Find the _test_info function to understand how disk_device is used
rg "_test_info" --type rust -A 10 -B 2Repository: RustPython/RustPython
Length of output: 4357
🏁 Script executed:
# Check the Cargo.toml to see which version of windows_sys is used
rg "windows_sys" Cargo.tomlRepository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Search for windows device type constants used in the codebase
rg "FILE_DEVICE_" --type rust | head -30Repository: RustPython/RustPython
Length of output: 1896
🏁 Script executed:
# Get the full _test_info function to understand how disk_device affects PY_IFREG
sed -n '520,565p' crates/vm/src/stdlib/nt.rsRepository: RustPython/RustPython
Length of output: 1844
🏁 Script executed:
# Get more context around _test_info to see the complete function
sed -n '490,595p' crates/vm/src/stdlib/nt.rsRepository: RustPython/RustPython
Length of output: 4160
🏁 Script executed:
# Check how crates/common/src/fileutils.rs handles device types
rg "FILE_DEVICE_NETWORK_FILE_SYSTEM" -B 5 -A 5 crates/common/src/fileutils.rsRepository: RustPython/RustPython
Length of output: 560
🏁 Script executed:
# Get the complete _test_info function to see PY_IFREG logic
sed -n '450,490p' crates/vm/src/stdlib/nt.rsRepository: RustPython/RustPython
Length of output: 1943
🏁 Script executed:
# Check windows_sys for the FILE_DEVICE_NETWORK_FILE_SYSTEM constant availability
rg "FILE_DEVICE_NETWORK_FILE_SYSTEM" -t rustRepository: RustPython/RustPython
Length of output: 153
🏁 Script executed:
# Verify the constant path in windows_sys
python3 << 'EOF'
# Let's check if the constant is available and in what module
# From the fileutils.rs output, it's: Ioctl::FILE_DEVICE_NETWORK_FILE_SYSTEM
# We need to verify if it's in Win32::System::Ioctl
import subprocess
result = subprocess.run(['rg', 'Ioctl::FILE_DEVICE_NETWORK_FILE_SYSTEM', '-A', '2', '-B', '2', 'crates/common/src/fileutils.rs'],
capture_output=True, text=True)
print("Context around FILE_DEVICE_NETWORK_FILE_SYSTEM in fileutils.rs:")
print(result.stdout)
EOFRepository: RustPython/RustPython
Length of output: 369
🏁 Script executed:
# Verify the exact module path for FILE_DEVICE_NETWORK_FILE_SYSTEM in windows_sys
rg "use.*Ioctl" crates/common/src/fileutils.rs -B 5 | head -20Repository: RustPython/RustPython
Length of output: 268
🏁 Script executed:
# Verify the pattern used in fileutils.rs matches what was proposed in the diff
sed -n '1,50p' crates/common/src/fileutils.rs | grep -A 20 "use.*Ioctl"Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check the imports in fileutils.rs to confirm the module path
sed -n '1,100p' crates/common/src/fileutils.rs | grep -E "use windows_sys"Repository: RustPython/RustPython
Length of output: 351
🏁 Script executed:
# Double-check the exact location where FILE_DEVICE_NETWORK_FILE_SYSTEM is used
rg "FILE_DEVICE_NETWORK_FILE_SYSTEM" -B 10 -A 2 crates/common/src/fileutils.rs | head -30Repository: RustPython/RustPython
Length of output: 651
Include FILE_DEVICE_NETWORK_FILE_SYSTEM in disk_device type checks for network file support.
Regular files on network shares (UNC paths, etc.) report DeviceType as FILE_DEVICE_NETWORK_FILE_SYSTEM, but the current disk_device check only includes FILE_DEVICE_DISK, FILE_DEVICE_VIRTUAL_DISK, and FILE_DEVICE_CD_ROM. This causes _test_info() to return false for PY_IFREG on network paths, triggering an early return without reaching the handle-based fallback. Add FILE_DEVICE_NETWORK_FILE_SYSTEM to the matches pattern (import from windows_sys::Win32::System::Ioctl), or consider relaxing the device type gating to ensure regular files on all mountable filesystem types are handled correctly.
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/nt.rs` around lines 573 - 666, The disk_device check in
_test_file_type_by_name incorrectly omits network filesystems so regular files
on UNC/share paths aren't treated as disk devices; update the matches!(...) in
_test_file_type_by_name to include FILE_DEVICE_NETWORK_FILE_SYSTEM (import it
from windows_sys::Win32::System::Ioctl) alongside FILE_DEVICE_DISK,
FILE_DEVICE_VIRTUAL_DISK, and FILE_DEVICE_CD_ROM so _test_info(...) correctly
treats network-mounted regular files as disk devices and allows the rest of the
logic (including the handle-based fallback) to run.
| fn _getfinalpathname(path: OsPath, vm: &VirtualMachine) -> PyResult { | ||
| let real = path | ||
| .as_ref() | ||
| .canonicalize() | ||
| .map_err(|e| e.to_pyexception(vm))?; | ||
| Ok(path.mode().process_path(real, vm)) | ||
| use windows_sys::Win32::Storage::FileSystem::{ | ||
| CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, GetFinalPathNameByHandleW, OPEN_EXISTING, | ||
| VOLUME_NAME_DOS, | ||
| }; | ||
|
|
||
| let wide = path.to_wide_cstring(vm)?; | ||
| let handle = unsafe { | ||
| CreateFileW( | ||
| wide.as_ptr(), | ||
| 0, | ||
| 0, | ||
| core::ptr::null(), | ||
| OPEN_EXISTING, | ||
| FILE_FLAG_BACKUP_SEMANTICS, | ||
| std::ptr::null_mut(), | ||
| ) | ||
| }; | ||
| if handle == INVALID_HANDLE_VALUE { | ||
| let err = io::Error::last_os_error(); | ||
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | ||
| } | ||
|
|
||
| let mut buffer: Vec<u16> = vec![0; Foundation::MAX_PATH as usize]; | ||
| let result = loop { | ||
| let ret = unsafe { | ||
| GetFinalPathNameByHandleW( | ||
| handle, | ||
| buffer.as_mut_ptr(), | ||
| buffer.len() as u32, | ||
| VOLUME_NAME_DOS, | ||
| ) | ||
| }; | ||
| if ret == 0 { | ||
| let err = io::Error::last_os_error(); | ||
| let _ = unsafe { Foundation::CloseHandle(handle) }; | ||
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | ||
| } | ||
| if (ret as usize) < buffer.len() { | ||
| let final_path = std::ffi::OsString::from_wide(&buffer[..ret as usize]); | ||
| break Ok(path.mode().process_path(final_path, vm)); | ||
| } | ||
| buffer.resize(ret as usize, 0); | ||
| }; | ||
|
|
||
| unsafe { Foundation::CloseHandle(handle) }; | ||
| result |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the file
fd nt.rs crates/vm/src/stdlib/Repository: RustPython/RustPython
Length of output: 203
🏁 Script executed:
# Read the specific lines in question
sed -n '1280,1340p' crates/vm/src/stdlib/nt.rsRepository: RustPython/RustPython
Length of output: 2137
🏁 Script executed:
# Check the imports at the top of the file to understand the current setup
head -100 crates/vm/src/stdlib/nt.rs | grep -A 20 "use windows_sys"Repository: RustPython/RustPython
Length of output: 2278
🏁 Script executed:
# Check if the proposed constants exist in windows_sys by searching Rust files
rg "FILE_READ_ATTRIBUTES|FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE" --type rs | head -20Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
# Check the windows_sys crate version and available exports
fd "Cargo.toml" | xargs grep -l "windows_sys" | head -5Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Look at how other functions in nt.rs handle file access
rg "CreateFileW" crates/vm/src/stdlib/nt.rs -A 10 -B 2Repository: RustPython/RustPython
Length of output: 6696
🌐 Web query:
GetFinalPathNameByHandleW access rights requirements FILE_READ_ATTRIBUTES Windows API best practices
💡 Result:
Minimal access rights (practical + “least privilege”)
For GetFinalPathNameByHandleW, the safest minimal access to request when opening the handle is:
dwDesiredAccess = FILE_READ_ATTRIBUTES(rather thanGENERIC_READ) so you can read name/metadata without requesting read-data access.FILE_READ_ATTRIBUTESis the specific right used to read file metadata. [3][4]
Many callers also succeed with
dwDesiredAccess = 0(query-only handle) when they only need metadata and want to avoid requestingGENERIC_READ; Microsoft documents that a zero access mask can still allow querying certain metadata/attributes. [2]
However, this is not guaranteed to work for every file system / redirector / scenario, soFILE_READ_ATTRIBUTESis the more reliable “minimal” choice.
Share mode / flags (best practices)
When opening the handle you’ll pass to GetFinalPathNameByHandleW:
- Prefer permissive sharing to avoid collisions with other opens:
dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE - If opening a directory, include:
FILE_FLAG_BACKUP_SEMANTICS(required to open directory handles viaCreateFile). - If you only need the path and not I/O, avoid unnecessary rights (don’t use
GENERIC_READunless you actually need to read file data).
Avoiding ERROR_ACCESS_DENIED surprises (notably SMB)
If you request a normalized name (FILE_NAME_NORMALIZED, the default), Microsoft notes that on SMB shares the system may resolve and query each path component; if the caller lacks permission on any component, the call can fail with ERROR_ACCESS_DENIED. [1]
Mitigations:
- Use
FILE_NAME_OPENEDwhen you don’t strictly need normalization. [1] - Or ensure the security context has traverse/query permission for the full component chain on the share.
Recommended pattern (typical)
Open a handle “as lightly as possible” for path resolution:
CreateFileW(path, FILE_READ_ATTRIBUTES, FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE, …, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS /*if dir*/, NULL)- Then call
GetFinalPathNameByHandleW(handle, …, dwFlags)(chooseFILE_NAME_OPENEDvs normalized per your needs). [1]
Sources: Microsoft GetFinalPathNameByHandleW remarks/flags and SMB behavior [1]; CreateFile note about dwDesiredAccess = 0 for querying metadata/attributes [2]; definition of FILE_READ_ATTRIBUTES right [3][4].
Use permissive share mode to avoid sharing violations when opening handle for final path.
The handle is opened with share mode 0, which prevents other processes from opening the file. This causes ERROR_SHARING_VIOLATION failures even though only metadata is needed. Use FILE_READ_ATTRIBUTES for access rights and FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE for share mode, matching Windows best practices and existing patterns in the codebase.
🛠️ Suggested change to reduce sharing violations
use windows_sys::Win32::Storage::FileSystem::{
- CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, GetFinalPathNameByHandleW, OPEN_EXISTING,
- VOLUME_NAME_DOS,
+ CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, FILE_READ_ATTRIBUTES, FILE_SHARE_DELETE,
+ FILE_SHARE_READ, FILE_SHARE_WRITE, GetFinalPathNameByHandleW, OPEN_EXISTING,
+ VOLUME_NAME_DOS,
};
let wide = path.to_wide_cstring(vm)?;
let handle = unsafe {
CreateFileW(
wide.as_ptr(),
- 0,
- 0,
+ FILE_READ_ATTRIBUTES,
+ FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
core::ptr::null(),
OPEN_EXISTING,
FILE_FLAG_BACKUP_SEMANTICS,
std::ptr::null_mut(),
)
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn _getfinalpathname(path: OsPath, vm: &VirtualMachine) -> PyResult { | |
| let real = path | |
| .as_ref() | |
| .canonicalize() | |
| .map_err(|e| e.to_pyexception(vm))?; | |
| Ok(path.mode().process_path(real, vm)) | |
| use windows_sys::Win32::Storage::FileSystem::{ | |
| CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, GetFinalPathNameByHandleW, OPEN_EXISTING, | |
| VOLUME_NAME_DOS, | |
| }; | |
| let wide = path.to_wide_cstring(vm)?; | |
| let handle = unsafe { | |
| CreateFileW( | |
| wide.as_ptr(), | |
| 0, | |
| 0, | |
| core::ptr::null(), | |
| OPEN_EXISTING, | |
| FILE_FLAG_BACKUP_SEMANTICS, | |
| std::ptr::null_mut(), | |
| ) | |
| }; | |
| if handle == INVALID_HANDLE_VALUE { | |
| let err = io::Error::last_os_error(); | |
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | |
| } | |
| let mut buffer: Vec<u16> = vec![0; Foundation::MAX_PATH as usize]; | |
| let result = loop { | |
| let ret = unsafe { | |
| GetFinalPathNameByHandleW( | |
| handle, | |
| buffer.as_mut_ptr(), | |
| buffer.len() as u32, | |
| VOLUME_NAME_DOS, | |
| ) | |
| }; | |
| if ret == 0 { | |
| let err = io::Error::last_os_error(); | |
| let _ = unsafe { Foundation::CloseHandle(handle) }; | |
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | |
| } | |
| if (ret as usize) < buffer.len() { | |
| let final_path = std::ffi::OsString::from_wide(&buffer[..ret as usize]); | |
| break Ok(path.mode().process_path(final_path, vm)); | |
| } | |
| buffer.resize(ret as usize, 0); | |
| }; | |
| unsafe { Foundation::CloseHandle(handle) }; | |
| result | |
| fn _getfinalpathname(path: OsPath, vm: &VirtualMachine) -> PyResult { | |
| use windows_sys::Win32::Storage::FileSystem::{ | |
| CreateFileW, FILE_FLAG_BACKUP_SEMANTICS, FILE_READ_ATTRIBUTES, FILE_SHARE_DELETE, | |
| FILE_SHARE_READ, FILE_SHARE_WRITE, GetFinalPathNameByHandleW, OPEN_EXISTING, | |
| VOLUME_NAME_DOS, | |
| }; | |
| let wide = path.to_wide_cstring(vm)?; | |
| let handle = unsafe { | |
| CreateFileW( | |
| wide.as_ptr(), | |
| FILE_READ_ATTRIBUTES, | |
| FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, | |
| core::ptr::null(), | |
| OPEN_EXISTING, | |
| FILE_FLAG_BACKUP_SEMANTICS, | |
| std::ptr::null_mut(), | |
| ) | |
| }; | |
| if handle == INVALID_HANDLE_VALUE { | |
| let err = io::Error::last_os_error(); | |
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | |
| } | |
| let mut buffer: Vec<u16> = vec![0; Foundation::MAX_PATH as usize]; | |
| let result = loop { | |
| let ret = unsafe { | |
| GetFinalPathNameByHandleW( | |
| handle, | |
| buffer.as_mut_ptr(), | |
| buffer.len() as u32, | |
| VOLUME_NAME_DOS, | |
| ) | |
| }; | |
| if ret == 0 { | |
| let err = io::Error::last_os_error(); | |
| let _ = unsafe { Foundation::CloseHandle(handle) }; | |
| return Err(OSErrorBuilder::with_filename(&err, path, vm)); | |
| } | |
| if (ret as usize) < buffer.len() { | |
| let final_path = std::ffi::OsString::from_wide(&buffer[..ret as usize]); | |
| break Ok(path.mode().process_path(final_path, vm)); | |
| } | |
| buffer.resize(ret as usize, 0); | |
| }; | |
| unsafe { Foundation::CloseHandle(handle) }; | |
| result |
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/nt.rs` around lines 1287 - 1333, In _getfinalpathname,
the CreateFileW call opens the file with access 0 and share mode 0 which causes
sharing violations; change the access parameter to FILE_READ_ATTRIBUTES and the
sharemode parameter to FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
(matching other stdlib patterns) when calling CreateFileW so the handle is
opened permissively for metadata reads; update the CreateFileW invocation in the
_getfinalpathname function accordingly and keep the rest of the logic
(GetFinalPathNameByHandleW/CloseHandle) unchanged.
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin nt |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/codecs.rs (2)
465-521:⚠️ Potential issue | 🟠 MajorHonor
errors="strict"inmbcs_decodebefore falling back.The
_errorsvariable is parsed but never used. Currently, when the strictMultiByteToWideCharpass withMB_ERR_INVALID_CHARSfails, the code unconditionally retries without the flag, allowing invalid bytes to pass through even in strict mode. In strict mode, this should raise an error instead.🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ -488,6 +488,11 @@ mod windows_codecs { if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'mbcs' codec can't decode bytes: invalid character".to_owned(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior)
667-723:⚠️ Potential issue | 🟠 MajorHonor
errors="strict"inoem_decodeand other decode functions.At line 667,
_errorsis parsed but never used. When the strictMultiByteToWideCharpass fails (size == 0), the code unconditionally retries withoutMB_ERR_INVALID_CHARS, bypassing strict error checking. This violates codec semantics:errors="strict"should raise an error on invalid bytes, not silently fall back to lenient decoding.The same issue exists in
mbcs_decode(line 465) andcode_page_decode(line 895). By contrast, the corresponding encode functions (oem_encode,code_page_encode) correctly use theerrorsparameter to control behavior in strict mode.🛠️ Proposed fix for oem_decode
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); if data.is_empty() { return Ok((String::new(), 0)); } // Get the required buffer size for UTF-16 let size = unsafe { MultiByteToWideChar( CP_OEMCP, MB_ERR_INVALID_CHARS, data.as_ptr().cast(), len as i32, std::ptr::null_mut(), 0, ) }; if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'oem' codec can't decode bytes: invalid character".to_owned(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior)Apply the same logic to
mbcs_decodeandcode_page_decode.
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] lib: cpython/Lib/io.py dependencies:
dependent tests: (87 tests)
[ ] test: cpython/Lib/test/test_cmd_line_script.py (TODO: 16) dependencies: dependent tests: (no tests depend on cmd_line_script) [x] lib: cpython/Lib/codecs.py dependencies:
dependent tests: (105 tests)
[ ] lib: cpython/Lib/compileall.py dependencies:
dependent tests: (1 tests)
[ ] lib: cpython/Lib/concurrent dependencies:
dependent tests: (11 tests)
[x] test: cpython/Lib/test/test_exceptions.py (TODO: 27) dependencies: dependent tests: (no tests depend on exception) [x] lib: cpython/Lib/os.py dependencies:
dependent tests: (160 tests)
[x] test: cpython/Lib/test/test_posix.py (TODO: 4) dependencies: dependent tests: (no tests depend on posix) [x] lib: cpython/Lib/runpy.py dependencies:
dependent tests: (3 tests)
[ ] lib: cpython/Lib/test/support dependencies:
dependent tests: (no tests depend on support) [ ] lib: cpython/Lib/subprocess.py dependencies:
dependent tests: (52 tests)
[x] lib: cpython/Lib/weakref.py dependencies:
dependent tests: (148 tests)
[ ] lib: cpython/Lib/zipfile dependencies:
dependent tests: (8 tests)
Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/codecs.rs (2)
447-502:⚠️ Potential issue | 🟠 Major
errors="strict"isn't enforced on decode failures inmbcs_decode.The
_errorsvariable (line 447) is extracted but never used. WhenMultiByteToWideCharwithMB_ERR_INVALID_CHARSreturns 0, the code unconditionally falls back to replacement decoding (lines 468-502). This means invalid bytes won't raise aUnicodeDecodeErroreven whenerrors="strict".🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'mbcs' codec can't decode bytes: invalid character".to_string(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior) let size = unsafe {
637-692:⚠️ Potential issue | 🟠 Major
errors="strict"isn't enforced on decode failures inoem_decode.Same issue as
mbcs_decode: the_errorsvariable (line 637) is unused, and the code unconditionally falls back to replacement decoding when the strict check fails. This violates Python's codec error handling semantics.🛠️ Proposed fix
- let _errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); + let errors = args.errors.as_ref().map(|s| s.as_str()).unwrap_or("strict"); let data = args.data.borrow_buf(); let len = data.len(); @@ if size == 0 { + if errors == "strict" { + return Err(vm.new_unicode_decode_error( + "'oem' codec can't decode bytes: invalid character".to_string(), + )); + } // Try without MB_ERR_INVALID_CHARS for non-strict mode (replacement behavior) let size = unsafe {
🧹 Nitpick comments (1)
crates/vm/src/stdlib/codecs.rs (1)
338-348: Redundant#[cfg(windows)]attributes inside already-gated module.The structs and functions inside
_codecs_windows(lines 338, 347, 428, 440, 528, 537, 618, 630, 718, 729, 828, 842) have#[cfg(windows)]attributes, but the enclosing module at line 332 is already gated by#[cfg(windows)]. These inner attributes are redundant and can be removed for cleaner code.
Summary by CodeRabbit
New Features
Bug Fixes