gh-140797: Forbid capturing groups in re.Scanner lexicon patterns#140944
gh-140797: Forbid capturing groups in re.Scanner lexicon patterns#140944serhiy-storchaka merged 4 commits intopython:mainfrom
Conversation
12bf67b to
e9e76d0
Compare
e9e76d0 to
12bf67b
Compare
It adds validation to re.Scanner.init that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains any capturing groups, Scanner now raises ValueError with a clear message advising the use of non-capturing groups (?:...) instead.
0ebfddd to
29db6ca
Compare
wjssz
left a comment
There was a problem hiding this comment.
After you update the patch, you may ask core developer Serhiy to review.
f1d2c8b to
81f3675
Compare
wjssz
left a comment
There was a problem hiding this comment.
It seems that you are a beginner, please be patient.
After my review is completed, then ask core developer Serhiy to review.
Lib/re/__init__.py
Outdated
| for phrase, action in lexicon: | ||
| sub_pattern = _parser.parse(phrase, flags) | ||
| if sub_pattern.state.groups != 1: | ||
| raise ValueError( |
There was a problem hiding this comment.
You can write in one line. A line should <= 80 characters.
raise ValueError("Can not use capturing groups in re.Scanner.")
There was a problem hiding this comment.
Thank you for the suggestion. I have resolved it now
Lib/test/test_re.py
Outdated
|
|
||
| #Capturing group throws an error | ||
| lex = [("(a)b", None)] | ||
| with self.assertRaises(ValueError): |
There was a problem hiding this comment.
You may check exception message.
msg = "Can not use capturing groups in re.Scanner"
with self.assertRaisesRegex(ValueError, msg):
...There was a problem hiding this comment.
Thank you for the suggestion! I have resolved it now. Need to learn a lot!
81f3675 to
401969c
Compare
Lib/test/test_re.py
Outdated
| 'op+', 'bar'], '')) | ||
|
|
||
| def test_bug_140797(self): | ||
| #bug 140797: remove capturing groups compilation form re.Scanner |
There was a problem hiding this comment.
Please add a space after # in comments.
- #Capturing group throws an error
+ # Capturing group throws an errorAnd add a space after , in functions arguments.
- with self.assertRaisesRegex(ValueError,msg):
+ with self.assertRaisesRegex(ValueError, msg):Then looks good to me.
There was a problem hiding this comment.
Oops! Thank you again! Resolved
401969c to
db0ea4a
Compare
Lib/test/test_re.py
Outdated
| (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, | ||
| 'op+', 'bar'], '')) | ||
|
|
||
| def test_bug_140797(self): |
There was a problem hiding this comment.
My negligence, please use test_bug_gh140797 as the name.
If no "gh", It may refer to the previous bug tracker.
Sorry for this.
- # bug 140797: remove capturing groups compilation form re.Scanner
+ # gh140797: capturing groups is not allowed in re.ScannerThere was a problem hiding this comment.
Done! Thank you again for your time and suggestions!
db0ea4a to
0f9a934
Compare
Lib/test/test_re.py
Outdated
| # Capturing group throws an error | ||
| lex = [("(a)b", None)] | ||
| with self.assertRaisesRegex(ValueError, msg): | ||
| Scanner(lex) |
There was a problem hiding this comment.
This saves a line...
My fault again.
- Scanner(lex)
+ Scanner([("(a)b", None)])There was a problem hiding this comment.
Done! Testing sure takes the time 😂
0f9a934 to
0197f65
Compare
|
I have checked basic problems, @serhiy-storchaka please review. |
Lib/re/__init__.py
Outdated
| for phrase, action in lexicon: | ||
| sub_pattern = _parser.parse(phrase, flags) | ||
| if sub_pattern.state.groups != 1: | ||
| raise ValueError("Can not use capturing groups in re.Scanner") |
There was a problem hiding this comment.
"Cannot". This is the most commonly used variant.
| @@ -0,0 +1,4 @@ | |||
| The re.Scanner class now forbids regular expressions containing capturing | |||
There was a problem hiding this comment.
Mention that that class is undocumented. You can also use some formatting, even if the link does not work: :class:`!re.Scanner`.
0197f65 to
feaee4e
Compare
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Two more nitpicks and LGTM. 👍
Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
|
Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
|
Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
…ns (pythonGH-140944) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <[email protected]>
|
GH-140982 is a backport of this pull request to the 3.14 branch. |
…ns (pythonGH-140944) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <[email protected]>
|
GH-140983 is a backport of this pull request to the 3.13 branch. |
…rns (GH-140944) (GH-140983) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <[email protected]>
…rns (GH-140944) (GH-140982) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <[email protected]>
3.14 changed behavior of re.Scanner() capture groups, breaking cmake-format. Change was also backported to 3.13. python/cpython#140944
… lexicon patterns (pythonGH-140944) (pythonGH-140983)" This reverts commit ee894d2.
… lexicon patterns (pythonGH-140944) (pythonGH-140983)" This reverts commit ee894d2.
… lexicon patterns (pythonGH-140944) (pythonGH-140982)" This reverts commit e5266fc.
Required as of Python 3.15. See: python/cpython#140944
Required as of Python 3.15. See: python/cpython#140944
Required as of Python 3.15. See: python/cpython#140944
Required as of Python 3.15. See: python/cpython#140944
Required as of Python 3.15. See: python/cpython#140944
This PR adds validation to
re.Scanner.__init__that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains any capturing groups, Scanner now raisesValueErrorwith a clear message advising the use of non-capturing groups (?:...) instead. Further, tests were added to assertValueErrorfor lexicons containing capturing groups and a passing test for non-capturing group.re.Scanner#140797