Merged
56 changes: 28 additions & 28 deletions Doc/howto/regex.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. _regex-howto:

****************************
Regular Expression HOWTO
Regular expression HOWTO
****************************

:Author: A.M. Kuchling <amk@amk.ca>
@@ -47,7 +47,7 @@ Python code to do the processing; while Python code will be slower than an
elaborate regular expression, it will also probably be more understandable.


Simple Patterns
Simple patterns
===============

We'll start by learning about the simplest possible regular expressions. Since
@@ -59,7 +59,7 @@ expressions (deterministic and non-deterministic finite automata), you can refer
to almost any textbook on writing compilers.


Matching Characters
Matching characters
-------------------

Most letters and characters will simply match themselves. For example, the
@@ -159,7 +159,7 @@ match even a newline. ``.`` is often used where you want to match "any
character".


Repeating Things
Repeating things
----------------

Being able to match varying sets of characters is the first thing regular
@@ -210,7 +210,7 @@ this RE against the string ``'abcbd'``.
| | | ``[bcd]*`` is only matching |
| | | ``bc``. |
+------+-----------+---------------------------------+
| 6 | ``abcb`` | Try ``b`` again. This time |
| 7 | ``abcb`` | Try ``b`` again. This time |
| | | the character at the |
| | | current position is ``'b'``, so |
| | | it succeeds. |
Expand Down Expand Up @@ -255,7 +255,7 @@ is equivalent to ``+``, and ``{0,1}`` is the same as ``?``. It's better to use
to read.


Using Regular Expressions
Using regular expressions
=========================

Now that we've looked at some simple regular expressions, how do we actually use
@@ -264,7 +264,7 @@ expression engine, allowing you to compile REs into objects and then perform
matches with them.


Compiling Regular Expressions
Compiling regular expressions
-----------------------------

Regular expressions are compiled into pattern objects, which have
@@ -295,7 +295,7 @@ disadvantage which is the topic of the next section.

.. _the-backslash-plague:

The Backslash Plague
The backslash plague
--------------------

As stated earlier, regular expressions use the backslash character (``'\'``) to
@@ -335,7 +335,7 @@ expressions will often be written in Python code using this raw string notation.

In addition, special escape sequences that are valid in regular expressions,
but not valid as Python string literals, now result in a
:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`,
which means the sequences will be invalid if raw string notation or escaping
the backslashes isn't used.
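The changed sentence (:exc:`DeprecationWarning` becoming :exc:`SyntaxWarning`) can be illustrated with a short, hedged sketch; which warning category actually fires depends on the Python version running it, so the example accepts either:

```python
import warnings

# Compile source text containing the literal "\d" -- an escape that is
# valid in regular expressions but not as a Python string literal.
source = '"\\d"'  # the four source characters: " \ d "
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(source, "<example>", "eval")

# DeprecationWarning on older Pythons, SyntaxWarning on newer ones;
# raw strings (r"\d") avoid the warning entirely.
print([w.category.__name__ for w in caught])
```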

@@ -351,7 +351,7 @@ the backslashes isn't used.
+-------------------+------------------+


Performing Matches
Performing matches
------------------

Once you have an object representing a compiled regular expression, what do you
@@ -369,10 +369,10 @@ for a complete listing.
| | location where this RE matches. |
+------------------+-----------------------------------------------+
| ``findall()`` | Find all substrings where the RE matches, and |
| | returns them as a list. |
| | return them as a list. |
+------------------+-----------------------------------------------+
| ``finditer()`` | Find all substrings where the RE matches, and |
| | returns them as an :term:`iterator`. |
| | return them as an :term:`iterator`. |
+------------------+-----------------------------------------------+
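For reference, the four methods in the table can be exercised in a few lines (a standalone illustration, not part of the diff):

```python
import re

p = re.compile(r"\d+")

print(p.match("abc123"))            # None: match() only tries the start
print(p.search("abc123").group())   # first match anywhere in the string
print(p.findall("a12b345"))         # all matches, returned as a list
print([m.span() for m in p.finditer("a12b345")])  # match objects, lazily
```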

:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
@@ -473,7 +473,7 @@ Two pattern methods return all of the matches for a pattern.
The ``r`` prefix, making the literal a raw string literal, is needed in this
example because escape sequences in a normal "cooked" string literal that are
not recognized by Python, as opposed to regular expressions, now result in a
:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`. See
:ref:`the-backslash-plague`.

:meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
@@ -491,7 +491,7 @@ result. The :meth:`~re.Pattern.finditer` method returns a sequence of
(29, 31)


Module-Level Functions
Module-level functions
----------------------

You don't have to create a pattern object and call its methods; the
@@ -518,7 +518,7 @@ Outside of loops, there's not much difference thanks to the internal
cache.


Compilation Flags
Compilation flags
-----------------

.. currentmodule:: re
@@ -642,7 +642,7 @@ of each one.
whitespace is in a character class or preceded by an unescaped backslash; this
lets you organize and indent the RE more clearly. This flag also lets you put
comments within a RE that will be ignored by the engine; comments are marked by
a ``'#'`` that's neither in a character class or preceded by an unescaped
a ``'#'`` that's neither in a character class nor preceded by an unescaped
backslash.

For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it
@@ -669,7 +669,7 @@ of each one.
to understand than the version using :const:`re.VERBOSE`.
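The :const:`re.VERBOSE` behaviour described here is easy to demonstrate; this is a minimal sketch with a made-up phone-number pattern, where whitespace and ``#`` comments inside the RE are ignored:

```python
import re

# Whitespace and comments in the pattern are ignored under re.VERBOSE,
# so the RE can be laid out readably.
pat = re.compile(r"""
    \d{3}   # area code
    -       # separator
    \d{4}   # local number
""", re.VERBOSE)

print(pat.search("call 555-1234").group())
```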


More Pattern Power
More pattern power
==================

So far we've only covered a part of the features of regular expressions. In
@@ -679,7 +679,7 @@ retrieve portions of the text that was matched.

.. _more-metacharacters:

More Metacharacters
More metacharacters
-------------------

There are some metacharacters that we haven't covered yet. Most of them will be
@@ -875,7 +875,7 @@ Backreferences like this aren't often useful for just searching through a string
find out that they're *very* useful when performing string substitutions.


Non-capturing and Named Groups
Non-capturing and named groups
------------------------------

Elaborate REs may use many groups, both to capture substrings of interest, and
@@ -979,7 +979,7 @@ current point. The regular expression for finding doubled words,
'the the'
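The doubled-word RE referred to in this hunk can be sketched in full; ``\1`` is a backreference requiring the same word to appear twice:

```python
import re

# A word, some whitespace, then the same word again via the
# backreference \1; \b keeps the match on word boundaries.
p = re.compile(r"\b(\w+)\s+\1\b")
print(p.search("Paris in the the spring").group())
```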


Lookahead Assertions
Lookahead assertions
--------------------

Another zero-width assertion is the lookahead assertion. Lookahead assertions
@@ -1061,7 +1061,7 @@ end in either ``bat`` or ``exe``:
``.*[.](?!bat$|exe$)[^.]*$``
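The negative-lookahead RE above can be checked quickly against a few hypothetical filenames:

```python
import re

# A filename with an extension that is neither "bat" nor "exe":
# the (?!...) lookahead rejects those two without consuming anything.
pat = re.compile(r".*[.](?!bat$|exe$)[^.]*$")

print(bool(pat.match("sendmail.cf")))
print(bool(pat.match("autoexec.bat")))
print(bool(pat.match("program.exe")))
```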


Modifying Strings
Modifying strings
=================

Up to this point, we've simply performed searches against a static string.
@@ -1083,7 +1083,7 @@ using the following pattern methods:
+------------------+-----------------------------------------------+


Splitting Strings
Splitting strings
-----------------

The :meth:`~re.Pattern.split` method of a pattern splits a string apart
@@ -1137,7 +1137,7 @@ argument, but is otherwise the same. ::
['Words', 'words, words.']
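A related ``split()`` detail worth illustrating: if the delimiter pattern uses capturing parentheses, the delimiter text is kept in the result list:

```python
import re

# With a capturing group, split() also returns the delimiters.
print(re.split(r"(\W+)", "Words, words."))
```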


Search and Replace
Search and replace
------------------

Another common task is to find all the matches for a pattern, and replace them
@@ -1236,15 +1236,15 @@ pattern object as the first parameter, or use embedded modifiers in the
pattern string, e.g. ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
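The two equivalent spellings described here, embedded modifier versus compile-time flag, side by side:

```python
import re

# Embedded (?i) modifier in the pattern string...
print(re.sub("(?i)b+", "x", "bbbb BBBB"))

# ...or the same flag passed when compiling the pattern object.
print(re.compile("b+", re.IGNORECASE).sub("x", "bbbb BBBB"))
```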


Common Problems
Common problems
===============

Regular expressions are a powerful tool for some applications, but in some ways
their behaviour isn't intuitive and at times they don't behave the way you may
expect them to. This section will point out some of the most common pitfalls.


Use String Methods
Use string methods
------------------

Sometimes using the :mod:`re` module is a mistake. If you're matching a fixed
@@ -1310,7 +1310,7 @@ string and then backtracking to find a match for the rest of the RE. Use
:func:`re.search` instead.
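The advice above, prefer :func:`re.search` over prefixing the RE with ``.*``, in a small sketch (the sample string is invented):

```python
import re

s = "lots of text before the needle"

# Prefixing with .* forces the engine to consume the whole string and
# then backtrack; search() scans for the pattern directly instead.
print(re.match(".*needle", s).group() == s)
print(re.search("needle", s).group())
```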


Greedy versus Non-Greedy
Greedy versus non-greedy
------------------------

When repeating a regular expression, as in ``a*``, the resulting action is to
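The greedy-versus-non-greedy contrast this section discusses can be shown with the classic HTML-tag example:

```python
import re

s = "<html><head><title>Title</title>"

# Greedy .* grabs as much as possible, then backtracks to the last '>'.
print(re.match("<.*>", s).group())
# Non-greedy .*? matches as little as possible: just the first tag.
print(re.match("<.*?>", s).group())
```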
@@ -1388,9 +1388,9 @@ Feedback
========

Regular expressions are a complicated topic. Did this document help you
understand them? Were there parts that were unclear, or Problems you
understand them? Were there parts that were unclear, or problems you
encountered that weren't covered here? If so, please send suggestions for
improvements to the author.
improvements to the :ref:`issue tracker <using-the-tracker>`.

The most complete book on regular expressions is almost certainly Jeffrey
Friedl's Mastering Regular Expressions, published by O'Reilly. Unfortunately,
2 changes: 1 addition & 1 deletion Doc/reference/lexical_analysis.rst
@@ -560,7 +560,7 @@ start with a character in the "letter-like" set ``xid_start``,
and the remaining characters must be in the "letter- and digit-like" set
``xid_continue``.

These sets based on the *XID_Start* and *XID_Continue* sets as defined by the
These sets are based on the *XID_Start* and *XID_Continue* sets as defined by the
Unicode standard annex `UAX-31`_.
Python's ``xid_start`` additionally includes the underscore (``_``).
Note that Python does not necessarily conform to `UAX-31`_.
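The *XID_Start*/*XID_Continue* rule described in this hunk is easy to observe from Python itself; a small illustration:

```python
# Identifiers may start with any XID_Start character (plus '_') and
# continue with XID_Continue characters, so non-ASCII letters work.
π = 3.14159          # Greek letter used as an identifier
naïve_count = 1

print("π".isidentifier())    # a letter-like character qualifies
print("1x".isidentifier())   # digits are not in xid_start
```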
1 change: 0 additions & 1 deletion Include/cpython/pystats.h
@@ -144,7 +144,6 @@ typedef struct _optimization_stats {
uint64_t unknown_callee;
uint64_t trace_immediately_deopts;
uint64_t executors_invalidated;
uint64_t fitness_terminated_traces;
UOpStats opcode[PYSTATS_MAX_UOP_ID + 1];
uint64_t unsupported_opcode[256];
uint64_t trace_length_hist[_Py_UOP_HIST_SIZE];
4 changes: 0 additions & 4 deletions Include/internal/pycore_interp_structs.h
@@ -449,10 +449,6 @@ typedef struct _PyOptimizationConfig {
uint16_t side_exit_initial_value;
uint16_t side_exit_initial_backoff;

// Trace fitness thresholds
uint16_t fitness_initial;
uint16_t fitness_initial_side;

// Optimization flags
bool specialization_enabled;
bool uops_optimize_enabled;
21 changes: 2 additions & 19 deletions Include/internal/pycore_optimizer.h
@@ -15,23 +15,6 @@ extern "C" {
#include "pycore_optimizer_types.h"
#include <stdbool.h>

/* Default fitness configuration values for trace quality control.
* FITNESS_INITIAL and FITNESS_INITIAL_SIDE can be overridden via
* PYTHON_JIT_FITNESS_INITIAL and PYTHON_JIT_FITNESS_INITIAL_SIDE */
#define FITNESS_PER_INSTRUCTION 2
#define FITNESS_BRANCH_BASE 5
#define FITNESS_INITIAL (FITNESS_PER_INSTRUCTION * 1000)
#define FITNESS_INITIAL_SIDE (FITNESS_INITIAL / 2)
#define FITNESS_BACKWARD_EDGE (FITNESS_INITIAL / 10)

/* Exit quality constants for fitness-based trace termination.
* Higher values mean better places to stop the trace. */

#define EXIT_QUALITY_DEFAULT 200
#define EXIT_QUALITY_CLOSE_LOOP (4 * EXIT_QUALITY_DEFAULT)
#define EXIT_QUALITY_ENTER_EXECUTOR (2 * EXIT_QUALITY_DEFAULT + 100)
#define EXIT_QUALITY_SPECIALIZABLE (EXIT_QUALITY_DEFAULT / 4)


typedef struct _PyJitUopBuffer {
_PyUOpInstruction *start;
@@ -118,8 +101,7 @@ typedef struct _PyJitTracerPreviousState {
} _PyJitTracerPreviousState;

typedef struct _PyJitTracerTranslatorState {
int32_t fitness; // Current trace fitness, starts high, decrements
int frame_depth; // Current inline depth (0 = root frame)
int jump_backward_seen;
} _PyJitTracerTranslatorState;

typedef struct _PyJitTracerState {
@@ -412,6 +394,7 @@ extern JitOptRef _Py_uop_sym_new_type(
extern JitOptRef _Py_uop_sym_new_const(JitOptContext *ctx, PyObject *const_val);
extern JitOptRef _Py_uop_sym_new_const_steal(JitOptContext *ctx, PyObject *const_val);
bool _Py_uop_sym_is_safe_const(JitOptContext *ctx, JitOptRef sym);
bool _Py_uop_sym_is_not_container(JitOptRef sym);
_PyStackRef _Py_uop_sym_get_const_as_stackref(JitOptContext *ctx, JitOptRef sym);
extern JitOptRef _Py_uop_sym_new_null(JitOptContext *ctx);
extern bool _Py_uop_sym_has_type(JitOptRef sym);
20 changes: 19 additions & 1 deletion Lib/test/test_capi/test_opt.py
Expand Up @@ -23,6 +23,9 @@
# For frozendict JIT tests
FROZEN_DICT_CONST = frozendict(x=1, y=2)

# For frozenset JIT tests
FROZEN_SET_CONST = frozenset({1, 2, 3})

class _GenericKey:
pass

@@ -2169,7 +2172,8 @@ def f(n):
self.assertIsNotNone(ex)
uops = get_opnames(ex)
self.assertNotIn("_GUARD_TOS_ANY_SET", uops)
self.assertIn("_CONTAINS_OP_SET", uops)
# _CONTAINS_OP_SET is constant-folded away for frozenset literals
self.assertIn("_INSERT_2_LOAD_CONST_INLINE_BORROW", uops)

def test_remove_guard_for_known_type_tuple(self):
def f(n):
@@ -4399,6 +4403,20 @@ def testfunc(n):
# lookup result is folded to constant 1, so comparison is optimized away
self.assertNotIn("_COMPARE_OP_INT", uops)

def test_contains_op_frozenset_const_fold(self):
def testfunc(n):
x = 0
for _ in range(n):
if 1 in FROZEN_SET_CONST:
x += 1
return x

res, ex = self._run_with_optimizer(testfunc, TIER2_THRESHOLD)
self.assertEqual(res, TIER2_THRESHOLD)
self.assertIsNotNone(ex)
uops = get_opnames(ex)
self.assertNotIn("_CONTAINS_OP_SET", uops)

def test_binary_subscr_list_slice(self):
def testfunc(n):
x = 0
32 changes: 32 additions & 0 deletions Lib/test/test_zoneinfo/test_zoneinfo.py
@@ -741,6 +741,38 @@ def test_empty_zone(self):
with self.assertRaises(ValueError):
self.klass.from_file(zf)

def test_invalid_transition_index(self):
STD = ZoneOffset("STD", ZERO)
DST = ZoneOffset("DST", ONE_H, ONE_H)

zf = self.construct_zone([
ZoneTransition(datetime(2026, 3, 1, 2), STD, DST),
ZoneTransition(datetime(2026, 11, 1, 2), DST, STD),
], after="", version=1)

data = bytearray(zf.read())
timecnt = struct.unpack_from(">l", data, 32)[0]
idx_offset = 44 + timecnt * 4
data[idx_offset + 1] = 2 # typecnt is 2, so index 2 is OOB
f = io.BytesIO(bytes(data))

with self.assertRaises(ValueError):
self.klass.from_file(f)

def test_transition_lookahead_out_of_bounds(self):
STD = ZoneOffset("STD", ZERO)
DST = ZoneOffset("DST", ONE_H, ONE_H)
EXT = ZoneOffset("EXT", ONE_H)

zf = self.construct_zone([
ZoneTransition(datetime(2026, 3, 1), STD, DST),
ZoneTransition(datetime(2026, 6, 1), DST, EXT),
ZoneTransition(datetime(2026, 9, 1), EXT, DST),
], after="")

zi = self.klass.from_file(zf)
self.assertIsNotNone(zi)

def test_zone_very_large_timestamp(self):
"""Test when a transition is in the far past or future.
