Description
Bug report
On second thought this issue should be an enhancement instead of a bug report. Sorry for the wrong template.
Bug description:
There are some cases where _pyrepl auto-indentation does not work well.
Cases
- A line ending with : in a multi-line string is wrongly indented.
Observed
>>> s = '''
... Note:
... ␣␣␣␣|
Expected
>>> s = '''
... Note:
... |
- A # inside a string is seen as a comment, so the following : is ignored.
Observed
>>> if ' ' == '#':
... |
Expected
>>> if ' ' == '#':
... ␣␣␣␣|
- When the entire cursor line is a comment and is already indented, pressing Enter gives a further indent.
Observed
>>> def f():
...     # foo⤶
... ␣␣␣␣␣␣␣␣|
Expected
>>> def f():
...     # foo⤶
... ␣␣␣␣|
Possible solution
Currently, _should_auto_indent() parses the buffer from right to left and stops at the first newline it encounters. Only the last non-comment line of the buffer is parsed.
But when parsing from right to left, we can't tell whether a # starts a comment or is part of a string. For example, if we encounter a " first and then a # while scanning from right to left, we don't know whether the # starts a comment. To know that, we need to know whether the " is a string boundary, but the # might comment out the ", so we can't be sure. There is an information dependency cycle.
To fix this, I made a change to parse the buffer from left to right, keeping track of whether the current character is inside a string or a comment. This approach solves the above three cases. However, the whole buffer is parsed on every call of _should_auto_indent(), so with a very long buffer there might be a noticeable delay when pressing Enter.
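As a rough illustration of the left-to-right idea, here is a minimal standalone sketch. It is not the actual patch and not the real _should_auto_indent(): the function name should_indent_after is hypothetical, it takes the text before the cursor as a plain string, and it deliberately ignores brackets, backslash line continuations, and nested quotes inside f-strings.

```python
def should_indent_after(buffer: str) -> bool:
    """Return True if the text before the cursor ends with a block-opening ':'.

    Hypothetical helper for illustration only (an assumption, not _pyrepl code).
    """
    quote = None             # active delimiter: ', ", ''' or """; None outside strings
    in_comment = False
    last_significant = None  # last non-blank code character on the current line
    i, n = 0, len(buffer)
    while i < n:
        ch = buffer[i]
        if quote is not None:
            if ch == "\\":
                i += 2       # skip the escaped character
                continue
            if buffer.startswith(quote, i):
                i += len(quote)
                quote = None
                continue
            if ch == "\n" and len(quote) == 1:
                quote = None  # unterminated one-line string: stop tracking it
                last_significant = None
        elif in_comment:
            if ch == "\n":
                in_comment = False
                last_significant = None
        elif ch == "#":
            in_comment = True
        elif ch in "'\"":
            # Decide between a triple-quoted and a single-quoted string.
            quote = ch * 3 if buffer.startswith(ch * 3, i) else ch
            last_significant = ch
            i += len(quote)
            continue
        elif ch == "\n":
            last_significant = None  # only the cursor line decides the indent
        elif not ch.isspace():
            last_significant = ch
        i += 1
    return quote is None and last_significant == ":"


# The three cases from this report:
print(should_indent_after("s = '''\nNote:"))       # still inside the string
print(should_indent_after("if ' ' == '#':"))       # '#' is part of a string
print(should_indent_after("def f():\n    # foo"))  # cursor line is only a comment
```

Because the state is carried forward from the start of the buffer, `if x == "#":` and `x = 1 # ":` are told apart even though both end in `":`. The cost is one pass over the whole buffer per call, which is the source of the delay concern mentioned above.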
I think this is a big change, since it affects how _should_auto_indent() works as a whole. I am hesitant to create a PR, so I am just putting it here first to hopefully get feedback.
CPython versions tested on:
3.15
Operating systems tested on:
Linux