pip install mindsdb_sql_parser
from mindsdb_sql_parser import parse_sql
query = parse_sql('select b from aaa where c=1')
# the result is an abstract syntax tree (AST)
query
# string representation of the AST
query.to_tree()
# representation of the tree as an SQL string; it may not exactly match the original SQL
query.to_string()

The SLY library is used for parsing.
Parsing consists of two stages (with separate modules for every dialect):
- Defining keywords in the lexer.py module, mostly with regular expressions.
- Defining syntax rules in the parser.py module by describing them in BNF grammar.
  - The syntax is defined in a function's decorator. Inside the decorator you can use a keyword itself or another parser function.
  - The output of a function can be used as input to other parser functions.
  - The outputs of the parser are listed under "Top-level statements". Each must be an abstract syntax tree (AST) object.
 
 
SLY does not support inheritance, so every dialect is described in full rather than as an extension of another dialect.
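
The two stages above can be illustrated with a minimal sketch. It uses only the Python standard library, not SLY, so the decorator-based rule syntax is replaced by plain methods. All names here (`TOKEN_SPEC`, `MiniParser`, `Select`, `Identifier`) are hypothetical and are not taken from mindsdb_sql_parser.

```python
import re
from dataclasses import dataclass

# Stage 1 (lexer): token types are defined mostly with regular expressions.
TOKEN_SPEC = [
    ("SELECT", r"select\b"),
    ("FROM", r"from\b"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(sql):
    """Split the query into (token_type, text) pairs."""
    pos, tokens = 0, []
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"Illegal character {sql[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# AST objects returned by the parser rules.
@dataclass
class Identifier:
    name: str


@dataclass
class Select:
    target: Identifier
    from_table: Identifier


# Stage 2 (parser): each method stands for one grammar rule.
class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def expect(self, kind):
        if self.pos >= len(self.tokens) or self.tokens[self.pos][0] != kind:
            raise SyntaxError(f"Expected {kind} at token index {self.pos}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    # Rule: identifier : ID
    def identifier(self):
        return Identifier(self.expect("ID"))

    # Rule: select : SELECT identifier FROM identifier  (top-level statement)
    # The output of the identifier rule is used as input here.
    def select(self):
        self.expect("SELECT")
        target = self.identifier()
        self.expect("FROM")
        from_table = self.identifier()
        return Select(target, from_table)


def parse(sql):
    parser = MiniParser(tokenize(sql))
    tree = parser.select()
    if parser.pos != len(parser.tokens):
        raise SyntaxError(f"Unexpected token at index {parser.pos}")
    return tree


print(parse("select b from aaa"))
```

In SLY itself, the grammar rule written here as a comment goes into the decorator of the rule function, as described above.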
- The structure of the AST is defined in separate modules (in parser/ast/).
- AST classes can be inherited.
- Every class must have these methods:
  - to_tree - returns a hierarchical representation of the object
  - get_string - returns the object as an SQL expression (or sub-expression)
  - copy - copies the AST tree to a new object
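
As a rough illustration of the three required methods, here is a minimal sketch of an AST class hierarchy. The classes `Node`, `Identifier`, `Constant` and `Equals` are hypothetical stand-ins written for this example; the real classes in parser/ast/ may differ in names, constructor arguments and output format.

```python
import copy as copy_module


class Node:
    """Hypothetical base class, not the library's own base class."""

    def to_tree(self, level=0):
        raise NotImplementedError

    def get_string(self):
        raise NotImplementedError

    def copy(self):
        # Deep copy, so the new tree shares no nodes with the original.
        return copy_module.deepcopy(self)


class Identifier(Node):
    def __init__(self, name):
        self.name = name

    def to_tree(self, level=0):
        return "  " * level + f"Identifier(name={self.name!r})"

    def get_string(self):
        return self.name


class Constant(Node):
    def __init__(self, value):
        self.value = value

    def to_tree(self, level=0):
        return "  " * level + f"Constant(value={self.value!r})"

    def get_string(self):
        return str(self.value)


class Equals(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def to_tree(self, level=0):
        pad = "  " * level
        return "\n".join([
            pad + "Equals(",
            self.left.to_tree(level + 1) + ",",
            self.right.to_tree(level + 1),
            pad + ")",
        ])

    def get_string(self):
        return f"{self.left.get_string()} = {self.right.get_string()}"


expr = Equals(Identifier("c"), Constant(1))
print(expr.to_tree())
print(expr.get_string())
```

Because `copy` is implemented once on the base class, subclasses only need to supply `to_tree` and `get_string`.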
 
 
For a better user experience, a parsing error contains useful information about the location of the problem and a possible fix.
- It shows the location of the error if:
  - a character can't be parsed (by the lexer)
  - a token is unexpected (by the parser)
- It tries to propose a correct token in place of (or before) the error location. Possible options:
  - A keyword is shown as is.
  - '[number]' - if a float or an integer is expected
  - '[string]' - if a string is expected
  - '[identifier]' - if the name of an object is expected. For example, in "select x as name from tbl1 where col=1" the identifiers are x, name, tbl1 and col.
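
The mapping from expected tokens to the suggestions listed above could look roughly like the sketch below. The token type names (`INTEGER`, `FLOAT`, `STRING`, `ID`), the helper names and the caret-style location marker are assumptions made for this example; the library's actual token names and error layout may differ.

```python
# Hypothetical token type names; the real lexer may name them differently.
PLACEHOLDERS = {
    "INTEGER": "[number]",
    "FLOAT": "[number]",
    "STRING": "[string]",
    "ID": "[identifier]",
}


def describe_expected(token_types):
    """Turn expected token types into unique, user-facing suggestions."""
    labels = []
    for token_type in token_types:
        # Keywords have no placeholder and are shown as is.
        label = PLACEHOLDERS.get(token_type, token_type)
        if label not in labels:
            labels.append(label)
    return labels


def mark_location(sql, position):
    """Return the query followed by a line with a caret under `position`."""
    return sql + "\n" + " " * position + "^"


print(mark_location("select 1 frm t", 9))
print(describe_expected(["FROM", "INTEGER", "FLOAT", "ID"]))
```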
 
 
How suggestions work: the parser uses the next possible tokens defined by the syntax rules. If the error is at the end of the query, it just shows these tokens. Otherwise:
- First iteration: it tries to replace the bad token with another token from the list of possible tokens.
  - It tries to parse the query once again. If there is no error:
    - the token is added to the suggestion list.
- Second iteration: it puts the possible token before the bad token (instead of replacing it) and repeats the same operation.
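
The two iterations can be sketched as follows. This is a simplified model working on a plain list of tokens, not the library's implementation: `suggest_tokens`, `toy_can_parse` and the toy grammar are hypothetical, and the real parser derives the candidate tokens from its grammar state.

```python
def suggest_tokens(tokens, bad_index, candidates, can_parse):
    """Collect the candidate tokens that make the query parse.

    tokens     - tokens of the failing query
    bad_index  - index of the unexpected token (len(tokens) at end of query)
    candidates - next possible tokens according to the syntax rules
    can_parse  - callable returning True if a token list parses without error
    """
    if bad_index >= len(tokens):
        # End of the query: just show the possible tokens.
        return list(candidates)

    suggestions = []
    # First iteration: replace the bad token with each candidate.
    for candidate in candidates:
        attempt = tokens[:bad_index] + [candidate] + tokens[bad_index + 1:]
        if can_parse(attempt) and candidate not in suggestions:
            suggestions.append(candidate)
    # Second iteration: put each candidate before the bad token.
    for candidate in candidates:
        attempt = tokens[:bad_index] + [candidate] + tokens[bad_index:]
        if can_parse(attempt) and candidate not in suggestions:
            suggestions.append(candidate)
    return suggestions


KEYWORDS = {"select", "from", "as"}


def toy_can_parse(tokens):
    """Toy grammar for the example: select <name> from <name>."""
    return (
        len(tokens) == 4
        and tokens[0] == "select"
        and tokens[2] == "from"
        and tokens[1] not in KEYWORDS
        and tokens[3] not in KEYWORDS
    )


# Misspelled keyword: replacing "frm" with "from" makes the query valid.
print(suggest_tokens(["select", "b", "frm", "aaa"], 2, ["from", "as"], toy_can_parse))
# Missing keyword: inserting "from" before "aaa" makes the query valid.
print(suggest_tokens(["select", "b", "aaa"], 2, ["from", "as"], toy_can_parse))
```

Re-parsing once per candidate keeps the logic simple; the cost stays small because the list of possible next tokens is short.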
 
pip install -r requirements_test.txt
env PYTHONPATH=./ pytest