StaMESP

A State Machine Emulator for String Processing.

GitHub: https://github.com/Jerem-dY/StaMESP
Crates.io: https://crates.io/crates/stamesp
Doc: https://docs.rs/matx/latest/stamesp/index.html

TODO

Scanner
Parser
Interpreter
Output tree-like structures (like XML)

The 4 Constituents

The language itself relies on 4 elements:

SETS, that are a list of litterals which can constitude a shared premise to a transition function.
sets are made of LITTERALS, which are strings specified between inverted commas <"> or single quotes <'>.
STATES describe both the string output of a transition (the name of the state) and the different transitions available from that point on.
EXPRESSIONS are found in states and link a premise (a set) to a conclusion (a state) in a particular state.

Nodes

The system relies on a tree system, with each new state being a node that has the token(s) that lead to it as children. Such nodes can then be exported to XML.

Syntax

Basics

SETS and LITTERALS are described as follows:

setname('this' 'is' 'a' 'test')

While commas or other separators are technically viable to separate the different litterals, it is recommended to go with the space character. A comma or a line break are required after the closing of parentheses to end the set.

STATES and EXPRESSIONS must be written this way:

state_one{
    setname = state_two
    another_set = state_three
}

Start and End

To be usable, a system described by those rules must have at most (and at least) one ENTRY POINT and at least one END POINT. An asterix <*> and/or a colon <:> can be put after the state's identifier to make it an entry or end point, respectively.

The ENTRY POINT is the state which the simulation will start in, and END POINTS are the states which the system can stop within without throwing an error. This allows for recognition (automata), as the error thrown can be caught and treated accordingly. Entry point can be chosen when starting the interpreter, as long as it exists in the system.

Modifiers

States can have several MODIFIERS in expressions.

a hat <^> means we do not want to write a new node, only changing the state internally. Beware, as it can lead to unforeseen behavior!
a percentage <%> means we do want to write, but before we get to the new state. This is particularly useful for suffixes, as they should both be added to the current token and signal the end of it.

Special elements

<@> is a special set that describes any token that does not belong to the different sets specified in the expressions. It is local, meaning that it does not have the same value across states: it is a way to describe potentially infinite tokens, thus acting like a default in a switch-case statement.
<.> is a special state that signals we want to stay in the current state, without creating a new node. This allows for concatenation. If one wants to create a new node while staying in the same state, all they have to do is write the state's name.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.vscode		.vscode
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
test.txt		test.txt
test2.txt		test2.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StaMESP

TODO

The 4 Constituents

Nodes

Syntax

Basics

Start and End

Modifiers

Special elements

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

StaMESP

TODO

The 4 Constituents

Nodes

Syntax

Basics

Start and End

Modifiers

Special elements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages