
String Unicode Fragment Issues #44

@bign8


👋 Hey folks! I saw the NUM project on Reddit and thought it was a great idea. Diving in, I found MODL, which made me even more excited, but I noticed it didn't have a ton of libraries yet, so I started hacking on one just to see if I could get something working. It's still very much a work in progress and just something I'm hacking on in my free time, so no promises on quality.

https://github.com/bign8/modl.go

Anyway, I ran into an issue with my Unicode parsing logic. Based on the test added in d066849, it appears MODL supports Unicode escapes that don't have exactly four hex digits, which doesn't seem to match the grammar below or the written specification: https://www.modl.uk/specification#hex-values.

```antlr
fragment UNICODE
  : 'u' HEX HEX HEX HEX
  ;
fragment HEX
  : [0-9a-fA-F]
  ;
```
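Read literally, the fragment allows exactly four hex digits after the `u`, so a code point above U+FFFF can't be written as a single escape, and any extra digits are left as ordinary text. A minimal Go sketch of that literal reading follows (Go only because modl.go is written in it). `parseUnicodeFragment` is a hypothetical helper, not code from modl.go or the Java interpreter, and it only handles the part after the escape character.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseUnicodeFragment is a hypothetical helper that applies the UNICODE
// fragment literally: 'u' followed by exactly four HEX digits. It checks
// syntax only (no code-point validation) and returns the decoded rune plus
// the number of bytes consumed; anything after the fourth digit is left alone.
func parseUnicodeFragment(s string) (rune, int, error) {
	if len(s) < 5 || s[0] != 'u' {
		return 0, 0, fmt.Errorf("expected 'u' plus four hex digits in %q", s)
	}
	// With base 16, ParseUint accepts only [0-9a-fA-F]: no sign, "0x", or '_'.
	v, err := strconv.ParseUint(s[1:5], 16, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid hex digits in %q: %w", s[:5], err)
	}
	return rune(v), 5, nil
}

func main() {
	r, n, _ := parseUnicodeFragment("u00e9")
	fmt.Printf("%c %d\n", r, n) // é 5
	r, n, _ = parseUnicodeFragment("u1F600")
	fmt.Printf("U+%04X %d\n", r, n) // U+1F60 5 (the trailing "0" is not consumed)
}
```

With the strict rule, `u1F600` decodes to U+1F60 and leaves the trailing `0` as plain text.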

That said, the Java library appears to support this behavior, which is great; I just didn't see it documented anywhere besides the test case and the Java source.

https://github.com/MODLanguage/java-interpreter/blob/d9cc9d76f73687a03114d57fccc253c3c82fad71/src/main/java/uk/modl/utils/UnicodeEscapeReplacer.java#L104-L174

Given the complexity of the UnicodeEscapeReplacer, I'm not sure of the best way to represent those nuances in the grammar. But a note somewhere saying that code points written with other than four hex digits are supported would be dope. Anyway, let me know what you think and I can get something in a PR for ya.
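To show why the nuances matter, here is one possible relaxation, purely as an illustration: after the `u`, greedily take one to six hex digits (six is enough for U+10FFFF) and reject values that aren't valid Unicode scalar values. This is a hypothetical Go sketch assuming that rule; it is not a description of what `UnicodeEscapeReplacer` actually does, and `parseLenientUnicode` is an illustrative name only.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// isHex reports whether c is in the grammar's HEX class [0-9a-fA-F].
func isHex(c byte) bool {
	return ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F')
}

// parseLenientUnicode is a HYPOTHETICAL relaxed rule, not the Java
// UnicodeEscapeReplacer's behavior: after 'u' it greedily consumes one to six
// hex digits and rejects surrogates and values above U+10FFFF.
// It returns the decoded rune and the number of bytes consumed.
func parseLenientUnicode(s string) (rune, int, error) {
	if len(s) == 0 || s[0] != 'u' {
		return 0, 0, fmt.Errorf("expected 'u' in %q", s)
	}
	end := 1
	// Stop at the first non-hex byte or after six digits (s[1:7]).
	for end < len(s) && end <= 6 && isHex(s[end]) {
		end++
	}
	if end == 1 {
		return 0, 0, fmt.Errorf("no hex digits after 'u' in %q", s)
	}
	v, err := strconv.ParseUint(s[1:end], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	r := rune(v)
	if !utf8.ValidRune(r) {
		return 0, 0, fmt.Errorf("U+%X is not a valid Unicode scalar value", v)
	}
	return r, end, nil
}

func main() {
	for _, in := range []string{"u1F600", "u00e9", "u00e9abc"} {
		r, n, err := parseLenientUnicode(in)
		if err != nil {
			fmt.Println(in, "error:", err)
			continue
		}
		fmt.Printf("%s -> U+%04X (consumed %d)\n", in, r, n)
	}
	// Output:
	// u1F600 -> U+1F600 (consumed 6)
	// u00e9 -> U+00E9 (consumed 5)
	// u00e9abc -> U+E9AB (consumed 7)
}
```

The last input shows the cost of greedy matching: `u00e9abc` becomes U+E9AB rather than `é` followed by `abc`, so whichever rule the Java library actually uses is worth spelling out in the grammar or spec.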

Cheers 🍻
