Should JsonReader ignore BOM char at start of inputStream? #43

@lmsurpre

I'm processing some JSON files that I got from an external source. Everything was going well until I hit this on one of them:

Caused by: jakarta.json.stream.JsonParsingException: Unexpected char 65,279 at (line no=1, column no=1, offset=0)
    at org.eclipse.parsson.JsonTokenizer.unexpectedChar (JsonTokenizer.java:584)
    at org.eclipse.parsson.JsonTokenizer.nextToken (JsonTokenizer.java:396)
    at org.eclipse.parsson.JsonParserImpl$NoneContext.getNextEvent (JsonParserImpl.java:425)
    at org.eclipse.parsson.JsonParserImpl.next (JsonParserImpl.java:375)
    at org.eclipse.parsson.JsonReaderImpl.readObject (JsonReaderImpl.java:99)

It looks like the file has a BOM char at its start (65,279 is U+FEFF, the byte order mark).
I'm fairly sure, though not certain, that a leading BOM isn't allowed in JSON, but I'm wondering whether the Parsson authors would be interested in making the parser resilient to this situation.

From https://datatracker.ietf.org/doc/html/rfc8259#section-8.1

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
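
In the meantime, a caller-side workaround might look like the sketch below. It is not part of Parsson: the class name `BomSkipper` and the method `skipUtf8Bom` are hypothetical, and it assumes the input is UTF-8, so it only looks for the three-byte sequence EF BB BF and leaves everything else untouched. It needs Java 9+ for `readNBytes`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not a Parsson API: drops a leading UTF-8 BOM if present.
public class BomSkipper {

    /**
     * Returns a stream positioned just after a leading UTF-8 BOM (EF BB BF).
     * If the input does not start with a BOM, all bytes are preserved.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // Pushback buffer of 3 bytes so the peeked bytes can be put back.
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3); // reads up to 3 bytes, fewer at EOF

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: return the peeked bytes to the stream so nothing is lost.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream could then be handed to `Json.createReader(InputStream)` in place of the original one. Files encoded as UTF-16 or UTF-32 carry a different BOM byte sequence and would pass through this sketch unchanged.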
