[BUG] join queries fail when combining metadata filters and odinson queries with exact range quantifiers #394

@myedibleenso

Description

When a pattern (Odinson query) containing an exact range quantifier with a count of 2 or more, such as {3}, is joined with a metadata filter (parent query), the pattern fails to match.

import ai.lum.odinson.test.utils.OdinsonTest

val ot = new OdinsonTest()

// document metadata: "metadata":[{"$type":"ai.lum.odinson.TokensField","name":"doc_id","tokens":["step-bros"]}]
// sentence tokens: "John", "C.", "Reilly", "played", ...
val ee = ot.mkExtractorEngine("step-bros")

val pattern = "[lemma=play] >nsubj [tag=NNP]{3}"
val mf = "doc_id == 'step-bros'"
val oq = ee.mkFilteredQuery(query = pattern, metadataFilter = mf)

val res = ee.query(oq)

// both of these checks unexpectedly fail
ot.numMatches(res) == 1
ot.existsMatchWithSpan(odinResults = res, doc = 0, start = 0, end = 3) == true
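
To make the expected result concrete, here is a toy model of what an exact quantifier such as {3} should select over the example sentence. It is plain Scala and does not use Odinson. The helper name exactSpans and the VBD tag for "played" are assumptions made for illustration, and the sketch ignores both the >nsubj traversal and the metadata join.

```scala
// Toy model only: plain Scala, no Odinson classes involved.
object ExactQuantifierSketch {

  /** All (start, end) spans, end exclusive, that cover exactly `n`
    * consecutive tokens whose tag equals `tag`. (Hypothetical helper.)
    */
  def exactSpans(tags: Seq[String], tag: String, n: Int): Seq[(Int, Int)] =
    if (n <= 0 || n > tags.length) Seq.empty
    else
      (0 to tags.length - n)
        .filter(start => tags.slice(start, start + n).forall(_ == tag))
        .map(start => (start, start + n))

  def main(args: Array[String]): Unit = {
    // Assumed tags for "John", "C.", "Reilly", "played"; VBD is a guess.
    val tags = Seq("NNP", "NNP", "NNP", "VBD")
    println(exactSpans(tags, "NNP", 3))
  }
}
```

Under these assumptions the only window of three consecutive NNP tokens is (0, 3), which is the span the second check above looks for.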

Associated unit tests (ghp/join-query-bug branch)

  • it should "match events referencing existing mentions when an in-memory state is used along with a metadataFilter" in {
      val ee = extractorEngineWithSpecificState(getDocument("step-bros"), "memory")
      val grammar = """
        metadataFilters: doc_id == 'step-bros'
        rules:
          - name: person-rule
            type: basic
            priority: 1
            label: Person
            pattern: |
              [tag=NNP]{3}
          - name: state-based-rule
            type: event
            label: EventBasedPersonRule
            priority: 2
            pattern: |
              trigger = [lemma=play]
              # NOTE: ending with @Person won't work
              arg: Person = >nsubj
      """
      val extractors = ee.compileRuleString(grammar).toVector
      val mentions = ee.extractMentions(
        extractors = extractors,
        allowTriggerOverlaps = false,
        disableMatchSelector = false
      ).toVector
      getMentionsWithLabel(mentions, "EventBasedPersonRule") should have size (1)
    }
  • it should "match simple patterns with exact range quantifiers (>=2) and a metadata filter" in {
      // create in-memory engine w/ a single doc w/ a single sentence
      val ee = mkExtractorEngine("step-bros")
      // "John", "C.", "Reilly", "played" ...
      val pattern = "[lemma=play] >nsubj [tag=NNP]{3}"
      val mf = "doc_id == 'step-bros'"
      val oq = ee.mkFilteredQuery(query = pattern, metadataFilter = mf)
      val res = ee.query(oq)
      numMatches(res) should be (1)
      existsMatchWithSpan(odinResults = res, doc = 0, start = 0, end = 3) should be (true)
    }

Metadata

Labels: bug (Something isn't working)
