Labic-ICMC-USP/EventPredecessor

EventPredecessor

EventPredecessor is a small but ambitious pipeline to discover precursor events from news. It combines:

  1. News collection in temporal windows with GNews;
  2. 5W1H event extraction powered by an LLM (JSON-constrained with Pydantic);
  3. Semantic graph construction, where edges represent likely precursor relations between events, based on sentence embeddings and temporal order;
  4. Optional visualization on a world map using yfiles_jupyter_graphs.

The goal is to explore chains of events like "protests → repression → arrests → political crisis", starting from a user-defined reference window and walking backwards in time to search for possible precursors.
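
To make the backward walk concrete, here is a minimal sketch of how earlier windows could be derived from the reference window. It is not the package's code: the helper names shift_back and precursor_windows are hypothetical, and it assumes that each precursor window ends the day before the next one starts and that month and year steps clamp to the last valid day of the month.

```python
import calendar
from datetime import date, timedelta


def shift_back(day: date, step: str) -> date:
    """Move a date one step (d, w, m or y) into the past."""
    if step == "d":
        return day - timedelta(days=1)
    if step == "w":
        return day - timedelta(weeks=1)
    if step in ("m", "y"):
        months = 1 if step == "m" else 12
        # Count months since year 0, subtract, then split back into year/month.
        total = day.year * 12 + (day.month - 1) - months
        year, month_index = divmod(total, 12)
        month = month_index + 1
        # Clamp the day so that e.g. March 31 maps to the end of February.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(day.day, last_day))
    raise ValueError(f"unknown step: {step!r}")


def precursor_windows(reference_start: date, step: str, max_iterations: int):
    """Return (start, end) pairs, nearest window first (assumed contiguous)."""
    windows = []
    start = reference_start
    for _ in range(max_iterations):
        end = start - timedelta(days=1)
        start = shift_back(start, step)
        windows.append((start, end))
    return windows


for window_start, window_end in precursor_windows(date(2025, 10, 23), "m", 2):
    print(window_start.isoformat(), "to", window_end.isoformat())
```

With a monthly step and two iterations, the sketch yields two month-long windows immediately preceding the reference start date.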


1. Installation

From a local clone of this project:

pip install .

This will install the package eventpredecessor and the console script eventpredecessor.

Important: the package expects you to have an API key for an LLM provider. By default, it is configured for OpenRouter (ChatOpenAI via langchain-openai), but you can adapt it to any compatible endpoint.


2. High-level architecture

The pipeline is composed of the following main components:

  • PrecursorNewsCollector (news.py)

    • Receives a reference time window and a step (d, w, m, y).
    • Iteratively builds earlier windows (precursors) and queries GNews.
    • Normalizes results into Article objects.
  • EventLLMExtractor (llm.py)

    • Wraps LLMExecutor and converts each news article into a 5W1H event (EventSchema).
    • Uses a category dictionary {label: description} provided in the YAML config to guide the LLM when choosing the event category.
  • PrecursorEventGraphBuilder (graph_builder.py)

    • Receives (article, event) pairs and builds a directed graph.
    • First phase: for each event, chooses at most one best parent, and only if its similarity exceeds the mean plus one standard deviation of the similarities across all candidate pairs.
    • Second phase: connects disconnected components while keeping at most one parent per node.
  • graph_io.py and plot.py

    • Utilities to export/import the full graph as JSON.
    • EventGraphPlotter allows you to inspect the graph in Jupyter, either with a data sidebar or on top of a world map (using lat/long).
  • EventPredecessorPipeline (runner.py)

    • Reads a YAML configuration file.
    • Orchestrates news collection, LLM extraction, and graph construction.
    • Saves the final graph to a JSON file.
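
The first phase of the graph builder can be illustrated with a small standard-library sketch. It is an approximation under stated assumptions, not the code in graph_builder.py: the function select_parents is hypothetical, events are plain (id, timestamp, embedding) tuples, a candidate pair is any pair where the parent is strictly earlier than the child, similarity is cosine similarity, and the population standard deviation is used.

```python
from math import sqrt
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_parents(events):
    """Map each child id to its single best parent id, if any clears the threshold.

    events: list of (event_id, timestamp, embedding) tuples (assumed shape).
    """
    # Score every (earlier event, later event) pair.
    pairs = {}
    for child_id, child_t, child_vec in events:
        for parent_id, parent_t, parent_vec in events:
            if parent_t < child_t:
                pairs[(parent_id, child_id)] = cosine(parent_vec, child_vec)
    if not pairs:
        return {}

    # Global threshold: mean + one standard deviation over all candidate pairs.
    scores = list(pairs.values())
    threshold = mean(scores) + pstdev(scores)

    # Keep, per child, only the highest-scoring parent above the threshold.
    best = {}
    for (parent_id, child_id), score in pairs.items():
        if score > threshold and score > best.get(child_id, (None, -1.0))[1]:
            best[child_id] = (parent_id, score)
    return {child: parent for child, (parent, _) in best.items()}


events = [
    ("a", 1, [1.0, 0.0]),
    ("b", 2, [1.0, 0.1]),
    ("c", 3, [0.0, 1.0]),
    ("d", 4, [0.1, 1.0]),
]
print(select_parents(events))
```

In this toy input, events a and b point in nearly the same direction, as do c and d, so only those two pairs clear the threshold and event c is left without a parent. The second phase described above would then be responsible for linking the remaining components.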

All parameters (keywords, time windows, LLM model, categories, thresholds, etc.) are stored in a single YAML file to make experiments fully reproducible.


3. Configuration via YAML

Below is an example config.yaml:

news:
  keywords:
    - "paciente morre à espera de atendimento"
    - "paciente morre na fila de atendimento"
    - "fila de atendimento no SUS"
    - "hospital público lotado"
    - "UPA lotada"
    - "pronto-socorro lotado"
    - "falta de leito no hospital"
    - "falta de médico em posto de saúde"
    - "falta de remédio em posto de saúde"
    - "unidade básica de saúde sem médico"
    - "superlotação em hospital público"
    - "falta de ambulância"
    - "demora para atendimento médico"
    - "demora para realização de exame"
  reference_start: "2025-10-23"
  reference_end: "2025-11-23"
  window: "m"              # d (day), w (week), m (month), y (year)
  max_iterations: 2 # increase for more extensive searches
  stop_when_no_articles: false
  searcher:
    language: "pt"
    country: "BR"
    max_results: 100

categories:
  "access_delay": "Casos em que o foco é a demora no atendimento, como longas filas, espera excessiva por consultas, exames ou cirurgias, resultando ou não em agravamento do quadro."
  "resource_shortage": "Problemas de falta de recursos humanos ou materiais, como ausência de médicos, enfermeiros, leitos, ambulâncias, medicamentos ou equipamentos."
  "infrastructure_failure": "Situações de precariedade física ou estrutural, como hospitais superlotados, pacientes em macas nos corredores, falta de manutenção, risco sanitário, interdição de unidades, etc."
  "management_governance": "Casos em que o problema central é gestão, planejamento ou governança do sistema de saúde: má alocação de recursos, decisões administrativas, fechamento de unidades, cortes de orçamento, etc."
  "health_worker_conditions": "Notícias que enfatizam as condições de trabalho dos profissionais de saúde: sobrecarga, burnout, falta de proteção, greves, jornadas exaustivas."
  "violence_against_health_workers": "Agressões físicas, verbais ou ameaças contra profissionais de saúde em serviços públicos ou conveniados ao SUS."
  "corruption_irregularities": "Desvios de recursos, fraudes em contratos, superfaturamento, investigações e escândalos ligados à gestão da saúde pública."
  "digital_health_issues": "Problemas ligados a prontuário eletrônico, sistemas de regulação, agendamento online ou telemedicina que impactem diretamente o acesso ou a qualidade do atendimento."

llm:
  api_key: "API_KEY" # use your own API key or a local LLM
  model_name: "mistralai/mistral-nemo"
  base_url: "https://openrouter.ai/api/v1"
  temperature: 0.0
  max_workers: 20

builder:
  embedding_model_name: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
  same_category_only: true
  connect_components: true

output:
  full_graph_json_path: "events_full_graph.json"

Note: The categories section is a dictionary. The keys are labels that will appear in the EventSchema.category field, and the values are short descriptions injected in the system prompt. This helps the LLM to understand the intended meaning of each label.
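
As a rough illustration of how such a dictionary might be turned into prompt text, the sketch below renders each label and description as one line. The function build_category_prompt and the exact wording of the prompt are assumptions made for this example; the actual prompt used in llm.py may differ.

```python
def build_category_prompt(categories: dict[str, str]) -> str:
    """Render a {label: description} mapping as a block of system-prompt text."""
    lines = ["Classify the event into exactly one of these categories:"]
    for label, description in categories.items():
        lines.append(f"- {label}: {description}")
    lines.append("Answer with the label only, never the description.")
    return "\n".join(lines)


# Shortened English stand-ins for the descriptions in config.yaml.
categories = {
    "access_delay": "Delays in care, such as long queues or waits for exams.",
    "resource_shortage": "Missing staff, beds, ambulances or medicines.",
}
print(build_category_prompt(categories))
```

Keeping labels short and machine-friendly while putting the nuance in the descriptions means the label can be stored directly in the event record without post-processing.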


4. Running the pipeline (CLI)

Once installed, create your config.yaml and simply run:

eventpredecessor -c config.yaml

This will:

  1. Collect news for the reference window and precursor windows;
  2. Extract 5W1H events for each article using the configured LLM;
  3. Build a precursor graph with semantic and temporal constraints;
  4. Save the final graph to output.full_graph_json_path (in the example: events_full_graph.json).

All steps log structured JSON to stdout using Python's logging with a custom JSON formatter.
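
For readers unfamiliar with this pattern, a JSON formatter for the standard logging module can look like the sketch below. The class name JsonFormatter, the logger name, and the chosen fields are assumptions for illustration; the package's own formatter may emit different keys.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload, ensure_ascii=False)


# Send JSON lines to stdout for a demo logger (name chosen for this example).
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("eventpredecessor.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("collected %d articles", 42)
```

One JSON object per line keeps the output easy to filter with standard tools or to load into a dataframe after a run.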


5. Using the pipeline from Python

from eventpredecessor import EventPredecessorPipeline

pipeline = EventPredecessorPipeline.from_yaml("config.yaml")
result = pipeline.run()

G = result["graph"]
stats = result["stats"]
print(stats)

You can then export or visualize the graph as you prefer. The package already offers JSON helpers and a yFiles-based plotter:

from eventpredecessor.graph_io import save_full_graph_json
from eventpredecessor.plot import EventGraphPlotter

save_full_graph_json(G, "events_full_graph.json")
plotter = EventGraphPlotter("events_full_graph.json")
w_sidebar = plotter.plot_with_sidebar()
w_map = plotter.plot_on_map(use_heat_mapping=True)

The tutorial.ipynb included in this repository shows a small end-to-end example.


6. Why is this interesting?

  • It transforms raw news streams into event graphs that can be explored as chains of causality, escalation or diffusion.
  • It is LLM-agnostic: any provider compatible with ChatOpenAI can be used (OpenAI, OpenRouter, local gateways, etc.).
  • All hyperparameters and semantic choices (like event categories) live in a YAML file, which makes it a nice playground for students and researchers interested in event mining, graph analysis, and precursor detection.
