EventPredecessor is a small but ambitious pipeline to discover precursor
events from news. It combines:
- News collection in temporal windows with GNews;
- 5W1H event extraction powered by an LLM (JSON-constrained with Pydantic);
- Semantic graph construction, where edges represent likely precursor relations between events, based on sentence embeddings and temporal order;
- Optional visualization on a world map using yfiles_jupyter_graphs.
The goal is to explore chains of events like "protests → repression → arrests → political crisis", starting from a user-defined reference window and walking backwards in time to search for possible precursors.
From a local clone of this project:
pip install .
This will install the package eventpredecessor and the console script eventpredecessor.
Important: the package expects you to have an API key for an LLM provider. By default, it is configured for OpenRouter (ChatOpenAI via langchain-openai), but you can adapt it to any compatible endpoint.
The pipeline is composed of the following main components:
- PrecursorNewsCollector (news.py)
  - Receives a reference time window and a step (d, w, m, y).
  - Iteratively builds earlier windows (precursors) and queries GNews.
  - Normalizes results into Article objects.
- EventLLMExtractor (llm.py)
  - Wraps LLMExecutor and converts each news article into a 5W1H event (EventSchema).
  - Uses a category dictionary {label: description} provided in the YAML config to guide the LLM when choosing the event category.
- PrecursorEventGraphBuilder (graph_builder.py)
  - Receives (article, event) pairs and builds a directed graph.
  - First phase: for each event, chooses at most one best parent whose similarity is above mean + std of all candidate pairs.
  - Second phase: connects disconnected components while keeping at most one parent per node.
- graph_io.py and plot.py
  - Utilities to export/import the full graph as JSON.
  - EventGraphPlotter allows you to inspect the graph in Jupyter, either with a data sidebar or on top of a world map (using lat/long).
- EventPredecessorPipeline (runner.py)
  - Reads a YAML configuration file.
  - Orchestrates news collection, LLM extraction, and graph construction.
  - Saves the final graph to a JSON file.
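The first phase of the graph builder can be illustrated with a minimal sketch. This is not the package's code: the function name `select_parents`, the tuple layout, the toy vectors and the use of the population standard deviation are all assumptions made for illustration, and the real builder works on sentence embeddings. The sketch only shows the idea of a global mean + std threshold combined with temporal order and at most one parent per event.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_parents(events):
    """events: list of (event_id, timestamp, vector) tuples (hypothetical layout).

    Returns {child_id: parent_id} with at most one parent per child.
    """
    # Similarity of every (earlier event, later event) candidate pair.
    candidates = {}
    for child_id, child_ts, child_vec in events:
        for parent_id, parent_ts, parent_vec in events:
            if parent_ts < child_ts:  # temporal order: a parent must come first
                candidates[(parent_id, child_id)] = cosine(parent_vec, child_vec)
    if not candidates:
        return {}

    # Global threshold: mean + std over all candidate similarities
    # (population std is an assumption of this sketch).
    sims = list(candidates.values())
    threshold = mean(sims) + pstdev(sims)

    # Keep, for each child, the single most similar parent above the threshold.
    best = {}
    for (parent_id, child_id), sim in candidates.items():
        if sim > threshold and sim > best.get(child_id, (None, -1.0))[1]:
            best[child_id] = (parent_id, sim)
    return {child: parent for child, (parent, _) in best.items()}


# Toy 2-D vectors standing in for sentence embeddings.
events = [
    ("e1", 1, [1.0, 0.0]),
    ("e2", 2, [0.9, 0.1]),
    ("e3", 3, [0.0, 1.0]),
]
parents = select_parents(events)
print(parents)
```

In this toy input only the pair (e1, e2) clears the threshold, so e3 stays without a parent. That is the kind of disconnected node the second phase would then try to attach.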
All parameters (keywords, time windows, LLM model, categories, thresholds, etc.) are stored in a single YAML file to make experiments fully reproducible.
Below is an example config.yaml:
news:
  keywords:
    - "paciente morre à espera de atendimento"
    - "paciente morre na fila de atendimento"
    - "fila de atendimento no SUS"
    - "hospital público lotado"
    - "UPA lotada"
    - "pronto-socorro lotado"
    - "falta de leito no hospital"
    - "falta de médico em posto de saúde"
    - "falta de remédio em posto de saúde"
    - "unidade básica de saúde sem médico"
    - "superlotação em hospital público"
    - "falta de ambulância"
    - "demora para atendimento médico"
    - "demora para realização de exame"
  reference_start: "2025-10-23"
  reference_end: "2025-11-23"
  window: "m" # d (day), w (week), m (month), y (year)
  max_iterations: 2 # increase for more extensive searches
  stop_when_no_articles: false
  searcher:
    language: "pt"
    country: "BR"
    max_results: 100
categories:
  "access_delay": "Casos em que o foco é a demora no atendimento, como longas filas, espera excessiva por consultas, exames ou cirurgias, resultando ou não em agravamento do quadro."
  "resource_shortage": "Problemas de falta de recursos humanos ou materiais, como ausência de médicos, enfermeiros, leitos, ambulâncias, medicamentos ou equipamentos."
  "infrastructure_failure": "Situações de precariedade física ou estrutural, como hospitais superlotados, pacientes em macas nos corredores, falta de manutenção, risco sanitário, interdição de unidades, etc."
  "management_governance": "Casos em que o problema central é gestão, planejamento ou governança do sistema de saúde: má alocação de recursos, decisões administrativas, fechamento de unidades, cortes de orçamento, etc."
  "health_worker_conditions": "Notícias que enfatizam as condições de trabalho dos profissionais de saúde: sobrecarga, burnout, falta de proteção, greves, jornadas exaustivas."
  "violence_against_health_workers": "Agressões físicas, verbais ou ameaças contra profissionais de saúde em serviços públicos ou conveniados ao SUS."
  "corruption_irregularities": "Desvios de recursos, fraudes em contratos, superfaturamento, investigações e escândalos ligados à gestão da saúde pública."
  "digital_health_issues": "Problemas ligados a prontuário eletrônico, sistemas de regulação, agendamento online ou telemedicina que impactem diretamente o acesso ou a qualidade do atendimento."
llm:
  api_key: "API_KEY" # use your own API key or a local LLM
  model_name: "mistralai/mistral-nemo"
  base_url: "https://openrouter.ai/api/v1"
  temperature: 0.0
  max_workers: 20
builder:
  embedding_model_name: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
  same_category_only: true
  connect_components: true
output:
  full_graph_json_path: "events_full_graph.json"
Note: The categories section is a dictionary. The keys are labels that will appear in the EventSchema.category field, and the values are short descriptions injected into the system prompt. This helps the LLM understand the intended meaning of each label.
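As a rough illustration of how such a dictionary can guide the model, the sketch below renders the labels and descriptions as a prompt fragment and checks that a returned label is one of the configured keys. The function names `build_category_prompt` and `validate_category`, the prompt wording and the shortened English descriptions are hypothetical; the package's actual prompt and validation may differ.

```python
def build_category_prompt(categories):
    """Render a {label: description} mapping as a system-prompt fragment."""
    lines = ["Choose exactly one category label for the event:"]
    for label, description in categories.items():
        lines.append(f"- {label}: {description}")
    return "\n".join(lines)


def validate_category(label, categories):
    """Reject labels that are not keys of the configured dictionary."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return label


# Shortened, made-up descriptions; a real config carries the full texts.
categories = {
    "access_delay": "Long queues or excessive waiting for care.",
    "resource_shortage": "Missing staff, beds, ambulances or medicines.",
}
prompt = build_category_prompt(categories)
print(prompt)
```

Keeping the labels short and the descriptions explicit makes it easy to check the model's answer against the dictionary keys.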
Once installed, create your config.yaml and simply run:
eventpredecessor -c config.yaml
This will:
- Collect news for the reference window and precursor windows;
- Extract 5W1H events for each article using the configured LLM;
- Build a precursor graph with semantic and temporal constraints;
- Save the final graph to output.full_graph_json_path (in the example: events_full_graph.json).
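The first step above, walking backwards from the reference window, can be sketched as follows. The helper names `shift_back` and `precursor_windows` are hypothetical, and the sketch assumes that each precursor window is the previous window shifted back by one step; how the package treats window boundaries and month lengths is not stated here, so treat this only as one plausible reading of the window and max_iterations settings.

```python
import calendar
from datetime import date, timedelta


def shift_back(d, step):
    """Move a date one step (d, w, m or y) into the past."""
    if step == "d":
        return d - timedelta(days=1)
    if step == "w":
        return d - timedelta(weeks=1)
    if step in ("m", "y"):
        months = 1 if step == "m" else 12
        total = d.year * 12 + (d.month - 1) - months
        year, month = divmod(total, 12)
        month += 1
        # Clamp the day so that e.g. 31 March maps to the last day of February.
        day = min(d.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)
    raise ValueError(f"unknown step: {step!r}")


def precursor_windows(reference_start, reference_end, step, max_iterations):
    """Return the reference window followed by max_iterations earlier windows."""
    windows = [(reference_start, reference_end)]
    start, end = reference_start, reference_end
    for _ in range(max_iterations):
        start, end = shift_back(start, step), shift_back(end, step)
        windows.append((start, end))
    return windows


# Values taken from the example config: window "m", max_iterations 2.
windows = precursor_windows(date(2025, 10, 23), date(2025, 11, 23), "m", 2)
for start, end in windows:
    print(start, end)
```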
All steps log structured JSON to stdout using Python's logging with a
custom JSON formatter.
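For readers unfamiliar with this pattern, a JSON formatter on top of the standard logging module can look like the sketch below. The class name `JsonFormatter`, the field names and the `data` attribute passed through `extra` are assumptions for illustration, not the package's actual formatter, and the logged values are made up.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Serialize each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured payload passed as extra={"data": {...}}
        # (a convention assumed by this sketch).
        data = getattr(record, "data", None)
        if data is not None:
            payload["data"] = data
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("eventpredecessor.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines via the root logger

# Illustrative values only.
logger.info("collected articles", extra={"data": {"window": "m", "count": 3}})
```

One JSON object per line keeps the output easy to filter with standard tools or to load into a dataframe for later analysis.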
from eventpredecessor import EventPredecessorPipeline
pipeline = EventPredecessorPipeline.from_yaml("config.yaml")
result = pipeline.run()
G = result["graph"]
stats = result["stats"]
print(stats)
You can then export or visualize the graph as you prefer. The package already offers JSON helpers and a yFiles-based plotter:
from eventpredecessor.graph_io import save_full_graph_json
from eventpredecessor.plot import EventGraphPlotter
save_full_graph_json(G, "events_full_graph.json")
plotter = EventGraphPlotter("events_full_graph.json")
w_sidebar = plotter.plot_with_sidebar()
w_map = plotter.plot_on_map(use_heat_mapping=True)
The tutorial.ipynb included in this repository shows a small end-to-end example.
- It transforms raw news streams into event graphs that can be explored as chains of causality, escalation or diffusion.
- It is LLM-agnostic: any provider compatible with ChatOpenAI can be used (OpenAI, OpenRouter, local gateways, etc.).
- All hyperparameters and semantic choices (like event categories) live in a YAML file, which makes it a nice playground for students and researchers interested in event mining, graph analysis, and precursor detection.