RAG-powered biomedical literature analysis with contradiction detection and evidence synthesis.
MedTrace retrieves, analyzes, and synthesizes peer-reviewed biomedical literature from PubMed, automatically detecting conflicting evidence and providing quality metrics for AI-generated responses.
Connects to the PubMed database via the NCBI E-utilities API to fetch peer-reviewed articles, chunks and embeds them into a FAISS vector store, then answers queries with automatically detected evidence contradictions and quality metrics.
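The chunking step of this pipeline can be sketched in a few lines. The sliding-window splitter below is a simplified, hypothetical stand-in for the recursive character splitter used in `core/chunker.py`; the chunk size and overlap values are illustrative, not the project's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows (simplified stand-in for
    LangChain's RecursiveCharacterTextSplitter used by MedTrace)."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # step back by `overlap` so adjacent chunks share context
        start = end - overlap
    return chunks
```

Each chunk would then be embedded and added to the FAISS index for retrieval.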
Eliminates the manual literature-review burden while providing critical analysis of conflicting study findings, something standard RAG systems ignore. Users get not just answers, but insight into the strength and consistency of the underlying evidence.
Unlike standard RAG systems that present retrieved text blindly, MedTrace:
- Detects contradictions across multiple studies and synthesizes conflicting evidence
- Scores response quality using LLM-as-judge evaluation (relevance, groundedness, confidence)
- Surfaces evidence distribution (Supporting/Opposing/Neutral) transparently
| Feature | Description |
|---|---|
| PubMed Integration | Direct API access to 39M+ peer-reviewed biomedical articles |
| Contradiction Detection | LLM-based stance classification identifies conflicting study conclusions |
| Quality Metrics | Real-time evaluation dashboard (Relevance, Groundedness, Confidence) |
| Evidence Synthesis | Automatic summarization of opposing viewpoints when contradictions are detected |
| Source Transparency | Every claim linked to its PubMed source with stance classification |
```
MedTrace/
├── artifacts/
│   ├── screenshots/              # UI screenshots and demo images
│   └── *.txt                     # Documentation and reference files
│
├── core/                         # Backend processing modules
│   ├── __init__.py               # Package exports
│   ├── config.py                 # Environment configuration and path resolution
│   ├── pubmed_fetcher.py         # NCBI E-utilities API integration for article retrieval
│   ├── chunker.py                # Recursive character text splitting with overlap
│   ├── embeddings.py             # SentenceTransformer wrapper with fallback mechanisms
│   ├── vector_store.py           # FAISS index creation and persistence
│   ├── query_engine.py           # Retrieval and LLM response generation
│   ├── contradiction_detector.py # Stance analysis and conflict detection
│   └── evaluation.py             # LLM-as-judge metrics (relevance, groundedness)
│
├── frontend/
│   └── app.py                    # Streamlit UI with real-time metrics dashboard
│
├── .env                          # Environment variables (API keys, paths) - not committed
├── .gitignore                    # Git exclusion rules
├── requirements.txt              # Python dependencies
└── README.MD                     # This file
```
⚠️ Prerequisites: Create a `.env` file in the root directory with:

```
GROQ_API_KEY=<your-groq-api-key>
VECTOR_DIR=<path-to-vector-store>
GROQ_MODEL=<groq-model-name>
EMBEDDING_MODEL=<embedding-model-name>
```
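As a rough sketch of how `core/config.py`-style settings resolution might look, the helper below reads the same four keys from the environment. The function name and default values are assumptions for illustration, not the project's actual API.

```python
import os

def load_settings() -> dict:
    """Resolve MedTrace settings from environment variables.
    Defaults shown here are placeholders, not the project's real defaults."""
    return {
        "groq_api_key": os.environ.get("GROQ_API_KEY", ""),
        "vector_dir": os.environ.get("VECTOR_DIR", "artifacts/vectorstore"),
        "groq_model": os.environ.get("GROQ_MODEL", ""),
        "embedding_model": os.environ.get("EMBEDDING_MODEL", ""),
    }
```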
```shell
git clone https://github.com/inv-fourier-transform/med-trace.git
cd MedTrace
python -m venv .venv
```

Activate the virtual environment.

Windows:

```shell
.venv\Scripts\activate
```

macOS/Linux:

```shell
source .venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Create a `.env` file in the project root and add your required API keys.

Launch the app:

```shell
streamlit run frontend/app.py
```

Workflow:
- Enter a biomedical topic (e.g., "ketogenic diet")
- Select the number of articles to ingest (10–299)
- Click "Ingest Articles" to fetch, chunk, and embed
- Ask questions and view contradiction analysis with quality metrics
```shell
python main.py
```

Standard RAG systems retrieve documents without analyzing agreement. MedTrace adds LLM-based stance analysis:
- Classifies stance: Each study labeled as Supporting, Opposing, or Neutral
- Detects conflicts: Identifies when evidence contradicts (e.g., 3 studies show benefits, 2 show no effect)
- Synthesizes conflicts: Generates balanced explanation of contradictions (study design differences, populations, protocols)
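Once each study has a stance label, conflict detection reduces to counting the labels. The helper below is an illustrative sketch, not the actual `contradiction_detector.py` API: it flags a contradiction whenever both Supporting and Opposing evidence are present.

```python
from collections import Counter

def summarize_stances(stances: list[str]) -> tuple[Counter, bool]:
    """Count Supporting/Opposing/Neutral labels and flag contradictions.
    A contradiction exists when at least one study supports the claim
    and at least one opposes it (hypothetical helper for illustration)."""
    counts = Counter(stances)
    contradiction = counts["Supporting"] > 0 and counts["Opposing"] > 0
    return counts, contradiction
```

In the example from the next section (3 Supporting, 2 Opposing, 1 Neutral), this helper would flag a contradiction and hand the conflicting chunks to the LLM for synthesis.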
Evidence Distribution: 3 Supporting | 2 Opposing | 1 Neutral

⚠️ Contradictory Evidence Detected
Evidence Synthesis:
While three RCTs demonstrate cardiovascular benefits of intermittent fasting,
two recent meta-analyses found no significant effect when controlling for
caloric deficit. Differences stem from intervention duration protocols.
The demo video (just 14 MB) can be downloaded and viewed.
Every response includes LLM-as-judge quality metrics:
| Metric | Description |
|---|---|
| Relevance | Retrieved articles match the query specificity |
| Groundedness | Answer is factually supported by sources |
| Confidence | Citation density and specificity |
| Overall | Aggregate quality score |
Indicators:
🟢 High (≥80%) | 🟡 Medium (60–79%) | 🔴 Low (<60%)
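A minimal sketch of how the metrics above could be aggregated and mapped to the traffic-light bands. The simple mean is an assumption; the actual weighting in `core/evaluation.py` is not specified here.

```python
def overall_score(relevance: float, groundedness: float, confidence: float) -> float:
    """Aggregate the three LLM-as-judge metrics into an overall score.
    A plain mean is used here for illustration; the real weighting may differ."""
    return (relevance + groundedness + confidence) / 3

def indicator(score: float) -> str:
    """Map a 0-100 score to the High/Medium/Low bands shown above."""
    if score >= 80:
        return "High"
    if score >= 60:
        return "Medium"
    return "Low"
```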
- Python 3.10+ – Core language
- LangChain – RAG orchestration
- FAISS – Vector similarity search
- Sentence-Transformers – BGE embeddings
- Groq – High-speed LLM inference
- Streamlit – Web interface
- PubMed E-utilities API – Literature source
- Multi-query Retrieval – Decompose complex questions into sub-queries
- Citation Network Analysis – Map relationships between studies
- Export Functionality – Generate PDF reports with citations
- NLM/NCBI for PubMed database and E-utilities API
- BAAI for BGE embedding models
- LangChain and Streamlit communities
Content sourced from PubMed/NLM. For informational purposes only – not for diagnosis or treatment decisions. Always consult healthcare providers for medical advice. Evidence metrics are algorithmic estimates and should not replace reading primary sources.
- Specific queries work best – Ask precise clinical questions
- Check evidence distribution – Review Supporting/Opposing badges
- Monitor metrics – Low groundedness indicates weak source support
MedTrace: Where AI meets evidence-based medicine.









