Monk-RET (Monk Retail Insights Engine) is an intelligent retail analytics platform powered by MonkDB, MonkDB's MCP, LangChain, Streamlit, and modern AI/ML pipelines.
It helps businesses gain actionable insights from large-scale retail data by orchestrating data ingestion, processing, and visualization seamlessly.
- 📊 Retail Analytics Engine – Ingests and processes large-scale retail datasets into MonkDB.
- 🧩 LangChain Orchestrator – Modular orchestration of tasks with LLMs
- ⚡ Batch Data Processing – Automated CSV ingestion & database syncing
- 📈 Interactive Dashboards – Streamlit-based UI for analytics & insights
- 🔄 Automation – Watchdog-powered auto-refresh for new datasets
```bash
git clone https://github.com/monkdbofficial/demo.retailagent.git
cd demo.retailagent
pip install -r requirements.txt
python3 watchdog_.py
```

This triggers the watcher and the downstream agent orchestration logic, which does the following in phases:
- Chunk the data into batches of 5,000 records and leverage Dask to process the records before publishing them to MonkDB tables (see the sketch after this list).
- Generate a Streamlit dashboard app with insights based on MonkDB SQL queries.
- Deploy the Streamlit dashboard to its destination, where it serves the charts and their metrics.
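A minimal sketch of the chunked ingestion phase. The `insert_batch` helper is a hypothetical stand-in for the repo's MonkDB Python SDK calls, and the cleanup step is illustrative:

```python
import dask.dataframe as dd

CHUNK_SIZE = 5_000  # records per batch, per the pipeline description

def insert_batch(batch) -> None:
    """Hypothetical helper: bulk-insert a pandas DataFrame into MonkDB via its SDK."""
    raise NotImplementedError

def ingest_csv(path: str) -> None:
    # Read lazily; Dask splits the file into partitions for parallel processing.
    ddf = dd.read_csv(path, blocksize="16MB")
    ddf = ddf.dropna()  # example cleanup step before publishing

    # Stream partition by partition, publishing fixed-size batches.
    for delayed_part in ddf.to_delayed():
        part = delayed_part.compute()
        for start in range(0, len(part), CHUNK_SIZE):
            insert_batch(part.iloc[start : start + CHUNK_SIZE])
```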
As highlighted in the data flow diagram, watchdog triggers agent execution.
- Upload agent – uploads the processed data to MonkDB.
- Generate Insights agent – generates insights by querying the database with MonkDB SQL via MCP, exposed in the agent's tool interface, and uses them to build the dashboard pack (a sketch follows this list).
- Deploy agent – deploys the pack to its destination.
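For illustration, the insights agent's tool interface could be wired up roughly as follows with LangChain and Mistral via Ollama. The tool name, the SELECT guard, and the `execute_monkdb_query` helper are assumptions, not the repo's actual MCP implementation:

```python
from langchain_core.tools import tool
from langchain_ollama import ChatOllama  # assumes the langchain-ollama package

def execute_monkdb_query(sql: str) -> str:
    """Placeholder for the MonkDB SDK / MCP call the real agent uses."""
    raise NotImplementedError

@tool
def run_select(sql: str) -> str:
    """Run a read-only SELECT against MonkDB and return the rows as text."""
    # Guard mirroring MonkDB's SELECT-only MCP interface.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    return execute_monkdb_query(sql)

# The Mistral model provisioned via `ollama run mistral`.
llm = ChatOllama(model="mistral")
agent_llm = llm.bind_tools([run_select])  # expose the SQL tool to the model

response = agent_llm.invoke(
    "Which brands in trent.products have the highest average discount?"
)
```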
- Languages: Python
- Database: MonkDB and its Python SDK
- Frameworks: MonkDB's MCP, LangChain
- Data: Dask
- DevOps: Watchdog
- Visualization: Plotly, Streamlit
- Install and provision MonkDB as per its documentation.
- Provision a MonkDB user via PSQL, as highlighted in MonkDB's documentation.
- Update the `config.ini` file located in the `config` folder of this repo. Ensure the IP address in the `DB_HOST` variable is updated; it denotes the instance where MonkDB is installed.
- Provision an LLM model. We are using Mistral via Ollama (`ollama run mistral`).
```ini
[database]
DB_HOST = xx.xx.xx.xxx
DB_PORT = 4200
DB_USER = testuser
DB_PASSWORD = testpassword
DB_SCHEMA = trent
TABLE_NAME = products
```
- Also, ensure `.env` in the root of this repo is updated with the correct IP address of MonkDB's host.
```env
MONKDB_HOST=xx.xx.xx.xxx
MONKDB_PORT=4200
MONKDB_USER=testuser
MONKDB_PASSWORD=testpassword
MONKDB_SCHEMA=trent
MONKDB_API_PORT=4200
# Optional OTEL configuration, which can be enabled or disabled.
MONKDB_OTEL_ENABLED=false
```
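For reference, a minimal sketch of how these settings could be read at runtime, assuming the `python-dotenv` package (the repo's actual loading code may differ):

```python
import configparser
import os

from dotenv import load_dotenv  # assumes the python-dotenv package

# Read the [database] section from config/config.ini.
config = configparser.ConfigParser()
config.read("config/config.ini")
db = config["database"]
print(db["DB_HOST"], db.getint("DB_PORT"), db["DB_SCHEMA"])

# Load the root .env so the MONKDB_* variables reach the process environment.
load_dotenv()
print(os.environ["MONKDB_HOST"], os.environ["MONKDB_PORT"])
```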
- As highlighted before, please create a virtual environment and activate it before installing the requirements with pip.
- Drop a new retail CSV into the `/csv_folder` folder.
- `watchdog_.py` detects it and inserts the data into the DB (see the watcher sketch below).
- `langchain_orch.py` & `gen_insights_force.py` generate AI-powered insights.
- Open `streamlit_app.py` → interactive analytics dashboard.
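The detection step follows the standard `watchdog` observer pattern. A minimal sketch of what `watchdog_.py` could look like, with the handler and `run_pipeline` hook as illustrative stand-ins for the repo's orchestration:

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

def run_pipeline(path: str) -> None:
    """Hypothetical hook into the ingestion + insights orchestration."""
    print(f"New dataset detected: {path}")

class CsvHandler(FileSystemEventHandler):
    def on_created(self, event):
        # React only to new CSV files dropped into the watched folder.
        if not event.is_directory and event.src_path.endswith(".csv"):
            run_pipeline(event.src_path)

observer = Observer()
observer.schedule(CsvHandler(), path="csv_folder", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```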
Run the below command to execute performance testing:

```bash
python3 monkdb_pipeline_testrunner.py --csv datasets/_sample_products.csv --table trent.products --where "1=1" --parity-sample 200 --perf-repeats 20 --out-json reports/report.json --out-md reports/report.md
```

| Argument | Purpose | Example Value |
|---|---|---|
| --csv | Path to the source CSV file used as the “gold” reference for accuracy checks. The script computes KPIs (row counts, averages, discount bands, etc.) on this file and compares them to MonkDB query results. | datasets/_sample_products.csv |
| --table | Fully-qualified database table name to query inside MonkDB. The test runner runs SELECTs on this table and compares them to the CSV-derived KPIs. | trent.products |
| --where | Optional SQL WHERE clause applied to every database query. Lets you scope tests to a subset of rows (e.g., a specific brand or time range). "1=1" is the neutral default (no filtering). | "1=1" |
| --parity-sample | Number of rows to sample for row-parity testing. The script randomly picks up to this many primary-key combinations from the CSV and checks if they exist in the DB. Skipped if no primary key. | 200 |
| --perf-repeats | Number of times to repeat each key query (KPIs, discount bands, brand share) for latency measurement. The script records P50, P95, and P99 response times across these runs. | 20 |
| --out-json | Output path for the machine-readable report (JSON). Contains accuracy metrics, latency percentiles, and row-parity results. | reports/report.json |
| --out-md | Output path for the human-readable Markdown report. Summarises key accuracy and performance findings for easy sharing or inclusion in client documentation. | reports/report.md |
This executes the pipeline test-runner script and generates reports in the `reports/` folder.
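For context, the latency percentiles in the report correspond to the standard computation over repeated query runs; a sketch of that computation (not the runner's exact code):

```python
import statistics
import time

def measure_latency(run_query, repeats: int = 20) -> dict:
    """Time one query `repeats` times and return latency percentiles in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # e.g. one KPI, discount-band, or brand-share SELECT
        samples.append(time.perf_counter() - start)
    # quantiles(n=100) yields the 1st..99th percentile cut points.
    q = statistics.quantiles(samples, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}
```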
We executed the performance tests on the below DigitalOcean instance:
- OS: Ubuntu 25.04 x64
- vCPUs: 4
- RAM / SSD: 8 GB / 240 GB
- Family: General compute
Due to cost considerations, testing was conducted on a modest DigitalOcean droplet (4 vCPU / 8 GB RAM / 240 GB SSD). Consequently, the KPI, discount-band, and brand-share queries measured around 0.8–1.1 s P95.
In production we recommend AWS m6in (or equivalent) instances. These are powered by 3rd-Gen Intel Xeon Scalable (“Ice Lake”) CPUs up to 3.5 GHz, with 200 Gbps networking and 80–100 Gbps EBS throughput, and scale to 128 vCPUs / 512 GiB RAM. Our enterprise MonkDB customers running similar analytics consistently achieve sub-300 ms P95 latencies on such hardware.
This means the latencies observed on DigitalOcean should be viewed as conservative; significantly lower numbers are expected on production-grade instances.
- Public Data Source: The dataset (Myntra Product Listings) is released under a CC0 Public Domain license and contains only publicly available product catalogue information—no personal or sensitive consumer data.
- Secure Credentials: Database and API credentials are stored in `.env` and `config.ini`, never hard-coded or exposed in logs or the repository. This can be secured even further in production by bringing in tools such as HashiCorp Vault or similar alternatives.
- Read-only Analytics: All analytics operations use MonkDB's MCP SELECT-only interface; no user-identifiable or private data is written or modified.
- Logging Hygiene: Application logs exclude secrets and comply with data minimisation best practices.
You may swap:
- LangChain with another agentic framework.
- Streamlit with another frontend framework.
- The Mistral model with another LLM (an LLM is a prerequisite for the agentic framework).
This repo is licensed under the permissive Apache 2.0 license.

