Commit 1bad02a

feat: Add Analytics Engine MCP example (#82)
* feat: Add complete Analytics MCP example with Grafana integration

  Complete Analytics MCP server implementation with real GitHub data collection.

  🎯 Core Features:
  - Model Context Protocol (MCP) server for analytics data
  - Real GitHub API data collection with dynamic date calculation
  - Cloudflare Analytics Engine integration for time-series storage
  - Grafana dashboard integration with HTTP endpoints and CORS
  - Batch processing support (10-record Analytics Engine limit)

  🔧 Key Components:
  - generate-batch-data.js: real GitHub data collection script
  - src/: complete MCP server implementation (7 TypeScript files)
  - test/: integration tests with Analytics Engine compatibility
  - Grafana endpoints: /grafana/query and /grafana/health

  📊 Real Data Support:
  - GitHub API integration for actual repository statistics
  - Dynamic 30-day historical data generation
  - Multi-repository dashboard support
  - Production-ready error handling and fallbacks

  📚 Documentation:
  - Comprehensive setup guide with MCP Inspector testing
  - Step-by-step Grafana dashboard configuration
  - Analytics Engine SQL compatibility guide
  - Troubleshooting section with real examples

  ✅ All tests passing (14/14) with Analytics Engine compatibility

  Files included (14 essential files):
  - README.md, package.json, wrangler.jsonc, tsconfig.json
  - generate-batch-data.js (real GitHub data script)
  - src/*.ts (7 files: complete MCP server implementation)
  - test/*.ts + test/tsconfig.json (tests + TypeScript config)
  - pnpm-lock.yaml (dependencies)

* feat(analytics-mcp): Add vitest config for Cloudflare Workers testing

  Added vitest.config.ts to enable the proper Cloudflare Workers test environment:
  - Enables cloudflare:test imports for test utilities
  - Configures the Workers runtime for integration tests
  - All 14 tests now passing ✅

  The complete Analytics MCP system now includes 15 essential files:
  - 7 src/*.ts files (MCP server implementation)
  - 2 test/* files (tests + configs)
  - 6 config/doc files (deployment, compilation, documentation)

* fix(analytics-mcp): Update pnpm lockfile to match package.json dependencies

  Fixed pnpm lockfile sync issue:
  - Updated pnpm-lock.yaml to match the analytics-mcp package.json
  - Resolved dependency specification mismatches
  - All workspace dependencies now properly resolved

  Dependencies now correctly locked:
  - @modelcontextprotocol/sdk (catalog)
  - @nullshot/mcp (workspace)
  - hono ^4.7.6 (for CORS support)
  - All test dependencies properly resolved

* feat: enhance analyze_trends tool with column parameter and fix API token consistency

  - Add optional 'column' parameter to analyze_trends tool (defaults to double1)
  - Users can now specify which column (double1, double2, double3, etc.) to analyze
  - Fix a logic bug where specifying a column caused an 'Insufficient data' error
  - Standardize on CLOUDFLARE_API_TOKEN throughout the codebase (repository.ts, schema.ts)
  - Update README.md to use CLOUDFLARE_API_TOKEN consistently
  - Improve UX with better error handling and user control

  Tested: the column parameter works correctly for both auto-detected and user-specified columns.

* fix: restore repository.ts with CLOUDFLARE_API_TOKEN and column parameter support

* docs: update analyze_trends documentation and remove unused tools from README

  - Update analyze_trends to reflect single-metric analysis (not an array)
  - Add column parameter documentation
  - Remove the algorithm parameter (unused)
  - Remove detect_anomalies and track_agent_metrics from the tool lists
  - Clarify that the column parameter auto-detects the best column if not specified

* refactor: remove unused algorithm parameter and clean up analyze_trends

  - Remove the unused algorithm parameter from the analyze_trends tool
  - Update README documentation to reflect single-metric analysis
  - Remove detect_anomalies and track_agent_metrics from the README
  - Keep the tools in code for backward compatibility but remove them from documentation

  Working features:
  - analyze_trends with column parameter ✅
  - Grafana dashboard with updated API token ✅
  - MCP Inspector testing ✅

* fix: remove unused tools and fix test failures

  - Remove the track_agent_metrics and detect_anomalies tools completely
  - Remove the corresponding test methods
  - Remove unused schemas from schema.ts
  - Fix syntax errors in tools.ts from incomplete removals
  - All 12 tests now pass successfully

* fix: fix monitor_system_health tool implementation

  - Remove incorrect schema validation that expected systemId and metrics
  - Fix SQL syntax for Analytics Engine compatibility
  - Simplify to a basic health check using the github_stats dataset
  - Handle errors gracefully and return an appropriate status

* remove: remove monitor_system_health tool

  - Remove the monitor_system_health tool from tools.ts
  - Remove the monitorSystemHealth method from repository.ts
  - Remove MonitorSystemHealthSchema from schema.ts
  - Remove the corresponding test from the test suite
  - Tests now pass with 11 tests instead of 12

  The tool was not providing meaningful value, since it was just a basic query check.

* fix: make get_metrics_summary respect the dimensions parameter

  - Remove the hardcoded 'daily_pr_stats' filter from the SQL query
  - Add a dynamic WHERE clause that uses the dimensions parameter when provided
  - The dimensions parameter now actually filters the data as expected
  - If no dimensions are provided, all data in the time range is returned
  - If dimensions are provided, the query filters by blob2 IN (dimensions)

  This fixes the issue where different dimensions returned the same results.
* feat: standardize get_time_series to use the dimensions parameter

  - Change get_time_series from a 'filters' parameter to a 'dimensions' parameter for consistency with get_metrics_summary
  - Update the tool definition, schema, repository method, and tests
  - Both tools now use the same 'dimensions: string[]' pattern
  - Simplifies the API: dimensions: ['claude_rich_data'] vs filters: {event_type: 'claude_rich_data'}
  - Maintains the same functionality with a cleaner, consistent interface

  BREAKING CHANGE: get_time_series now uses the 'dimensions' parameter instead of 'filters'

* docs: update README for the standardized get_time_series API

  - Update the get_time_series documentation with the new dimensions parameter
  - Add practical examples with claude_rich_data and github_stats
  - Clarify that get_metrics_summary and get_time_series no longer need code changes
  - Update the section title to reflect the current flexibility
  - Note that analyze_trends may still need adaptation

  The tools are now much more user-friendly with the consistent dimensions parameter.

* docs: fix query_analytics SQL examples in README

  - Remove the invalid 'ORDER BY timestamp' (Analytics Engine doesn't expose a timestamp column)
  - Change to 'ORDER BY blob3 DESC', which uses the date field
  - Remove the hardcoded WHERE filter to make the examples more generic
  - Add an 'as Date' alias for clarity
  - The examples now work without errors

  Fixes the Analytics Engine API error: unable to find type of column: timestamp

* feat: comprehensive Analytics MCP improvements

  ## Major Enhancements

  ### 📚 README Documentation
  - Standardize all examples to use the 'github_stats' dataset (the only bound dataset)
  - Add a comprehensive dataset binding configuration guide
  - Enhance the get_recent_data documentation with parameters and use cases
  - Fix the analyze_trends documentation (remove the incorrect 'hardcoded filters' claim)
  - Remove outdated monitor_system_health references
  - Fix SQL examples for Analytics Engine compatibility

  ### 🔧 Data Generation Script
  - Remove mock data fallbacks for transparency (fail clearly if the GitHub API is unavailable)
  - Restore realistic historical progression toward the current real GitHub values
  - Add accurate data source labeling: 'github_api_with_simulated_progression'
  - Fix the organization name back to 'anthropics' (the correct GitHub org)
  - Enhance error handling with clear failure messages

  ### ⚙️ Repository & Tools
  - Enhance list_datasets to show logical datasets grouped by event types
  - Fix Analytics Engine SQL compatibility (remove unsupported MIN/MAX on strings)
  - Improve dataset discovery (shows github_stats:claude_rich_data with record counts)

  ## Result
  - All 8 tools are now fully flexible with no hardcoded filters
  - All 11 tests passing
  - Documentation is accurate and user-friendly
  - Data generation is transparent, with a real GitHub API foundation

* docs: streamline README - remove redundant Step 6 sections, update MCP Inspector setup, standardize dataset names, correct Analytics Engine limits

* docs: dramatically simplify README - remove 8 major verbose sections

  - Remove Power of Built-in Time Series Tools
  - Remove What You'll Build
  - Remove Troubleshooting
  - Remove Verification Complete
  - Remove Next Steps
  - Remove Working Example Dashboard
  - Remove Demonstrated Features
  - Remove Technical Architecture

  Keep only the essential setup and usage sections for cleaner, focused docs.

* docs: remove redundant steps and align numbering

  - Remove Step 7: Configure Analytics Engine Access
  - Remove Step 8: Verify Setup
  - Renumber Step 10 → Step 7: Create Dashboard Panels
  - Renumber Step 11 → Step 8: View Your Analytics Dashboard

  Steps now flow cleanly: 1→2→3→4→5→6→7→8

* docs: add comprehensive Complete Tool Examples Reference section

  - Add complete working examples for all 8 MCP tools:
    * track_metric - single data point tracking
    * track_batch_metrics - bulk data ingestion
    * query_analytics - custom SQL queries
    * get_metrics_summary - aggregated statistics
    * get_time_series - time series data for visualization
    * analyze_trends - trend detection and pattern analysis
    * list_datasets - dataset discovery and metadata
    * get_recent_data - recent records inspection
  - Include request/response examples for each tool
  - Add a Quick Reference table for a tool overview
  - Organize tools by category (Data Writing, Query & Analysis, Utility)
  - Use realistic GitHub PR analytics examples throughout
  - Provide clear descriptions and use cases for each tool

* docs: break down track_metric example into separate fields for easy copy-paste

  - Split the track_metric JSON into 3 separate fields: dataset, dimensions, metrics
  - Add individual JSON snippets for each field so users can copy just what they need
  - Keep the complete request example for reference
  - Makes it easier for users to test individual components

* feat: restore column field to analyze_trends tool

  - Add the optional column parameter back to the analyze_trends tool definition
  - Update AnalyzeTrendsSchema to include column field validation
  - Pass the column parameter to the repository.analyzeTrends method
  - Break down the analyze_trends README example into individual fields
  - The column parameter lets users specify which Analytics Engine column to analyze (double1, double2, etc.)
  - Defaults to double1 if not specified (auto-detection)
  - Fixes an inconsistency where the repository supported column but the tool definition didn't expose it

* feat: improve security and usability with wrangler secrets setup

  Security improvements:
  - Remove the hardcoded CLOUDFLARE_ACCOUNT_ID from wrangler.jsonc
  - Protect personal account information in the public repository

  Production deployment:
  - Add wrangler secret put instructions for both the Account ID and the API Token
  - Provide clear step-by-step setup before deployment
  - Use a consistent secrets pattern for all credentials

  Documentation improvements:
  - Update the tool count from 11 to 8 tools (accurate count)
  - Add a real list_datasets response with 5 key datasets
  - Clarify localhost (.env) vs production (secrets) authentication
  - Break down tool examples into copy-paste friendly individual fields
  - Fix the query_analytics SQL example (remove the problematic blob3 alias)
  - Add the User Details:Read permission to the API token requirements

  Tool functionality:
  - Break down track_metric, track_batch_metrics, get_metrics_summary, and get_time_series into individual fields
  - Update the architecture note to reflect wrangler secrets usage

* minor edits to clean up a bit

---------

Co-authored-by: Allen Wyma <>
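For a quick feel of the final tool surface described above, here is a hedged sketch of the standardized parameters. The payload field names are inferred from the commit messages rather than copied from the README, and the buildWhereClause helper is illustrative, not the repository's actual code:

// Illustrative only: argument shapes inferred from the commit messages above.

// analyze_trends: single-metric analysis; the optional 'column' selects the
// Analytics Engine column (double1, double2, ...) and defaults to double1.
const analyzeTrendsArgs = {
  dataset: "github_stats",
  column: "double1",
};

// get_metrics_summary and get_time_series: both take the same
// 'dimensions: string[]' parameter instead of a filters object.
const getTimeSeriesArgs = {
  dataset: "github_stats",
  dimensions: ["claude_rich_data"],
};

// query_analytics: Analytics Engine SQL per the README fixes, ordering by the
// blob3 date field since a 'timestamp' column is not exposed.
const queryAnalyticsArgs = {
  sql: "SELECT blob3, double1 FROM github_stats ORDER BY blob3 DESC",
};

// Hypothetical helper mirroring the get_metrics_summary fix: no dimensions
// means no filter; otherwise filter by blob2 IN (dimensions).
function buildWhereClause(dimensions?: string[]): string {
  if (!dimensions || dimensions.length === 0) return "";
  const quoted = dimensions.map((d) => `'${d.replace(/'/g, "''")}'`);
  return `WHERE blob2 IN (${quoted.join(", ")})`;
}
// buildWhereClause(["claude_rich_data"]) -> "WHERE blob2 IN ('claude_rich_data')"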
1 parent 0f9dcc3 commit 1bad02a

File tree

17 files changed: +3187 −6 lines changed

examples/analytics-mcp/README.md

Lines changed: 766 additions & 0 deletions
Large diffs are not rendered by default.
examples/analytics-mcp/generate-batch-data.js

Lines changed: 222 additions & 0 deletions
#!/usr/bin/env node

/**
 * Generate batch test data for Analytics MCP using REAL GitHub API data
 * Collects actual repository statistics and creates 30 days of data split into batches
 */

import fs from "fs";

// Configuration
const CONFIG = {
  batchSize: 10,
  daysBack: 30,
};

function getStartDate(daysBack = 30) {
  const startDate = new Date();
  startDate.setDate(startDate.getDate() - daysBack);
  return startDate;
}
// GitHub API functions
async function fetchGitHubRepoStats(owner, repo) {
  const url = `https://api.github.com/repos/${owner}/${repo}`;

  console.log(`🔍 Fetching real data for ${owner}/${repo}...`);

  try {
    const response = await fetch(url, {
      headers: {
        Accept: "application/vnd.github.v3+json",
        "User-Agent": "Analytics-MCP-Batch-Generator/1.0",
        // Optional: add a GitHub token for higher rate limits
        ...(process.env.GITHUB_TOKEN && {
          Authorization: `token ${process.env.GITHUB_TOKEN}`,
        }),
      },
    });

    if (!response.ok) {
      throw new Error(
        `GitHub API error for ${owner}/${repo}: ${response.status} ${response.statusText}`
      );
    }

    const data = await response.json();
    console.log(
      `✅ Real data: ${data.stargazers_count} stars, ${data.forks_count} forks, ${data.open_issues_count} issues`
    );

    return {
      stars: data.stargazers_count,
      forks: data.forks_count,
      watchers: data.watchers_count,
      open_issues: data.open_issues_count,
    };
  } catch (error) {
    throw new Error(
      `Failed to fetch GitHub data for ${owner}/${repo}: ${error.message}`
    );
  }
}

async function generateDataPoints(owner, repo, days = 30) {
  // Fetch real GitHub data first
  const realStats = await fetchGitHubRepoStats(owner, repo);
  const repoName = `${owner}/${repo}`;

  const dataPoints = [];
  const startDate = getStartDate(days);

  for (let i = 0; i < days; i++) {
    const currentDate = new Date(startDate);
    currentDate.setDate(startDate.getDate() + i);
    const dateStr = currentDate.toISOString().split("T")[0];

    // Generate a realistic historical progression leading to the current real values.
    // Note: the GitHub API only provides current snapshots, so daily progression is simulated.
    const avgDailyStarGrowth = Math.max(1, Math.floor(realStats.stars / 365)); // Rough daily growth
    const avgDailyForkGrowth = Math.max(0, Math.floor(realStats.forks / 365));
    const avgDailyWatcherGrowth = Math.max(
      0,
      Math.floor(realStats.watchers / 365)
    );

    // Calculate historical totals by working backwards from current real values
    const starsTotal = Math.max(
      1,
      realStats.stars - (days - 1 - i) * avgDailyStarGrowth
    );
    const forksTotal = Math.max(
      1,
      realStats.forks - (days - 1 - i) * avgDailyForkGrowth
    );
    const watchersTotal = Math.max(
      1,
      realStats.watchers - (days - 1 - i) * avgDailyWatcherGrowth
    );

    // Simulate daily activity (PRs, issues) based on repo size
    const repoActivityLevel = Math.min(
      10,
      Math.floor(realStats.stars / 5000) + 1
    );
    const prsCreated = Math.floor(Math.random() * repoActivityLevel) + 1;
    const prsMerged = Math.floor(prsCreated * (0.6 + Math.random() * 0.3)); // 60-90% merge rate
    const prsClosed = prsCreated - prsMerged;
    const issuesOpened = Math.floor(Math.random() * repoActivityLevel) + 1;
    const issuesClosed = Math.floor(issuesOpened * (0.7 + Math.random() * 0.2)); // 70-90% close rate

    dataPoints.push({
      dimensions: {
        repo: repoName,
        event_type: "github_real_30days",
        date: dateStr,
        batch_id: `${owner}_${repo}_batch_${Date.now()}`,
        data_source: "github_api_with_simulated_progression",
      },
      metrics: {
        stars_total: starsTotal,
        forks_total: forksTotal,
        watchers_total: watchersTotal,
        prs_created: prsCreated,
        prs_merged: prsMerged,
        prs_closed: prsClosed,
        issues_opened: issuesOpened,
        issues_closed: issuesClosed,
        current_open_issues: realStats.open_issues,
      },
      timestamp: currentDate.getTime(),
    });
  }

  return dataPoints;
}

function splitIntoBatches(dataPoints, batchSize) {
  const batches = [];
  for (let i = 0; i < dataPoints.length; i += batchSize) {
    batches.push(dataPoints.slice(i, i + batchSize));
  }
  return batches;
}

function saveDataFiles(owner, repo, batches) {
  const repoName = `${owner}/${repo}`;
  console.log(`\n📊 Generating data for ${repoName} (real GitHub data):`);

  const prefix = `${owner}_${repo.replace("-", "_")}`;

  batches.forEach((batch, index) => {
    const filename = `${prefix}_batch_${index + 1}.json`;
    fs.writeFileSync(filename, JSON.stringify(batch, null, 2));
    console.log(`✅ ${filename}: ${batch.length} records`);
  });

  return { batchCount: batches.length, prefix };
}

async function main() {
  const args = process.argv.slice(2);
  const [owner, repo] = args;

  console.log("🚀 Analytics MCP Batch Data Generator (Real GitHub Data)");
  console.log("=========================================================");

  if (process.env.GITHUB_TOKEN) {
    console.log("🔑 Using GitHub token for higher rate limits");
  } else {
    console.log(
      "⚠️ No GITHUB_TOKEN found - using anonymous requests (lower rate limits)"
    );
  }

  // Validate arguments
  if (!owner || !repo) {
    console.error("❌ Usage: node generate-batch-data.js <owner> <repo>");
    console.log("\n📋 Examples:");
    console.log("  node generate-batch-data.js anthropics claude-code");
    console.log(
      "  node generate-batch-data.js null-shot typescript-agent-framework"
    );
    console.log("  node generate-batch-data.js facebook react");
    console.log("  node generate-batch-data.js microsoft vscode");
    process.exit(1);
  }

  try {
    const dataPoints = await generateDataPoints(owner, repo, CONFIG.daysBack);
    const batches = splitIntoBatches(dataPoints, CONFIG.batchSize);

    const result = saveDataFiles(owner, repo, batches);

    console.log(
      `\n🎯 Generated ${dataPoints.length} data points in ${result.batchCount} batches`
    );
    console.log(
      `📁 Files: ${result.prefix}_batch_1.json to ${result.prefix}_batch_${result.batchCount}.json`
    );

    console.log("\n📋 Next steps:");
    console.log("1. Open each batch file and copy the JSON array");
    console.log("2. Use track_batch_metrics in MCP Inspector (production SSE)");
    console.log("3. Process all batches (10 records each)");
    console.log("4. Verify with query_analytics tool");

    console.log(
      "\n🎉 Success! Generated batch files with real GitHub API data"
    );
  } catch (error) {
    console.error(`❌ Error generating data: ${error.message}`);
    process.exit(1);
  }
}

// Run main function if this file is executed directly
if (import.meta.url === `file://${process.argv[1]}`) {
  main();
}

export { generateDataPoints, splitIntoBatches };
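The batch files written by this script are meant to be fed to the track_batch_metrics tool ten records at a time, matching the Analytics Engine write limit noted in the commit message. A minimal companion sketch; the dataPoints argument name for track_batch_metrics is an assumption here, and the filename follows the prefix scheme in saveDataFiles:

// load-batch.mjs: hypothetical companion snippet, not part of this commit
import fs from "fs";

// Each batch file holds at most CONFIG.batchSize (10) records.
const batch = JSON.parse(
  fs.readFileSync("anthropics_claude_code_batch_1.json", "utf8")
);

// Assumed argument shape for the track_batch_metrics MCP tool:
const toolArgs = {
  dataset: "github_stats",
  dataPoints: batch, // [{ dimensions, metrics, timestamp }, ...]
};

console.log(JSON.stringify(toolArgs, null, 2)); // paste into MCP Inspector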
examples/analytics-mcp/package.json

Lines changed: 43 additions & 0 deletions
{
  "name": "analytics-mcp",
  "version": "0.1.0",
  "private": true,
  "type": "module",
  "description": "Analytics MCP example demonstrating NullShot Analytics Engine integration",
  "main": "src/index.ts",
  "scripts": {
    "build": "wrangler build",
    "deploy": "wrangler deploy",
    "dev": "concurrently \"npx @modelcontextprotocol/inspector\" \"wrangler dev --port 8787\" --kill-others",
    "dev:vite": "vite dev",
    "start": "wrangler dev",
    "test": "vitest run",
    "cf-typegen": "wrangler types"
  },
  "keywords": [
    "mcp",
    "analytics",
    "nullshot",
    "cloudflare-workers",
    "time-series"
  ],
  "author": "TypeScript Agent Framework",
  "license": "MIT",
  "dependencies": {
    "@modelcontextprotocol/sdk": "catalog:",
    "@nullshot/mcp": "workspace:*",
    "hono": "^4.7.6",
    "zod": "catalog:"
  },
  "devDependencies": {
    "@cloudflare/vitest-pool-workers": "catalog:",
    "@types/node": "catalog:",
    "@nullshot/test-utils": "workspace:*",
    "@types/chai": "^5.2.2",
    "chai": "^6.0.1",
    "concurrently": "^9.1.2",
    "typescript": "catalog:",
    "vitest": "catalog:",
    "wrangler": "catalog:"
  }
}
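The worker in src/index.ts below expects three bindings: a Durable Object namespace, an Analytics Engine dataset, and a D1 database. The committed wrangler.jsonc is not shown in this diff, so here is a minimal sketch of the bindings it would need, with placeholder IDs:

// wrangler.jsonc (illustrative excerpt, not the committed file)
{
  "durable_objects": {
    "bindings": [
      { "name": "ANALYTICS_MCP_SERVER", "class_name": "AnalyticsMcpServer" }
    ]
  },
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "github_stats" }
  ],
  "d1_databases": [
    {
      "binding": "DB",
      "database_name": "analytics-mcp",
      "database_id": "<your-d1-database-id>"
    }
  ]
}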
examples/analytics-mcp/src/index.ts

Lines changed: 77 additions & 0 deletions
import { AnalyticsMcpServer } from './server';

// Export the AnalyticsMcpServer class for Durable Object binding
export { AnalyticsMcpServer };

// Worker entrypoint for handling incoming requests
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    try {
      const url = new URL(request.url);
      const sessionIdStr = url.searchParams.get('sessionId');

      // CORS headers for browser compatibility
      const corsHeaders = {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS',
        'Access-Control-Allow-Headers': 'Content-Type, Authorization',
        'Access-Control-Max-Age': '86400'
      };

      // Answer preflight requests directly, without a Durable Object round trip
      if (request.method === 'OPTIONS') {
        return new Response(null, {
          status: 204,
          headers: corsHeaders
        });
      }

      // Generate or use the existing session ID
      const id = sessionIdStr
        ? env.ANALYTICS_MCP_SERVER.idFromString(sessionIdStr)
        : env.ANALYTICS_MCP_SERVER.newUniqueId();

      console.log(`Analytics MCP: Processing request for session ${id.toString()}`);

      // Add the session ID to the URL for the Durable Object
      url.searchParams.set('sessionId', id.toString());

      // Forward the request to the Durable Object
      const durableObject = env.ANALYTICS_MCP_SERVER.get(id);
      const response = await durableObject.fetch(new Request(url.toString(), request));

      // Add CORS headers to the response
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers: {
          ...Object.fromEntries(response.headers.entries()),
          ...corsHeaders
        }
      });
    } catch (error) {
      console.error('Worker request handling error:', error);

      return Response.json({
        error: 'Internal server error',
        message: error instanceof Error ? error.message : 'Unknown error occurred',
        timestamp: Date.now()
      }, {
        status: 500,
        headers: {
          'Access-Control-Allow-Origin': '*',
          'Content-Type': 'application/json'
        }
      });
    }
  }
};

// Environment interface for TypeScript
interface Env {
  ANALYTICS_MCP_SERVER: DurableObjectNamespace<AnalyticsMcpServer>;
  ANALYTICS: AnalyticsEngineDataset;
  DB: D1Database;
}
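Because the worker answers CORS preflights itself, a caller can verify the headers without reaching a Durable Object. A small usage sketch against a local wrangler dev instance (the port matches the dev script in package.json; the request path is arbitrary for illustration):

// Preflight is short-circuited by the worker with the CORS headers above.
const preflight = await fetch("http://localhost:8787/mcp", { method: "OPTIONS" });
console.log(preflight.status); // 204
console.log(preflight.headers.get("Access-Control-Allow-Origin")); // "*"

// Regular requests mint a fresh session ID unless one is supplied, so clients
// that need a stable Durable Object instance pass ?sessionId= explicitly.
const res = await fetch("http://localhost:8787/mcp?sessionId=<existing-id>"); // placeholder ID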
examples/analytics-mcp/src/prompts.ts

Lines changed: 39 additions & 0 deletions
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

export function setupServerPrompts(server: McpServer) {
  server.prompt(
    'analytics_introduction',
    'Introduction to analytics capabilities',
    () => ({
      messages: [
        {
          role: "assistant",
          content: {
            type: "text",
            text: "Welcome to Analytics MCP! Use tools like track_metric, query_analytics, and list_datasets to get started."
          }
        }
      ]
    })
  );

  server.prompt(
    'query_builder',
    'SQL query builder helper',
    {
      dataset: z.string().describe('Dataset name to query')
    },
    async (args) => ({
      messages: [
        {
          role: "assistant",
          content: {
            type: "text",
            // Analytics Engine doesn't expose a timestamp column, so the example
            // orders by the blob3 date field, matching the README SQL fixes above.
            text: `Query examples for ${args.dataset}: SELECT blob3, double1 FROM ${args.dataset} ORDER BY blob3 DESC`
          }
        }
      ]
    })
  );
}
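For reference, a hedged sketch of fetching the query_builder prompt with the MCP TypeScript SDK client. The SSE endpoint URL is an assumption based on the local dev setup, and error handling is omitted:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const client = new Client({ name: "analytics-prompt-demo", version: "0.1.0" });
await client.connect(
  new SSEClientTransport(new URL("http://localhost:8787/sse"))
);

// Prompt arguments are strings, matching the zod schema registered above.
const prompt = await client.getPrompt({
  name: "query_builder",
  arguments: { dataset: "github_stats" },
});
console.log(prompt.messages[0].content);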
