[MLOB-4258] add support for OpenAI server-side MCP calls #15057
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 237 ± 2 ms.
The average import time from base is: 239 ± 2 ms.
The import time difference between this PR and base is: -2.02 ± 0.09 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate nicole-cybul/openai-mcp-support (56e58c2) with baseline main (620007d)

❌ Test Failures (1 suite)

❌ djangosimple - 27/30
- ✅ appsec; Time: ✅ 20.488ms (SLO: <22.300ms -8.1%) vs baseline: ~same; Memory: ✅ 66.395MB (SLO: <67.000MB 🟡 -0.9%) vs baseline: +6.7%
- ✅ exception-replay-enabled; Time: ✅ 1.345ms (SLO: <1.450ms -7.2%) vs baseline: -0.6%; Memory: ✅ 64.582MB (SLO: <67.000MB -3.6%) vs baseline: +4.8%
- ✅ iast; Time: ✅ 20.461ms (SLO: <22.250ms -8.0%) vs baseline: +0.3%; Memory: ✅ 66.388MB (SLO: <67.000MB 🟡 -0.9%) vs baseline: +6.7%
- ✅ profiler; Time: ✅ 15.485ms (SLO: <16.550ms -6.4%) vs baseline: +1.6%; Memory: ✅ 54.008MB (SLO: <54.500MB 🟡 -0.9%) vs baseline: +5.4%
- ✅ resource-renaming; Time: ✅ 20.499ms (SLO: <21.750ms -5.8%) vs baseline: -0.2%; Memory: ✅ 66.427MB (SLO: <67.000MB 🟡 -0.9%) vs baseline: +6.8%
- ✅ span-code-origin; Time: ✅ 25.392ms (SLO: <28.200ms -10.0%) vs baseline: +0.3%; Memory: ✅ 68.420MB (SLO: <69.500MB 🟡 -1.6%) vs baseline: +6.2%
- ✅ tracer; Time: ✅ 20.454ms (SLO: <21.750ms -6.0%) vs baseline: ~same; Memory: ✅ 66.338MB (SLO: <67.000MB 🟡 -1.0%) vs baseline: +6.6%
- ❌ tracer-and-profiler; Time: ✅ 22.733ms (SLO: <23.500ms -3.3%) vs baseline: +2.9%; Memory: ❌ 67.868MB (SLO: <67.500MB +0.5%) vs baseline: +6.7%
- ❌ tracer-dont-create-db-spans; Time: ✅ 19.291ms (SLO: <21.500ms 📉 -10.3%) vs baseline: -0.5%; Memory: ❌ 66.452MB (SLO: <66.000MB +0.7%) vs baseline: +6.8%
- ❌ tracer-minimal; Time: ✅ 16.598ms (SLO: <17.500ms -5.2%) vs baseline: +0.4%; Memory: ❌ 66.422MB (SLO: <66.000MB +0.6%) vs baseline: +6.8%
- ✅ tracer-native; Time: ✅ 20.508ms (SLO: <21.750ms -5.7%) vs baseline: +0.3%; Memory: ✅ 72.194MB (SLO: <72.500MB 🟡 -0.4%) vs baseline: +6.3%
- ✅ tracer-no-caches; Time: ✅ 18.426ms (SLO: <19.650ms -6.2%) vs baseline: -0.6%; Memory: ✅ 66.447MB (SLO: <67.000MB 🟡 -0.8%) vs baseline: +6.7%
- ✅ tracer-no-databases; Time: ✅ 18.850ms (SLO: <20.100ms -6.2%) vs baseline: +0.5%; Memory: ✅ 66.399MB (SLO: <67.000MB 🟡 -0.9%) vs baseline: +6.8%
- ✅ tracer-no-middleware; Time: ✅ 20.197ms (SLO: <21.500ms -6.1%) vs baseline: +0.6%; Memory: ✅ 66.465MB (SLO: <67.000MB 🟡 -0.8%) vs baseline: +6.8%
- ✅ tracer-no-templates; Time: ✅ 20.282ms (SLO: <22.000ms -7.8%) vs baseline: ~same; Memory: ✅ 66.426MB (SLO: <67.000MB 🟡 -0.9%) vs baseline: +6.8%

📈 Performance Regressions (2 suites)

📈 iast_aspects - 40/40
- ✅ re_expand_aspect; Time: ✅ 31.823µs (SLO: <40.000µs 📉 -20.4%) vs baseline: -0.5%; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.0%
- ✅ re_expand_noaspect; Time: ✅ 28.607µs (SLO: <40.000µs 📉 -28.5%) vs baseline: +0.4%; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.0%
- ✅ re_findall_aspect; Time: ✅ 2.905µs (SLO: <10.000µs 📉 -71.0%) vs baseline: ~same; Memory: ✅ 38.122MB (SLO: <39.000MB -2.3%) vs baseline: +6.3%
- ✅ re_findall_noaspect; Time: ✅ 1.446µs (SLO: <10.000µs 📉 -85.5%) vs baseline: +2.0%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.9%
- ✅ re_finditer_aspect; Time: ✅ 4.437µs (SLO: <10.000µs 📉 -55.6%) vs baseline: ~same; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.1%
- ✅ re_finditer_noaspect; Time: ✅ 1.391µs (SLO: <10.000µs 📉 -86.1%) vs baseline: -2.0%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.8%
- ✅ re_fullmatch_aspect; Time: ✅ 2.726µs (SLO: <10.000µs 📉 -72.7%) vs baseline: +2.8%; Memory: ✅ 38.142MB (SLO: <39.000MB -2.2%) vs baseline: +5.9%
- ✅ re_fullmatch_noaspect; Time: ✅ 1.285µs (SLO: <10.000µs 📉 -87.2%) vs baseline: -0.9%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.8%
- ✅ re_group_aspect; Time: ✅ 2.959µs (SLO: <10.000µs 📉 -70.4%) vs baseline: +1.7%; Memory: ✅ 38.122MB (SLO: <39.000MB -2.3%) vs baseline: +6.0%
- ✅ re_group_noaspect; Time: ✅ 1.622µs (SLO: <10.000µs 📉 -83.8%) vs baseline: +1.6%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.9%
- ✅ re_groups_aspect; Time: ✅ 3.064µs (SLO: <10.000µs 📉 -69.4%) vs baseline: -0.2%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +6.0%
- ✅ re_groups_noaspect; Time: ✅ 1.691µs (SLO: <10.000µs 📉 -83.1%) vs baseline: -0.9%; Memory: ✅ 38.122MB (SLO: <39.000MB -2.3%) vs baseline: +6.3%
- ✅ re_match_aspect; Time: ✅ 2.971µs (SLO: <10.000µs 📉 -70.3%) vs baseline: 📈 +10.7%; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.3%
- ✅ re_match_noaspect; Time: ✅ 1.299µs (SLO: <10.000µs 📉 -87.0%) vs baseline: +0.1%; Memory: ✅ 38.122MB (SLO: <39.000MB -2.3%) vs baseline: +6.1%
- ✅ re_search_aspect; Time: ✅ 2.747µs (SLO: <10.000µs 📉 -72.5%) vs baseline: +8.6%; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.2%
- ✅ re_search_noaspect; Time: ✅ 1.203µs (SLO: <10.000µs 📉 -88.0%) vs baseline: -0.3%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.9%
- ✅ re_sub_aspect; Time: ✅ 3.388µs (SLO: <10.000µs 📉 -66.1%) vs baseline: -1.6%; Memory: ✅ 38.083MB (SLO: <39.000MB -2.4%) vs baseline: +6.0%
- ✅ re_sub_noaspect; Time: ✅ 1.529µs (SLO: <10.000µs 📉 -84.7%) vs baseline: +0.3%; Memory: ✅ 38.083MB (SLO: <39.000MB -2.4%) vs baseline: +6.1%
- ✅ re_subn_aspect; Time: ✅ 3.686µs (SLO: <10.000µs 📉 -63.1%) vs baseline: +0.6%; Memory: ✅ 38.063MB (SLO: <39.000MB -2.4%) vs baseline: +5.9%
- ✅ re_subn_noaspect; Time: ✅ 1.606µs (SLO: <10.000µs 📉 -83.9%) vs baseline: -0.1%; Memory: ✅ 38.103MB (SLO: <39.000MB -2.3%) vs baseline: +6.0%

📈 telemetryaddmetric - 30/30
- ✅ 1-count-metric-1-times; Time: ✅ 3.253µs (SLO: <20.000µs 📉 -83.7%) vs baseline: 📈 +10.9%; Memory: ✅ 32.165MB (SLO: <34.000MB -5.4%) vs baseline: +4.9%
- ✅ 1-count-metrics-100-times; Time: ✅ 200.605µs (SLO: <220.000µs -8.8%) vs baseline: -0.4%; Memory: ✅ 32.204MB (SLO: <34.000MB -5.3%) vs baseline: +5.2%
- ✅ 1-distribution-metric-1-times; Time: ✅ 3.402µs (SLO: <20.000µs 📉 -83.0%) vs baseline: +3.5%; Memory: ✅ 32.185MB (SLO: <34.000MB -5.3%) vs baseline: +4.9%
- ✅ 1-distribution-metrics-100-times; Time: ✅ 213.442µs (SLO: <220.000µs -3.0%) vs baseline: -1.6%; Memory: ✅ 32.224MB (SLO: <34.000MB -5.2%) vs baseline: +5.3%
- ✅ 1-gauge-metric-1-times; Time: ✅ 2.194µs (SLO: <20.000µs 📉 -89.0%) vs baseline: -0.2%; Memory: ✅ 32.224MB (SLO: <34.000MB -5.2%) vs baseline: +5.0%
- ✅ 1-gauge-metrics-100-times; Time: ✅ 138.753µs (SLO: <150.000µs -7.5%) vs baseline: +0.8%; Memory: ✅ 32.145MB (SLO: <34.000MB -5.5%) vs baseline: +4.9%
- ✅ 1-rate-metric-1-times; Time: ✅ 3.155µs (SLO: <20.000µs 📉 -84.2%) vs baseline: +3.5%; Memory: ✅ 32.165MB (SLO: <34.000MB -5.4%) vs baseline: +5.1%
- ✅ 1-rate-metrics-100-times; Time: ✅ 216.242µs (SLO: <250.000µs 📉 -13.5%) vs baseline: -0.8%; Memory: ✅ 32.126MB (SLO: <34.000MB -5.5%) vs baseline: +4.7%
- ✅ 100-count-metrics-100-times; Time: ✅ 21.137ms (SLO: <22.000ms -3.9%) vs baseline: +4.1%; Memory: ✅ 32.185MB (SLO: <34.000MB -5.3%) vs baseline: +5.1%
- ✅ 100-distribution-metrics-100-times; Time: ✅ 2.301ms (SLO: <2.300ms ~same) vs baseline: +1.8%; Memory: ✅ 32.185MB (SLO: <34.000MB -5.3%) vs baseline: +5.0%
- ✅ 100-gauge-metrics-100-times; Time: ✅ 1.434ms (SLO: <1.550ms -7.5%) vs baseline: +1.8%; Memory: ✅ 32.165MB (SLO: <34.000MB -5.4%) vs baseline: +5.1%
- ✅ 100-rate-metrics-100-times; Time: ✅ 2.295ms (SLO: <2.550ms -10.0%) vs baseline: +3.2%; Memory: ✅ 32.204MB (SLO: <34.000MB -5.3%) vs baseline: +5.0%
- ✅ flush-1-metric; Time: ✅ 4.686µs (SLO: <20.000µs 📉 -76.6%) vs baseline: +4.7%; Memory: ✅ 32.204MB (SLO: <34.000MB -5.3%) vs baseline: +5.4%
- ✅ flush-100-metrics; Time: ✅ 174.410µs (SLO: <250.000µs 📉 -30.2%) vs baseline: ~same; Memory: ✅ 32.165MB (SLO: <34.000MB -5.4%) vs baseline: +5.1%
- ✅ flush-1000-metrics; Time: ✅ 2.119ms (SLO: <2.500ms 📉 -15.3%) vs baseline: -0.1%; Memory: ✅ 32.991MB (SLO: <34.500MB -4.4%) vs baseline: +5.2%

🟡 Near SLO Breach (5 suites)

🟡 errortrackingdjangosimple - 6/6
- ✅ errortracking-enabled-all; Time: ✅ 18.219ms (SLO: <19.850ms -8.2%) vs baseline: +1.2%; Memory: ✅ 66.436MB (SLO: <66.500MB 🟡 ~same) vs baseline: +6.9%
- ✅ errortracking-enabled-user; Time: ✅ 18.098ms (SLO: <19.400ms -6.7%) vs baseline: +0.4%; Memory: ✅ 66.456MB (SLO: <66.500MB 🟡 ~same) vs baseline: +6.9%
- ✅ tracer-enabled; Time: ✅ 18.097ms (SLO: <19.450ms -7.0%) vs baseline: +0.4%; Memory: ✅ 66.349MB (SLO: <66.500MB 🟡 -0.2%) vs baseline: +6.6%

🟡 errortrackingflasksqli - 6/6
- ✅ errortracking-enabled-all; Time: ✅ 2.072ms (SLO: <2.300ms -9.9%) vs baseline: ~same; Memory: ✅ 52.750MB (SLO: <53.500MB 🟡 -1.4%) vs baseline: +6.1%
- ✅ errortracking-enabled-user; Time: ✅ 2.072ms (SLO: <2.250ms -7.9%) vs baseline: +0.2%; Memory: ✅ 52.848MB (SLO: <53.500MB 🟡 -1.2%) vs baseline: +6.4%
- ✅ tracer-enabled; Time: ✅ 2.073ms (SLO: <2.300ms -9.9%) vs baseline: +0.2%; Memory: ✅ 52.770MB (SLO: <53.500MB 🟡 -1.4%) vs baseline: +6.0%

🟡 flasksimple - 18/18
- ✅ appsec-get; Time: ✅ 4.603ms (SLO: <4.750ms -3.1%) vs baseline: +0.3%; Memory: ✅ 62.757MB (SLO: <65.000MB -3.5%) vs baseline: +6.2%
- ✅ appsec-post; Time: ✅ 6.640ms (SLO: <6.750ms 🟡 -1.6%) vs baseline: +0.1%; Memory: ✅ 62.757MB (SLO: <65.000MB -3.5%) vs baseline: +6.2%
- ✅ appsec-telemetry; Time: ✅ 4.593ms (SLO: <4.750ms -3.3%) vs baseline: +0.2%; Memory: ✅ 62.698MB (SLO: <65.000MB -3.5%) vs baseline: +6.0%
- ✅ debugger; Time: ✅ 1.860ms (SLO: <2.000ms -7.0%) vs baseline: ~same; Memory: ✅ 45.279MB (SLO: <47.000MB -3.7%) vs baseline: +4.5%
- ✅ iast-get; Time: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: -0.7%; Memory: ✅ 42.428MB (SLO: <49.000MB 📉 -13.4%) vs baseline: +5.1%
- ✅ profiler; Time: ✅ 1.913ms (SLO: <2.100ms -8.9%) vs baseline: +0.1%; Memory: ✅ 46.419MB (SLO: <47.000MB 🟡 -1.2%) vs baseline: +4.7%
- ✅ resource-renaming; Time: ✅ 3.372ms (SLO: <3.650ms -7.6%) vs baseline: ~same; Memory: ✅ 52.986MB (SLO: <53.500MB 🟡 -1.0%) vs baseline: +6.4%
- ✅ tracer; Time: ✅ 3.361ms (SLO: <3.650ms -7.9%) vs baseline: -0.2%; Memory: ✅ 52.947MB (SLO: <53.500MB 🟡 -1.0%) vs baseline: +6.3%
- ✅ tracer-native; Time: ✅ 3.367ms (SLO: <3.650ms -7.8%) vs baseline: +0.3%; Memory: ✅ 58.876MB (SLO: <60.000MB 🟡 -1.9%) vs baseline: +6.0%

🟡 flasksqli - 6/6
- ✅ appsec-enabled; Time: ✅ 3.964ms (SLO: <4.200ms -5.6%) vs baseline: +0.5%; Memory: ✅ 62.954MB (SLO: <66.000MB -4.6%) vs baseline: +6.0%
- ✅ iast-enabled; Time: ✅ 2.445ms (SLO: <2.800ms 📉 -12.7%) vs baseline: +0.5%; Memory: ✅ 59.395MB (SLO: <60.000MB 🟡 -1.0%) vs baseline: +6.1%
- ✅ tracer-enabled; Time: ✅ 2.065ms (SLO: <2.250ms -8.2%) vs baseline: +0.7%; Memory: ✅ 52.868MB (SLO: <54.500MB -3.0%) vs baseline: +6.3%

🟡 packagespackageforrootmodulemapping - 4/4
- ✅ cache_off; Time: ✅ 342.980ms (SLO: <354.300ms -3.2%) vs baseline: -0.8%; Memory: ✅ 39.999MB (SLO: <40.000MB 🟡 ~same) vs baseline: +7.2%
- ✅ cache_on; Time: ✅ 0.387µs (SLO: <10.000µs 📉 -96.1%) vs baseline: +1.4%; Memory: ✅ 37.550MB (SLO: <39.000MB -3.7%) vs baseline: +4.5%
Description
This PR adds support for server-side MCP calls made via the OpenAI Responses API.
In the Responses API, LLMs can invoke MCP tools on behalf of the client. They do this by asking the provided MCP server to list available tools and then calling the relevant tool.
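As a rough illustration of what that exchange leaves behind, the response's output list contains an `mcp_list_tools` item followed by one `mcp_call` item per tool invocation. The sketch below is not the code in this PR: `split_mcp_call`, `collect_mcp_calls`, the sample values, and the keys of the returned dictionaries are all hypothetical, and the items are modelled as plain dictionaries rather than SDK objects.

```python
import json
from typing import Any, Dict, List, Tuple

# Made-up output items, shaped like the `mcp_list_tools` and `mcp_call`
# entries of a Responses API result. None of these values are real.
SAMPLE_OUTPUT: List[Dict[str, Any]] = [
    {"type": "mcp_list_tools", "id": "mcpl_1", "server_label": "dice",
     "tools": [{"name": "roll"}]},
    {"type": "mcp_call", "id": "mcp_1", "server_label": "dice", "name": "roll",
     "arguments": '{"sides": 6}', "output": "4", "error": None},
    {"type": "message", "id": "msg_1",
     "content": [{"type": "output_text", "text": "You rolled a 4."}]},
]


def split_mcp_call(item: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: turn one mcp_call item into (tool_call, tool_result)."""
    raw_args = item.get("arguments") or "{}"
    try:
        arguments = json.loads(raw_args)  # arguments arrive as a JSON string
    except json.JSONDecodeError:
        arguments = {"value": raw_args}   # keep the raw text if it is not JSON
    tool_call = {
        "tool_id": item.get("id", ""),
        "name": item.get("name", ""),
        "arguments": arguments,
    }
    tool_result = {
        "tool_id": item.get("id", ""),
        "name": item.get("name", ""),
        # Fall back to the error text when the call produced no output.
        "result": str(item.get("output") or item.get("error") or ""),
    }
    return tool_call, tool_result


def collect_mcp_calls(output: List[Dict[str, Any]]) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Pick out the mcp_call items and split each into a call/result pair."""
    return [split_mcp_call(item) for item in output if item.get("type") == "mcp_call"]
```

Calling `collect_mcp_calls(SAMPLE_OUTPUT)` yields a single pair for the `roll` tool; the `mcp_list_tools` and `message` items are skipped.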
Our current support for these kinds of interactions is not great: we do not capture any tool calls, tool results, or tool spans.
This PR provides better support by:
- Extracting the `McpCall` output item and parsing it into a Tool Call and Tool Result for the current active LLM span
Testing
Risks
Additional Notes