Implement pixie.wrap() API and eval process redesign #11
- Add pixie/instrumentation/wrap_registry.py: context-var registries for input injection (_input_registry) and output capture (_capture_registry) with set/get/clear/init helpers
- Add pixie/instrumentation/wrap_serialization.py: jsonpickle-based serialize_wrap_data() / deserialize_wrap_data() helpers
- Add pixie/instrumentation/wrap.py: wrap() function with three modes:
  - No-op: returns data unchanged (default, no overhead)
  - Tracing (PIXIE_TRACING=1): emits OTel span events
  - Eval (input registry active): injects inputs, captures outputs/state
  WrapRegistryMissError is raised when an input key is missing in eval mode.
- Add pixie/instrumentation/trace_writer.py: thread-safe JSONL trace file writer (TraceFileWriter) for wrap events and LLM spans
- Update pixie/config.py: add trace_output and tracing_enabled fields to PixieConfig; read PIXIE_TRACE_OUTPUT and PIXIE_TRACING env vars
- Update pixie/instrumentation/__init__.py and pixie/__init__.py: export wrap, WrapRegistryMissError, WrapTypeMismatchError and all registry helper functions
- Add tests: test_wrap_registry, test_wrap_serialization, test_wrap, test_config_tracing (36 new tests, 676 total passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
- Add module docstring to wrap_registry.py
- Fix 'behaviour' -> 'behavior' spelling in wrap.py module docstring
- Remove unused 'os' import in test_wrap.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
Agent-Logs-Url: https://github.com/yiouli/pixie-qa/sessions/b85ebaff-6dd1-48d0-9857-6ad238bd533a Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
…, wrap docstring Agent-Logs-Url: https://github.com/yiouli/pixie-qa/sessions/b85ebaff-6dd1-48d0-9857-6ad238bd533a Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
pixie.wrap() API and eval process redesign
@copilot You are making the wrong assumption about the dataset format. The new dataset format would have the eval_input (app_input) in exactly the same format as the trace logging: an array of JSON objects for … There's no backward compatibility; your implementations for getting the entry point input and the dependency input are both wrong.

Implementation-wise, because multiple places would need to load the jsonl (…) … Again, because we don't need backward compatibility, you should just update the evaluable to directly have the list[LoggedData] for eval input.

Also, because there's likely not going to be a parent span when …

And finally, you need to revamp the manual test fixture under tests/manual/; it should include: …

The script should print a detailed error message if any of its validations fail. You need to use that script to verify that your code is working.
…nual fixtures Agent-Logs-Url: https://github.com/yiouli/pixie-qa/sessions/828ad56d-c97d-4f82-894b-c03ea1360241 Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
Agent-Logs-Url: https://github.com/yiouli/pixie-qa/sessions/828ad56d-c97d-4f82-894b-c03ea1360241 Co-authored-by: yiouli <3508494+yiouli@users.noreply.github.com>
Addressed all the feedback in commits
All 3 conversations pass at 1.00 score. 676 unit tests pass, mypy/ruff clean.
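Implements the wrap() data-oriented observation API and the corresponding eval runner, registry, and serialization changes specified in specs/wrap-api-and-eval-redesign.md.

Core wrap() API

New pixie.wrap() function with three behavior modes:
- No-op: returns data unchanged (default, no overhead)
- Tracing (PIXIE_TRACING=1): emits OTel span events
- Eval (input registry active): injects inputs, captures outputs/state

New files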
- pixie/instrumentation/wrap_registry.py: ContextVar-based input/capture registries with separate output and state registries; concurrent async eval runs get isolated state
- pixie/instrumentation/wrap_serialization.py: jsonpickle encode/decode helpers (type-preserving, not plain JSON)
- pixie/instrumentation/wrap.py: wrap() plus WrapRegistryMissError / WrapTypeMismatchError
- pixie/instrumentation/wrap_log.py: WrapLogEntry pydantic model plus load_wrap_log_entries() and filter_by_purpose() utilities for loading JSONL trace data into typed objects
- pixie/instrumentation/trace_writer.py: thread-safe JSONL writer for wrap events and LLM spans

Config changes
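The config changes come down to reading two environment variables, PIXIE_TRACING and PIXIE_TRACE_OUTPUT. A minimal sketch follows; TracingConfig and load_tracing_config() are hypothetical stand-ins for the PixieConfig changes, and matching the truthy values case-insensitively is an assumption. The writes_trace_file property mirrors the rule that a trace file writer is only created when both variables are set.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Values treated as "on"; lowercase comparison is an assumption of this sketch.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


@dataclass(frozen=True)
class TracingConfig:
    """Hypothetical stand-in for the tracing fields added to PixieConfig."""

    tracing_enabled: bool
    trace_output: Optional[str]

    @property
    def writes_trace_file(self) -> bool:
        # A trace file writer only makes sense when both settings are present.
        return self.tracing_enabled and self.trace_output is not None


def load_tracing_config(env: Mapping[str, str] = os.environ) -> TracingConfig:
    raw = env.get("PIXIE_TRACING", "")
    return TracingConfig(
        tracing_enabled=raw.strip().lower() in _TRUTHY,
        trace_output=env.get("PIXIE_TRACE_OUTPUT") or None,
    )
```

Passing a plain dict as env keeps the parser testable without mutating the process environment.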
Two new environment variables are read into PixieConfig:
- PIXIE_TRACING: enables tracing mode (1/true/yes/on)
- PIXIE_TRACE_OUTPUT: path for the JSONL trace output file

Trace writer integration
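The thread-safe JSONL writer listed under New files (trace_writer.py) can be approximated with a lock around line-at-a-time appends. This is a sketch with a hypothetical JsonlTraceWriter class, not the actual TraceFileWriter; opening the file on every write keeps it simple at some cost in throughput.

```python
from __future__ import annotations

import json
import threading
from pathlib import Path
from typing import Any, Union


class JsonlTraceWriter:
    """Hypothetical writer: appends one JSON object per line, safe across threads."""

    def __init__(self, path: Union[str, Path]) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()

    def write(self, record: dict[str, Any]) -> None:
        # Serialize outside the lock; default=str keeps non-JSON values writable.
        line = json.dumps(record, default=str)
        # Holding the lock for the whole append stops lines from interleaving.
        with self._lock:
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line + "\n")
```

Each call produces exactly one complete line, so readers can parse the file with one json.loads() per line.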
enable_storage() creates a TraceFileWriter when both variables are set. wrap() and LLMSpanProcessor.on_end() write to it when it is active.

OTel event emission
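_emit_wrap_event() prefers the active span and otherwise falls back to the Python logger. That choice can be sketched against any object exposing is_recording() and add_event(), the two Span methods it needs; with the real OpenTelemetry API, opentelemetry.trace.get_current_span() returns a non-recording span when nothing is active. The function name, event name, attribute keys, and logger name below are hypothetical, and the string return value exists only to make the branch observable.

```python
from __future__ import annotations

import json
import logging
from typing import Any, Optional, Protocol

logger = logging.getLogger("pixie_wrap_sketch")  # hypothetical logger name


class SpanLike(Protocol):
    """The subset of the OpenTelemetry Span interface used here."""

    def is_recording(self) -> bool: ...

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> None: ...


def emit_wrap_event(span: Optional[SpanLike], name: str, purpose: str, data: Any) -> str:
    # Span attribute values must be primitives, so the payload is JSON-encoded.
    attributes = {
        "wrap.name": name,
        "wrap.purpose": purpose,
        "wrap.data": json.dumps(data, default=str),
    }
    if span is not None and span.is_recording():
        span.add_event("pixie.wrap", attributes=attributes)
        return "span"
    # No recording span: log instead, so a configured OTel log exporter can pick it up.
    logger.info("pixie.wrap %s", attributes)
    return "log"
```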
_emit_wrap_event() adds events to the active span when one exists. When no recording span is active (the common case for wrap() calls outside OTel-instrumented code), it falls back to the Python logger, whose records are captured by OTel log exporters if configured.

Dataset format
Dataset eval_input is an array of wrap log entry objects that match the JSONL trace file format exactly. Each entry has type, name, purpose, and data fields:

    {
      "eval_input": [
        {"type": "wrap", "name": "user_message", "purpose": "entry", "data": "Hello"},
        {"type": "wrap", "name": "customer_profile", "purpose": "input", "data": {"id": "C001", "name": "Alice"}}
      ],
      "expected_output": "Hello Alice! How can I help you?"
    }

The WrapLogEntry pydantic model in pixie/instrumentation/wrap_log.py provides the shared typed representation used by both the trace filter CLI and the dataset/test runner.

Evaluable model
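Parsing JSONL trace lines into typed entries and then selecting by purpose is the job of the wrap_log.py utilities listed under New files. The following is a dependency-free sketch that uses a dataclass instead of the pydantic WrapLogEntry model; load_entries() and filter_entries() are hypothetical names, and skipping records whose type is not "wrap" (for example LLM spans) is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator

# The four purposes used with wrap() in this PR.
VALID_PURPOSES = frozenset({"entry", "input", "output", "state"})


@dataclass(frozen=True)
class WrapEntry:
    """Hypothetical dataclass stand-in for the pydantic WrapLogEntry model."""

    type: str
    name: str
    purpose: str
    data: Any


def load_entries(lines: Iterable[str]) -> Iterator[WrapEntry]:
    """Parse JSONL lines into WrapEntry objects, skipping non-wrap records."""
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines
        obj = json.loads(raw)
        if obj.get("type") != "wrap":
            continue  # assumption: other record types (e.g. LLM spans) are skipped
        purpose = obj.get("purpose")
        if purpose not in VALID_PURPOSES:
            raise ValueError(f"unknown purpose {purpose!r} for {obj.get('name')!r}")
        yield WrapEntry(type="wrap", name=obj["name"], purpose=purpose, data=obj.get("data"))


def filter_entries(entries: Iterable[WrapEntry], *purposes: str) -> list[WrapEntry]:
    """Keep only entries whose purpose is one of `purposes`, preserving order."""
    wanted = set(purposes)
    return [entry for entry in entries if entry.purpose in wanted]
```

Because dataset items and trace lines share one shape, the same parser can validate both.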
- eval_input holds the array of WrapLogEntry-shaped dicts
- get_wrap_inputs() helper parses and validates the wrap log entry format
- captured_output: dict[str, JsonValue] | None, holding values from wrap(purpose="output")
- captured_state: dict[str, JsonValue] | None, holding values from wrap(purpose="state")

Test runner
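In wrap mode, the runner splits a dataset item's eval_input by purpose. A minimal sketch of that split, using a hypothetical split_eval_input() helper rather than the actual _run_entry or get_wrap_inputs() code: entry items become the dict passed to the runnable, and input items become the values loaded into the input registry.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def split_eval_input(
    eval_input: Sequence[Mapping[str, Any]],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (entry_kwargs, registry_inputs) for one dataset item."""
    entry_kwargs: dict[str, Any] = {}
    registry_inputs: dict[str, Any] = {}
    for item in eval_input:
        if item.get("type") != "wrap" or "name" not in item:
            raise ValueError(f"not a wrap log entry: {item!r}")
        purpose = item.get("purpose")
        if purpose == "entry":
            # Aggregated by name into the single dict handed to the runnable.
            entry_kwargs[item["name"]] = item.get("data")
        elif purpose == "input":
            # Served back to wrap(purpose="input") calls during the run.
            registry_inputs[item["name"]] = item.get("data")
        # Other purposes (output/state) describe results, so they are skipped here.
    return entry_kwargs, registry_inputs
```

Applied to the dataset example under Dataset format, this yields {"user_message": "Hello"} for the runnable and the customer_profile dict for the registry.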
_run_entry detects wrap-mode datasets via get_wrap_inputs(). In wrap mode, entry-purpose items are aggregated by name into a dict passed to the runnable, and input-purpose items populate the input registry. Separate output and state capture registries ensure eval_output reflects only purpose="output" data.

CLI: pixie trace filter

Filters a JSONL trace file using the WrapLogEntry model for typed parsing and outputs matching entries as JSON.

Manual test fixtures
- tests/manual/chatbot.py: CLI chatbot with a fake database lookup (tool calling) that uses wrap() for every purpose type (entry, input, output, state) on both data values and callable providers. The chat() entry point serves as the runnable for evaluation.
- tests/manual/verify_wrap_e2e.py: end-to-end verification script that: … pixie test on the generated dataset
Created from VS Code.