feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277
Conversation
… streaming
- Create findStream function with record-by-record processing
- Extract common logic to findUtils.ts (DRY principle)
- Create Transform streams for field permissions and date conversion
- Add ObjectToJsonTransform for HTTP streaming (sketched after this commit's notes)
- Add new endpoint /rest/stream/:document/findStream
- Register streamApi in routes/index.ts
- Add unit tests for Transform streams and findUtils
- Add integration, E2E, and benchmark tests
- Add confidence test to validate data consistency
- All tests execute directly in Node (no Jest dependency)
- Benchmark shows 82% memory reduction and 99% faster TTFB for 55k records

TODO: Refactor and cleanup
- Extract magic numbers to streamConstants.ts (DRY)
- Replace let with const (const-pref)
- Replace forEach/for loops with functional methods (.map, .filter, .reduce)
- Extract helper functions from findUtils.ts (buildSortOptions, buildAccessConditionsForField, buildAccessConditionsMap, calculateConditionsKeys)
- Extract parseFilterFromQuery to eliminate duplication in streamApi.ts
- Create streamTestHelpers.ts with reusable test functions
- Use BluebirdPromise.map with concurrency limits in all promise operations
- Add default sort { _id: 1 } to findStream for consistent ordering
- Match find.ts behavior in findUtils.ts for query construction consistency
- Refactor test files to use helpers and functional methods
- Fix test variable references (testResults.allPassed)
All tests passing:
- Unit and integration tests: 7/7 passed
- Benchmark: 99.3% faster TTFB, 45% faster total time, 81.8% better throughput
- Confidence test: All datasets match exactly with find paginated endpoint
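For reference, a minimal sketch of the NDJSON transform mentioned above. The class name ObjectToJsonTransform comes from the commit message; the body and the pipeline wiring below are illustrative assumptions, not the project's actual code:

```ts
import { Transform, TransformCallback } from 'node:stream';

// Illustrative sketch of an object-to-NDJSON transform: accepts records
// in object mode and emits one JSON line per record.
class ObjectToJsonTransform extends Transform {
	constructor() {
		super({ writableObjectMode: true, readableObjectMode: false });
	}

	_transform(record: unknown, _encoding: BufferEncoding, callback: TransformCallback): void {
		try {
			this.push(`${JSON.stringify(record)}\n`);
			callback();
		} catch (error) {
			callback(error as Error);
		}
	}
}

// Hypothetical pipeline, assuming a MongoDB cursor and an HTTP response object:
// cursor.stream().pipe(permissionTransform).pipe(dateTransform).pipe(new ObjectToJsonTransform()).pipe(res);
```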
- Add comprehensive documentation for the /rest/stream/:document/findStream endpoint
- Document streaming format (newline-delimited JSON)
- Include client-side processing examples (JavaScript)
- Add advantages comparison with the traditional find endpoint
- Add usage guidelines and best practices
- Update Postman collection with 3 new requests:
  - Find Stream (main request with all parameters)
  - Find Stream - Contact (simple example)
  - Find Stream - With Filter (complex filter example)
- Documentation available in pt-BR and en
- Include response examples and error handling
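A hedged illustration of client-side processing of the newline-delimited stream (the path follows the endpoint pattern above; 'Contact' and the `_id` field are just examples):

```ts
// Read the NDJSON response incrementally: buffer partial lines between
// chunks and parse each complete line as one record.
const response = await fetch('/rest/stream/Contact/findStream');
const reader = response.body!.pipeThrough(new TextDecoderStream()).getReader();

let buffer = '';
for (;;) {
	const { done, value } = await reader.read();
	if (done) break;
	buffer += value;
	const lines = buffer.split('\n');
	buffer = lines.pop() ?? ''; // keep any trailing partial line
	for (const line of lines) {
		if (line.trim().length === 0) continue;
		const record = JSON.parse(line);
		console.log(record._id); // process one record at a time
	}
}
```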
…ntation
- ADR-0001: HTTP Streaming for Data Retrieval
Documents decision to implement HTTP streaming endpoint
Includes performance metrics (68% memory reduction, 99.3% faster TTFB)
- ADR-0002: Extraction of Common Logic into findUtils
Documents DRY principle application
Explains shared logic extraction between find and findStream
- ADR-0003: Node.js Transform Streams for Sequential Processing
Documents use of Transform streams for record-by-record processing
Explains pipeline architecture
- ADR-0004: Default Sorting for Consistency
Documents default sorting decision ({ _id: 1 })
Explains consistency requirements for confidence tests
All ADRs available in pt-BR and en
Includes README files with index
- Add hasSecondaryNodes() function to check for available secondary nodes
- Implement dynamic read preference selection (sketched below):
  - Uses 'secondary' when secondaries are available (maximum isolation)
  - Falls back to 'secondaryPreferred' when no secondaries (no errors)
- Add performance optimizations:
  - STREAM_BATCH_SIZE: 1000 documents per batch
  - STREAM_MAX_TIME_MS: 5 minutes max query time
- Apply same read preference to countDocuments for consistency
- Update ADR-0005 to reflect smart fallback approach
- Works in all environments (dev without secondaries, prod with secondaries)

See ADR-0005 for detailed rationale
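A minimal sketch of the fallback described above. hasSecondaryNodes, STREAM_BATCH_SIZE, and STREAM_MAX_TIME_MS are named in the commit; the signature and body here are assumptions:

```ts
// Values taken from the commit description.
const STREAM_BATCH_SIZE = 1000; // documents per batch
const STREAM_MAX_TIME_MS = 5 * 60 * 1000; // 5 minutes max query time

// Hypothetical stand-in: the real hasSecondaryNodes() lives in
// src/imports/utils/mongo.ts and inspects the replica set topology.
declare function hasSecondaryNodes(): Promise<boolean>;

// 'secondary' gives maximum isolation when secondaries exist;
// 'secondaryPreferred' falls back to the primary without erroring.
async function resolveReadPreference(): Promise<'secondary' | 'secondaryPreferred'> {
	return (await hasSecondaryNodes()) ? 'secondary' : 'secondaryPreferred';
}

// Applied (per the commit) to both the find cursor and countDocuments, e.g.:
// collection.find(query, { readPreference, batchSize: STREAM_BATCH_SIZE, maxTimeMS: STREAM_MAX_TIME_MS });
```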
- Remove KonectyResult (not used)
- Remove errorReturn (not used)
- Remove successReturn (not used)
- Remove DataDocument (not used directly, only in streamTransforms)

All imports are now used; lint passes without errors
- Add hierarchical pivot table structure with nested children
- Enrich pivot config with metadata from MetaObject.Meta
- Implement lookup field formatting with formatPattern
- Add recursive field metadata resolution for nested lookups
- Concatenate parent labels in nested fields (e.g., 'Grupo > Nome')
- Calculate subtotals per hierarchy level
- Calculate grand totals for all data
- Update Python script to build hierarchical structure
- Support Accept-Language header for multilingual labels
- Update integration and unit tests for new structure

Breaking changes:
- Pivot API response format changed from flat array to hierarchical structure (illustrated below)
- Response now includes metadata, data (hierarchical), and grandTotals
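An illustrative response shape implied by this commit. The metadata, data, children, subtotals, and grandTotals keys are from the commit description; the row labels, the amount measure, and the exact field layout are invented for the example:

```json
{
  "metadata": {
    "rows": ["group", "name"],
    "values": ["amount"]
  },
  "data": [
    {
      "value": "Grupo A",
      "children": [
        { "value": "Grupo A > Nome 1", "amount": 120 },
        { "value": "Grupo A > Nome 2", "amount": 80 }
      ],
      "subtotals": { "amount": 200 }
    }
  ],
  "grandTotals": { "amount": 200 }
}
```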
…rmat
- Update API documentation (en/pt-BR) with new hierarchical response structure
- Add examples showing metadata, nested children, subtotals, and grandTotals
- Document lookup formatting rules and nested field label concatenation
- Add ADR-0007 documenting the hierarchical pivot output format decision
- Update ADR READMEs to include the new ADR

Breaking changes documented:
- Response format changed from flat array to hierarchical structure
- New metadata field with enriched field information
- Nested children arrays for multi-level hierarchies
- Subtotals per level and grand totals
- Update Postman collection example response to show the new hierarchical structure
- Include metadata, nested children, subtotals, and grandTotals in the example
- Reflect the breaking change in response format
- Add Rust, cargo, and musl-dev for building polars from source on Alpine
- Fix ENV format to use key=value syntax (removes warning)
- Fix COPY paths to use absolute paths (/app instead of app)
- Add python3-dev and py3-pip for Python development dependencies
- Ensure the konecty user has access to build tools
- Note: polars will compile on first execution (takes ~2-5 minutes), then is cached

Alpine Linux (musl) doesn't have precompiled polars wheels, so compilation from source is required. This is handled automatically by uv when the script runs for the first time.
- Add GET /rest/data/:document/graph endpoint for SVG chart generation
- Implement graphStream function orchestrating findStream + Python (sketched below)
- Create graph_generator.py script using Polars for aggregations and pandas/matplotlib for visualization
- Support 6 chart types: bar, line, pie, scatter, histogram, timeSeries
- Add collectSVGFromPython function to pythonStreamBridge for SVG collection
- Add GraphConfig and GraphStreamParams TypeScript types
- Create unit and integration tests for the graph endpoint
- Add ADR-0008 documenting the Polars+Pandas decision (pt-BR and en)
- Update API documentation with graph endpoint examples (pt-BR and en)
- Update Postman collection with graph examples using the Opportunity document
- Performance: Polars is 3-10x faster than Pandas for aggregations
- Convert only aggregated results to Pandas (memory efficient)
- Add pyarrow dependency for Polars to_pandas() conversion
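A rough sketch of the orchestration. graphStream, collectSVGFromPython, and graph_generator.py are named in the commit; the spawn/stdio details below are assumptions about how such a bridge could look, not the actual pythonStreamBridge code:

```ts
import { spawn } from 'node:child_process';
import { Readable } from 'node:stream';

// Illustrative bridge: pipe NDJSON records into the Python script's
// stdin and collect the SVG it writes to stdout.
async function collectSVGFromPythonSketch(records: Readable, graphConfig: unknown): Promise<string> {
	const python = spawn('uv', ['run', 'src/scripts/python/graph_generator.py', JSON.stringify(graphConfig)]);

	records.pipe(python.stdin);

	const chunks: Buffer[] = [];
	for await (const chunk of python.stdout) {
		chunks.push(chunk as Buffer);
	}
	return Buffer.concat(chunks).toString('utf-8'); // the rendered SVG
}
```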
This PR is being reviewed by Cursor Bugbot
- Fix wrong variable name in runFindStreamTests.ts (failed++ -> testResults.failed++)
- Change BENCHMARK_ITERATION_CONCURRENCY from 3 to 1 for accurate memory measurements
- Fix TypeScript linting errors (any type, empty line, type guards)
✅ Fixed issues reported by Cursor Bugbot: all issues have been resolved in commit a763b95.
- Replace echo with printf in the polars pre-build step
- BusyBox ash doesn't interpret \n in echo, causing malformed input
- printf correctly interprets \n as a newline character
- This ensures polars is properly pre-compiled during the Docker build
- Prevents a multi-minute delay on the first pivot/graph request
```ts
cpuSystem: endCpu.system / MILLISECONDS_PER_SECOND,
recordCount,
throughput,
peakMemory: peakMemory - startMemory.heapUsed,
```
Double subtraction causes incorrect peak memory in benchmark
The readStreamRecordsWithMetrics helper function already returns peakMemory as a delta (computed as memoryState.peakMemory - startMemory.heapUsed at line 107 of streamTestHelpers.ts). However, benchmarkFindStream subtracts startMemory.heapUsed again at line 131, resulting in peakMemory - 2 * startMemory.heapUsed. This causes incorrect (likely negative) peak memory values for the stream endpoint benchmark, while benchmarkFindPaginated correctly computes the delta from the raw peak value. The fix is to use peakMemory directly without the second subtraction.
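A minimal sketch of the corrected metrics object, per the report (function shape invented for the sketch; peakMemory is the delta already computed by readStreamRecordsWithMetrics):

```ts
// Sketch: report the already-computed delta directly instead of
// subtracting startMemory.heapUsed a second time.
function buildMetrics(peakMemory: number, recordCount: number, throughput: number) {
	return {
		recordCount,
		throughput,
		peakMemory, // previously: peakMemory - startMemory.heapUsed (double subtraction)
	};
}
```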
- Add support for hierarchical column headers in the pivotStream response
- Update tests to validate presence and structure of columnHeaders
- Modify API documentation to reflect the new columnHeaders field
- Implement logic in the Python script to handle and return column headers
- Ensure backward compatibility with existing pivot functionality

Breaking changes:
- Response format now includes columnHeaders, enhancing the pivot table structure.
| "description": "Find Opportunity records with complex filter. Example filtering by multiple status values." | ||
| }, | ||
| "response": [] | ||
| }, |
Malformed JSON structure in Postman collection item
Medium Severity
The "Find Stream - With Filter" item has incorrect indentation that breaks the JSON structure. Comparing with the correctly formatted "Find Stream - Contact" item (line 630), the description at line 658 and response at line 660 are indented one level less than required. This causes response to appear outside its parent item object, making the Postman collection invalid JSON that would fail to import.
```ts
if (findStr !== streamStr) {
	// Show first difference for debugging
	return `${key}: find=${findStr.substring(0, MAX_SAMPLE_LENGTH)}... vs stream=${streamStr.substring(0, MAX_SAMPLE_LENGTH)}...`;
```
Calling substring on undefined causes TypeError
Medium Severity
In compareRecordFields, when a key exists in one record but not the other, accessing the missing key returns undefined. Calling JSON.stringify(undefined) returns the primitive undefined (not a string). Calling .substring() on that at line 186 then throws a TypeError: Cannot read property 'substring' of undefined. This crashes the confidence test whenever records have different fields.
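A minimal guard for the comparison above (the MAX_SAMPLE_LENGTH value and the formatSample helper are invented for the sketch):

```ts
const MAX_SAMPLE_LENGTH = 80; // assumed value for the sketch

// JSON.stringify(undefined) returns undefined, not a string, so
// stringify safely before calling substring.
function formatSample(value: unknown): string {
	return (JSON.stringify(value) ?? 'undefined').substring(0, MAX_SAMPLE_LENGTH);
}

// In compareRecordFields:
// return `${key}: find=${formatSample(findValue)}... vs stream=${formatSample(streamValue)}...`;
```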
Description
This PR adds data streaming, pivot tables, and chart generation, with Python integration for efficient processing of large data volumes.
Main Features
1. HTTP Streaming Endpoint (findStream)
- /rest/stream/:document/findStream for true HTTP streaming

2. Pivot Tables Endpoint
- /rest/data/:document/pivot for pivot tables
- Hierarchical column header structure (columnHeaders) for multi-level column support

3. Graph Endpoint
- /rest/data/:document/graph for SVG chart generation
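For orientation, hypothetical calls against the three endpoints (document names follow the Postman examples; query parameters are omitted here):

```ts
// NDJSON stream of records
const ndjson = await fetch('/rest/stream/Contact/findStream');

// Hierarchical pivot JSON; Accept-Language selects the label language
const pivot = await fetch('/rest/data/Opportunity/pivot', {
	headers: { 'Accept-Language': 'pt-BR' },
});

// Rendered SVG chart as text
const svg = await (await fetch('/rest/data/Opportunity/graph')).text();
```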
Included Commits

- feat: [WIP] add HTTP streamable endpoint findStream - base streaming endpoint
- refactor: apply clean code principles - refactoring following clean code principles
- docs: add findStream endpoint documentation - findStream documentation
- docs: add Architecture Decision Records (ADRs) - ADRs for architectural decisions
- feat: implement smart secondary node usage - smart use of secondary nodes
- fix: remove unused imports - cleanup of unused imports
- feat(pivot): implement hierarchical output - pivot implementation with hierarchical output
- docs(pivot): update API documentation and add ADR - pivot documentation
- docs(postman): update pivot endpoint example - Postman collection update
- fix(docker): update Dockerfile for Python support - Python support in Docker
- feat: add graph endpoint with Polars and Pandas - graph endpoint
- docs(pivot): update documentation and tests for columnHeaders - documentation and test updates for the hierarchical column structure

Created Files
Streaming
- src/imports/data/api/findStream.ts
- src/imports/data/api/findUtils.ts
- src/imports/data/api/streamTransforms.ts
- src/imports/data/api/streamConstants.ts
- src/server/routes/rest/stream/streamApi.ts

Pivot
- src/imports/data/api/pivotStream.ts
- src/imports/data/api/pivotMetadata.ts
- src/imports/types/pivot.ts
- src/scripts/python/pivot_table.py

Graph
- src/imports/data/api/graphStream.ts
- src/imports/types/graph.ts
- src/scripts/python/graph_generator.py

Tests
- __test__/data/api/runFindStreamTests.ts
- __test__/data/api/runFindStreamBenchmark.ts
- __test__/data/api/runFindStreamConfidenceTest.ts
- __test__/data/api/runPivotIntegrationTest.ts
- __test__/data/api/runGraphIntegrationTest.ts
- __test__/data/api/pivotStream.test.ts
- __test__/data/api/graphStream.test.ts

Documentation
- docs/pt-BR/adr/0001-http-streaming-para-busca-de-dados.md
- docs/pt-BR/adr/0002-extracao-de-logica-comum-para-find-utils.md
- docs/pt-BR/adr/0003-node-transform-streams-para-processamento-sequencial.md
- docs/pt-BR/adr/0004-ordenacao-padrao-para-consistencia.md
- docs/pt-BR/adr/0005-uso-obrigatorio-nos-secundarios-para-leitura.md
- docs/pt-BR/adr/0006-integracao-python-para-pivot-tables.md
- docs/pt-BR/adr/0007-formato-hierarquico-saida-pivot.md
- docs/pt-BR/adr/0008-graph-endpoint-com-polars-pandas.md

Modified Files
- src/imports/data/api/index.ts
- src/imports/data/api/pythonStreamBridge.ts
- src/imports/utils/mongo.ts (hasSecondaryNodes)
- src/server/routes/rest/data/dataApi.ts
- src/server/routes/index.ts
- Dockerfile (Python/uv support)
- docs/pt-BR/api.md and docs/en/api.md (updated with columnHeaders)
- docs/postman/Konecty-API.postman_collection.json
- __test__/data/api/pivotStream.test.ts (tests updated for columnHeaders)
- __test__/data/api/runPivotIntegrationTest.ts (integration tests updated)
- docs/en/adr/0007-hierarchical-pivot-output-format.md (updated with columnHeaders)
- docs/pt-BR/adr/0007-formato-hierarquico-saida-pivot.md (updated with columnHeaders)

Tests
- Tests updated to validate the columnHeaders structure

Performance
Documentation
- Documentation updated for the hierarchical columnHeaders structure

Python Dependencies
- polars - for fast aggregations (pivot and graph)
- pandas - for visualization (graph)
- matplotlib - for SVG generation (graph)
- pyarrow - for Polars → Pandas conversion

All dependencies are managed automatically by uv when the scripts run for the first time.

Recent Changes
Hierarchical Column Structure (columnHeaders)

The pivot tables endpoint now returns a hierarchical column header structure (columnHeaders) with support for multi-level columns. The documentation and tests have been updated to reflect these changes.
Note
Adds high-throughput data retrieval and analytics endpoints plus infra to support them.
- Endpoints: GET /rest/stream/:document/findStream (NDJSON streaming), GET /rest/data/:document/pivot (hierarchical JSON), GET /rest/data/:document/graph (SVG)
- Python integration via pythonStreamBridge; uses Polars (aggregation) and Pandas/matplotlib (charts)
- Docker: uv; prebuilds Polars and copies /app/scripts/python
- Shared buildFindQuery in findUtils; transform streams in streamTransforms; default sort; secondary read preference with fallback
- New functions: findStream, pivotStream, graphStream

Written by Cursor Bugbot for commit 3695247. This will update automatically on new commits.