feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277

silveirado · 2025-12-30T18:20:37Z

Descrição

Este PR adiciona funcionalidades de streaming de dados, tabelas dinâmicas (pivot) e geração de gráficos, com integração Python para processamento eficiente de grandes volumes de dados.

Principais Funcionalidades

1. HTTP Streaming Endpoint (findStream)

✅ Novo endpoint /rest/stream/:document/findStream para streaming HTTP verdadeiro
✅ Processamento registro a registro sem acumular dados em memória
✅ Aplicação de permissões e transformações em tempo real
✅ Uso inteligente de nós secundários do MongoDB com fallback
✅ Ordenação padrão para consistência

2. Pivot Tables Endpoint

✅ Novo endpoint /rest/data/:document/pivot para tabelas dinâmicas
✅ Integração Python com Polars para processamento rápido
✅ Formato hierárquico de saída com metadados enriquecidos
✅ Suporte a campos aninhados e formatação de lookups
✅ Labels multilíngue (pt-BR/en)
✅ Estrutura hierárquica de colunas (columnHeaders) para suporte a colunas multi-nível
✅ Suporte a date buckets (D, W, M, Q, Y) para agregação temporal
✅ Documentação completa atualizada com exemplos de colunas hierárquicas

3. Graph Endpoint

✅ Novo endpoint /rest/data/:document/graph para geração de gráficos SVG
✅ Integração Python usando Polars para agregações (3-10x mais rápido)
✅ Uso de pandas/matplotlib para visualização
✅ Suporte a 6 tipos de gráficos: bar, line, pie, scatter, histogram, timeSeries
✅ Processamento interno com streaming (findStream)

Commits Incluídos

feat: [WIP] add HTTP streamable endpoint findStream - Endpoint base de streaming
refactor: apply clean code principles - Refatoração seguindo princípios clean code
docs: add findStream endpoint documentation - Documentação do findStream
docs: add Architecture Decision Records (ADRs) - ADRs para decisões arquiteturais
feat: implement smart secondary node usage - Uso inteligente de nós secundários
fix: remove unused imports - Limpeza de imports não utilizados
feat(pivot): implement hierarchical output - Implementação de pivot com saída hierárquica
docs(pivot): update API documentation and add ADR - Documentação do pivot
docs(postman): update pivot endpoint example - Atualização da collection Postman
fix(docker): update Dockerfile for Python support - Suporte Python no Docker
feat: add graph endpoint with Polars and Pandas - Endpoint de gráficos
docs(pivot): update documentation and tests for columnHeaders - Atualização de documentação e testes para estrutura hierárquica de colunas

Arquivos Criados

Streaming

src/imports/data/api/findStream.ts
src/imports/data/api/findUtils.ts
src/imports/data/api/streamTransforms.ts
src/imports/data/api/streamConstants.ts
src/server/routes/rest/stream/streamApi.ts

Pivot

src/imports/data/api/pivotStream.ts
src/imports/data/api/pivotMetadata.ts
src/imports/types/pivot.ts
src/scripts/python/pivot_table.py

Graph

src/imports/data/api/graphStream.ts
src/imports/types/graph.ts
src/scripts/python/graph_generator.py

Testes

__test__/data/api/runFindStreamTests.ts
__test__/data/api/runFindStreamBenchmark.ts
__test__/data/api/runFindStreamConfidenceTest.ts
__test__/data/api/runPivotIntegrationTest.ts
__test__/data/api/runGraphIntegrationTest.ts
__test__/data/api/pivotStream.test.ts
__test__/data/api/graphStream.test.ts

Documentação

docs/pt-BR/adr/0001-http-streaming-para-busca-de-dados.md
docs/pt-BR/adr/0002-extracao-de-logica-comum-para-find-utils.md
docs/pt-BR/adr/0003-node-transform-streams-para-processamento-sequencial.md
docs/pt-BR/adr/0004-ordenacao-padrao-para-consistencia.md
docs/pt-BR/adr/0005-uso-obrigatorio-nos-secundarios-para-leitura.md
docs/pt-BR/adr/0006-integracao-python-para-pivot-tables.md
docs/pt-BR/adr/0007-formato-hierarquico-saida-pivot.md
docs/pt-BR/adr/0008-graph-endpoint-com-polars-pandas.md
(Versões em inglês de todos os ADRs)

Arquivos Modificados

src/imports/data/api/index.ts
src/imports/data/api/pythonStreamBridge.ts
src/imports/utils/mongo.ts (hasSecondaryNodes)
src/server/routes/rest/data/dataApi.ts
src/server/routes/index.ts
Dockerfile (suporte Python/uv)
docs/pt-BR/api.md e docs/en/api.md (atualizado com columnHeaders)
docs/postman/Konecty-API.postman_collection.json
__test__/data/api/pivotStream.test.ts (testes atualizados para columnHeaders)
__test__/data/api/runPivotIntegrationTest.ts (testes de integração atualizados)
docs/en/adr/0007-hierarchical-pivot-output-format.md (atualizado com columnHeaders)
docs/pt-BR/adr/0007-formato-hierarquico-saida-pivot.md (atualizado com columnHeaders)

Testes

✅ Testes unitários para findStream, pivotStream e graphStream
✅ Testes de integração para todos os endpoints
✅ Testes de benchmark comparando performance
✅ Testes de confiança validando consistência de dados
✅ Testes atualizados para verificar estrutura hierárquica de columnHeaders
✅ Build TypeScript sem erros

Performance

findStream: Streaming verdadeiro, sem acumular dados em memória
Pivot: Polars para processamento rápido de grandes volumes
Graph: Polars 3-10x mais rápido que Pandas para agregações
MongoDB: Uso inteligente de nós secundários com fallback

Documentação

✅ 8 ADRs documentando decisões arquiteturais (pt-BR e en)
✅ Documentação completa da API para todos os endpoints
✅ Documentação atualizada com exemplos de columnHeaders hierárquicos
✅ Exemplos de colunas multi-nível (date buckets com status)
✅ Collection Postman atualizada com exemplos reais
✅ Exemplos de uso para cada funcionalidade

Dependências Python

polars - Para agregações rápidas (pivot e graph)
pandas - Para visualização (graph)
matplotlib - Para geração de SVG (graph)
pyarrow - Para conversão Polars → Pandas

Todas as dependências são gerenciadas automaticamente pelo uv quando os scripts rodam pela primeira vez.

Mudanças Recentes

Estrutura Hierárquica de Colunas (`columnHeaders`)

O endpoint de pivot tables agora retorna uma estrutura hierárquica de cabeçalhos de coluna (columnHeaders) que suporta:

Colunas multi-nível (ex: date buckets com status)
Formatação automática de lookups
Suporte a date buckets (D=dia, W=semana, M=mês, Q=trimestre, Y=ano)
Estrutura similar ao ExtJS mz-pivot axisTop

A documentação e testes foram atualizados para refletir essas mudanças.

Note

Adds high-throughput data retrieval and analytics endpoints plus infra to support them.

New endpoints: GET /rest/stream/:document/findStream (NDJSON streaming), GET /rest/data/:document/pivot (hierarchical JSON), GET /rest/data/:document/graph (SVG)
Python integration: Orchestrates Node → Python via pythonStreamBridge; uses Polars (aggregation) and Pandas/matplotlib (charts)
Dockerfile: Installs Python, Rust, uv; prebuilds Polars and copies /app/scripts/python
Query/streaming core: Shared buildFindQuery in findUtils; transform streams in streamTransforms; default sort; secondary read preference with fallback
Tests: Unit/integration/e2e, confidence and benchmark runners for findStream, pivotStream, graphStream
Docs: API docs expanded; ADRs added (streaming, transforms, findUtils, default sorting, secondary reads, Python pivot, hierarchical pivot, graphs); Postman collection updated

^{Written by Cursor Bugbot for commit 3695247. This will update automatically on new commits. Configure here.}

… streaming - Create findStream function with record-by-record processing - Extract common logic to findUtils.ts (DRY principle) - Create Transform streams for field permissions and date conversion - Add ObjectToJsonTransform for HTTP streaming - Add new endpoint /rest/stream/:document/findStream - Register streamApi in routes/index.ts - Add unit tests for Transform streams and findUtils - Add integration, E2E, and benchmark tests - Add confidence test to validate data consistency - All tests execute directly in Node (no Jest dependency) - Benchmark shows 82% memory reduction and 99% faster TTFB for 55k records TODO: Refactor and cleanup

- Extract magic numbers to streamConstants.ts (DRY) - Replace let with const (const-pref) - Replace forEach/for loops with functional methods (.map, .filter, .reduce) - Extract helper functions from findUtils.ts (buildSortOptions, buildAccessConditionsForField, buildAccessConditionsMap, calculateConditionsKeys) - Extract parseFilterFromQuery to eliminate duplication in streamApi.ts - Create streamTestHelpers.ts with reusable test functions - Use BluebirdPromise.map with concurrency limits in all promise operations - Add default sort { _id: 1 } to findStream for consistent ordering - Match find.ts behavior in findUtils.ts for query construction consistency - Refactor test files to use helpers and functional methods - Fix test variable references (testResults.allPassed) All tests passing: - Unit and integration tests: 7/7 passed - Benchmark: 99.3% faster TTFB, 45% faster total time, 81.8% better throughput - Confidence test: All datasets match exactly with find paginated endpoint

- Add comprehensive documentation for /rest/stream/:document/findStream endpoint - Document streaming format (newline-delimited JSON) - Include client-side processing examples (JavaScript) - Add advantages comparison with traditional find endpoint - Add usage guidelines and best practices - Update Postman collection with 3 new requests: - Find Stream (main request with all parameters) - Find Stream - Contact (simple example) - Find Stream - With Filter (complex filter example) - Documentation available in pt-BR and en - Include response examples and error handling

…ntation - ADR-0001: HTTP Streaming para Busca de Dados Documents decision to implement HTTP streaming endpoint Includes performance metrics (68% memory reduction, 99.3% faster TTFB) - ADR-0002: Extração de Lógica Comum para findUtils Documents DRY principle application Explains shared logic extraction between find and findStream - ADR-0003: Node.js Transform Streams para Processamento Sequencial Documents use of Transform streams for record-by-record processing Explains pipeline architecture - ADR-0004: Ordenação Padrão para Consistência Documents default sorting decision ({ _id: 1 }) Explains consistency requirements for confidence tests All ADRs available in pt-BR and en Includes README files with index

- Add hasSecondaryNodes() function to check for available secondary nodes - Implement dynamic read preference selection: - Uses 'secondary' when secondaries are available (maximum isolation) - Falls back to 'secondaryPreferred' when no secondaries (no errors) - Add performance optimizations: - STREAM_BATCH_SIZE: 1000 documents per batch - STREAM_MAX_TIME_MS: 5 minutes max query time - Apply same read preference to countDocuments for consistency - Update ADR-0005 to reflect smart fallback approach - Works in all environments (dev without secondaries, prod with secondaries) See ADR-0005 for detailed rationale

- Remove KonectyResult (not used) - Remove errorReturn (not used) - Remove successReturn (not used) - Remove DataDocument (not used directly, only in streamTransforms) All imports are now used, lint passes without errors

- Add hierarchical pivot table structure with nested children - Enrich pivot config with metadata from MetaObject.Meta - Implement lookup field formatting with formatPattern - Add recursive field metadata resolution for nested lookups - Concatenate parent labels in nested fields (e.g., 'Grupo > Nome') - Calculate subtotals per hierarchy level - Calculate grand totals for all data - Update Python script to build hierarchical structure - Support Accept-Language header for multilingual labels - Update integration and unit tests for new structure Breaking changes: - Pivot API response format changed from flat array to hierarchical structure - Response now includes metadata, data (hierarchical), and grandTotals

…rmat - Update API documentation (en/pt-BR) with new hierarchical response structure - Add examples showing metadata, nested children, subtotals, and grandTotals - Document lookup formatting rules and nested field label concatenation - Add ADR-0007 documenting hierarchical pivot output format decision - Update ADR READMEs to include new ADR Breaking changes documented: - Response format changed from flat array to hierarchical structure - New metadata field with enriched field information - Nested children arrays for multi-level hierarchies - Subtotals per level and grand totals

- Update Postman collection example response to show new hierarchical structure - Include metadata, nested children, subtotals, and grandTotals in example - Reflect breaking change in response format

- Add Rust, cargo, and musl-dev for building polars from source on Alpine - Fix ENV format to use key=value syntax (removes warning) - Fix COPY paths to use absolute paths (/app instead of app) - Add python3-dev and py3-pip for Python development dependencies - Ensure konecty user has access to build tools - Note: polars will compile on first execution (takes ~2-5 minutes), then cached Alpine Linux (musl) doesn't have precompiled polars wheels, so compilation from source is required. This is handled automatically by uv when the script runs for the first time.

- Add GET /rest/data/:document/graph endpoint for SVG chart generation - Implement graphStream function orchestrating findStream + Python - Create graph_generator.py script using Polars for aggregations and pandas/matplotlib for visualization - Support 6 chart types: bar, line, pie, scatter, histogram, timeSeries - Add collectSVGFromPython function to pythonStreamBridge for SVG collection - Add GraphConfig and GraphStreamParams TypeScript types - Create unit and integration tests for graph endpoint - Add ADR-0008 documenting Polars+Pandas decision (pt-BR and en) - Update API documentation with graph endpoint examples (pt-BR and en) - Update Postman collection with graph examples using Opportunity document - Performance: Polars is 3-10x faster than Pandas for aggregations - Convert only aggregated results to Pandas (memory efficient) - Add pyarrow dependency for Polars to_pandas() conversion

cursor

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

__test__/data/api/runFindStreamTests.ts

__test__/data/api/runFindStreamBenchmark.ts

- Fix wrong variable name in runFindStreamTests.ts (failed++ -> testResults.failed++) - Change BENCHMARK_ITERATION_CONCURRENCY from 3 to 1 for accurate memory measurements - Fix TypeScript linting errors (any type, empty line, type guards)

silveirado · 2025-12-30T18:31:03Z

✅ Fixed issues reported by Cursor Bugbot:

Fixed wrong variable name in runFindStreamTests.ts (changed failed++ to testResults.failed++)
Changed BENCHMARK_ITERATION_CONCURRENCY from 3 to 1 for accurate memory measurements
Fixed TypeScript linting errors

All issues have been resolved in commit a763b95.

Dockerfile

- Replace echo with printf in polars pre-build step - BusyBox ash doesn't interpret \n in echo, causing malformed input - printf correctly interprets \n as newline character - This ensures polars is properly pre-compiled during Docker build - Prevents multi-minute delay on first pivot/graph request

cursor · 2025-12-30T18:43:49Z

__test__/data/api/runFindStreamBenchmark.ts

+		cpuSystem: endCpu.system / MILLISECONDS_PER_SECOND,
+		recordCount,
+		throughput,
+		peakMemory: peakMemory - startMemory.heapUsed,


Double subtraction causes incorrect peak memory in benchmark

The readStreamRecordsWithMetrics helper function already returns peakMemory as a delta (computed as memoryState.peakMemory - startMemory.heapUsed at line 107 of streamTestHelpers.ts). However, benchmarkFindStream subtracts startMemory.heapUsed again at line 131, resulting in peakMemory - 2 * startMemory.heapUsed. This causes incorrect (likely negative) peak memory values for the stream endpoint benchmark, while benchmarkFindPaginated correctly computes the delta from the raw peak value. The fix is to use peakMemory directly without the second subtraction.

Additional Locations (1)

__test__/data/api/streamTestHelpers.ts#L106-L107

- Add support for hierarchical column headers in pivotStream response - Update tests to validate presence and structure of columnHeaders - Modify API documentation to reflect new columnHeaders field - Implement logic in Python script to handle and return column headers - Ensure backward compatibility with existing pivot functionality Breaking changes: - Response format now includes columnHeaders, enhancing the pivot table structure.

cursor · 2026-01-09T21:56:04Z

docs/postman/Konecty-API.postman_collection.json

+					"description": "Find Opportunity records with complex filter. Example filtering by multiple status values."
+				},
+				"response": []
+			},


Malformed JSON structure in Postman collection item

Medium Severity

The "Find Stream - With Filter" item has incorrect indentation that breaks the JSON structure. Comparing with the correctly formatted "Find Stream - Contact" item (line 630), the description at line 658 and response at line 660 are indented one level less than required. This causes response to appear outside its parent item object, making the Postman collection invalid JSON that would fail to import.

cursor · 2026-01-09T21:56:04Z

__test__/data/api/runFindStreamConfidenceTest.ts

+
+			if (findStr !== streamStr) {
+				// Show first difference for debugging
+				return `${key}: find=${findStr.substring(0, MAX_SAMPLE_LENGTH)}... vs stream=${streamStr.substring(0, MAX_SAMPLE_LENGTH)}...`;


Calling substring on undefined causes TypeError

Medium Severity

In compareRecordFields, when a key exists in one record but not the other, accessing the missing key returns undefined. Calling JSON.stringify(undefined) returns the primitive undefined (not a string). Then calling .substring() on lines 186 throws a TypeError: Cannot read property 'substring' of undefined. This crashes the confidence test whenever records have different fields.

silveirado added 11 commits December 29, 2025 19:41

fix: remove unused imports from findStream.ts

23f45a5

- Remove KonectyResult (not used) - Remove errorReturn (not used) - Remove successReturn (not used) - Remove DataDocument (not used directly, only in streamTransforms) All imports are now used, lint passes without errors

docs(postman): update pivot endpoint example with hierarchical format

4355167

- Update Postman collection example response to show new hierarchical structure - Include metadata, nested children, subtotals, and grandTotals in example - Reflect breaking change in response format

silveirado temporarily deployed to develop December 30, 2025 18:20 — with GitHub Actions Inactive

cursor bot reviewed Dec 30, 2025

View reviewed changes

__test__/data/api/runFindStreamTests.ts Outdated Show resolved Hide resolved

__test__/data/api/runFindStreamBenchmark.ts Show resolved Hide resolved

silveirado requested review from 7sete7 and CaduGomes December 30, 2025 18:26

fix: resolve Cursor Bugbot issues

a763b95

- Fix wrong variable name in runFindStreamTests.ts (failed++ -> testResults.failed++) - Change BENCHMARK_ITERATION_CONCURRENCY from 3 to 1 for accurate memory measurements - Fix TypeScript linting errors (any type, empty line, type guards)

silveirado temporarily deployed to develop December 30, 2025 18:29 — with GitHub Actions Inactive

cursor bot reviewed Dec 30, 2025

View reviewed changes

Dockerfile Outdated Show resolved Hide resolved

silveirado changed the title ~~feat: Add graph endpoint with Polars and Pandas integration~~ feat: add HTTP streaming, pivot tables and graph endpoints with Python integration Dec 30, 2025

silveirado temporarily deployed to develop December 30, 2025 18:40 — with GitHub Actions Inactive

cursor bot reviewed Dec 30, 2025

View reviewed changes

silveirado deployed to develop January 9, 2026 21:48 — with GitHub Actions Active

cursor bot reviewed Jan 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277

feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277

Uh oh!

silveirado commented Dec 30, 2025 •

edited

Loading

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

Uh oh!

silveirado commented Dec 30, 2025

Uh oh!

Uh oh!

cursor bot Dec 30, 2025

Uh oh!

cursor bot Jan 9, 2026

Uh oh!

cursor bot Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277

Are you sure you want to change the base?

feat: add HTTP streaming, pivot tables and graph endpoints with Python integration #277

Uh oh!

Conversation

silveirado commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Descrição

Principais Funcionalidades

1. HTTP Streaming Endpoint (findStream)

2. Pivot Tables Endpoint

3. Graph Endpoint

Commits Incluídos

Arquivos Criados

Streaming

Pivot

Graph

Testes

Documentação

Arquivos Modificados

Testes

Performance

Documentação

Dependências Python

Mudanças Recentes

Estrutura Hierárquica de Colunas (columnHeaders)

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

This PR is being reviewed by Cursor Bugbot

Uh oh!

Uh oh!

Uh oh!

silveirado commented Dec 30, 2025

Uh oh!

Uh oh!

cursor bot Dec 30, 2025

Choose a reason for hiding this comment

Double subtraction causes incorrect peak memory in benchmark

Uh oh!

cursor bot Jan 9, 2026

Choose a reason for hiding this comment

Malformed JSON structure in Postman collection item

Uh oh!

cursor bot Jan 9, 2026

Choose a reason for hiding this comment

Calling substring on undefined causes TypeError

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

silveirado commented Dec 30, 2025 •

edited

Loading

Estrutura Hierárquica de Colunas (`columnHeaders`)