perf: query timeout on large repos (80K+ files) #8

@avifenesh

Problem

repo-intel query onboard and repo-intel query can-i-help time out (>30s) on very large repos such as TypeScript (~81K files) and Deno (~28K files). The areas() function iterates over every file_activity entry and groups the entries by directory on the fly, which becomes expensive at scale.
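To make the cost concrete, here is a minimal sketch of an on-the-fly grouping of this kind. It is not the actual areas() implementation: the shape of file_activity (a mapping from repo-relative path to a change count) and the returned structure are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def areas(file_activity):
    """Group per-file activity by parent directory.

    Assumed shape: file_activity maps a repo-relative path to a change count.
    Every call walks every entry, so the work grows with the number of files
    and is repeated by each query that calls it.
    """
    totals = defaultdict(int)
    for path, changes in file_activity.items():
        directory = str(PurePosixPath(path).parent)
        totals[directory] += changes
    return dict(totals)


# Hypothetical sample data, not taken from any real repo.
sample = {
    "src/compiler/checker.ts": 12,
    "src/compiler/parser.ts": 5,
    "README.md": 1,
}
grouped = areas(sample)
```

With this shape, files at the repo root land under the "." directory. Any per-entry work beyond a simple sum (sorting, string handling, lookups) is multiplied by the full file count on every call.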

Repos affected

  • microsoft/TypeScript (~81K files)
  • denoland/deno (~28K files)

Potential fixes

  1. Cache the areas() result within a query session (it's called by both onboard and can-i-help)
  2. Pre-compute directory groupings during merge_delta() instead of on the fly at query time
  3. Add a file-count threshold: for repos with more than 10K files, sample or limit to recent files only
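A rough sketch of fixes 2 and 3, under stated assumptions. ActivityIndex, limit_to_recent, the delta format ({path: added_changes}), and the last_modified mapping are all hypothetical names and shapes for illustration; only merge_delta() and areas() come from the issue, and their real signatures may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class ActivityIndex:
    """Hypothetical store that keeps directory totals up to date on merge."""

    def __init__(self):
        self.file_activity = {}               # path -> change count
        self._dir_totals = defaultdict(int)   # directory -> change count

    def merge_delta(self, delta):
        """Fix 2: fold a {path: added_changes} delta into both maps."""
        for path, added in delta.items():
            self.file_activity[path] = self.file_activity.get(path, 0) + added
            self._dir_totals[str(PurePosixPath(path).parent)] += added

    def areas(self):
        """Return the pre-computed grouping without scanning every file."""
        return dict(self._dir_totals)


def limit_to_recent(last_modified, max_files=10_000):
    """Fix 3: above the threshold, keep only the most recently touched files.

    Assumed shape: last_modified maps a path to a comparable timestamp.
    """
    if len(last_modified) <= max_files:
        return set(last_modified)
    newest = sorted(last_modified, key=last_modified.get, reverse=True)
    return set(newest[:max_files])


index = ActivityIndex()
index.merge_delta({"src/a.ts": 3, "src/b.ts": 1})
index.merge_delta({"src/a.ts": 2, "docs/intro.md": 4})
```

If the grouping is maintained this way, query-time cost depends on the number of directories rather than the number of files, and the session cache from fix 1 becomes less important since both onboard and can-i-help would read the same pre-computed totals. The trade-off is that deletions and renames would also need to adjust the totals, which this sketch does not handle.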

Context

Discovered during 100-repo validation. 97/100 repos pass; these 2 time out.
