[Feature] Batch API Endpoint #11

@Siddhant-K-code

Summary

Add batch processing endpoint for large-scale deduplication workloads.

Motivation

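Some use cases require processing thousands of chunks, which is impractical in a single synchronous request. A batch endpoint with asynchronous processing and webhook notifications lets clients submit the work once and collect the results when the job finishes.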

API Design

Submit Batch Job

POST /v1/batch
{
  "chunks": [...],  // or "chunks_url": "s3://..."
  "options": {...},
  "webhook_url": "https://example.com/callback"
}

Response:
{
  "job_id": "batch_abc123",
  "status": "queued",
  "estimated_duration_seconds": 120
}
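
To make the submit contract concrete, here is a minimal sketch of how the request body might be validated before a job is queued. The function name `validate_batch_request`, the "exactly one of `chunks` or `chunks_url`" rule, and the `uuid`-based job ID are assumptions for illustration, not part of the proposal. `estimated_duration_seconds` is left out because the issue does not say how it would be computed.

```python
import uuid
from typing import Any


def validate_batch_request(body: dict[str, Any]) -> dict[str, Any]:
    """Validate a POST /v1/batch body and return the initial job record.

    Hypothetical helper: the field names come from the issue, the
    validation rules are assumptions.
    """
    has_inline = "chunks" in body
    has_url = "chunks_url" in body
    # Assumption: the two input sources are mutually exclusive.
    if has_inline == has_url:
        raise ValueError("provide exactly one of 'chunks' or 'chunks_url'")
    if has_inline and not isinstance(body["chunks"], list):
        raise ValueError("'chunks' must be a list")
    if has_url and not isinstance(body["chunks_url"], str):
        raise ValueError("'chunks_url' must be a string")
    return {
        # Assumption: job IDs are a 'batch_' prefix plus a random hex suffix.
        "job_id": f"batch_{uuid.uuid4().hex[:12]}",
        "status": "queued",
    }
```

Rejecting a body that carries both or neither input source keeps the worker from having to guess which one wins.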

Check Status

GET /v1/batch/batch_abc123

Response:
{
  "job_id": "batch_abc123",
  "status": "processing",  // queued, processing, completed, failed
  "progress": 0.45,
  "chunks_processed": 450,
  "chunks_total": 1000
}
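
The status payload above could be derived from a small job record, with `progress` computed from the two counters rather than stored separately. This is only a sketch: the `JobStatus` class and the rounding to two decimals are assumptions; the field names and the four status values come from the example response.

```python
from dataclasses import dataclass

# Status values listed in the example response.
VALID_STATUSES = ("queued", "processing", "completed", "failed")


@dataclass
class JobStatus:
    """Hypothetical in-memory record backing GET /v1/batch/{job_id}."""

    job_id: str
    status: str
    chunks_processed: int
    chunks_total: int

    def to_response(self) -> dict:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Guard against division by zero for an empty batch.
        if self.chunks_total > 0:
            progress = self.chunks_processed / self.chunks_total
        else:
            progress = 0.0
        return {
            "job_id": self.job_id,
            "status": self.status,
            "progress": round(progress, 2),  # assumption: two decimals
            "chunks_processed": self.chunks_processed,
            "chunks_total": self.chunks_total,
        }
```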

Get Results

GET /v1/batch/batch_abc123/results

Response:
{
  "job_id": "batch_abc123",
  "status": "completed",
  "results": [...],
  "stats": {...}
}

Components

  • Job queue (in-memory or Redis)
  • Background worker
  • Webhook notifications
  • S3/GCS input support
  • Result storage and retrieval
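
As a rough illustration of how the in-memory job queue and background worker could fit together, the sketch below uses only the Python standard library. Everything here is an assumption: the names `submit`, `worker`, `jobs`, and `job_queue` are hypothetical, and exact-match deduplication stands in for whatever deduplication the service actually performs. Redis, webhooks, and S3/GCS input are out of scope for this sketch.

```python
import queue
import threading

# Hypothetical in-memory state; a Redis-backed queue would replace both.
jobs: dict[str, dict] = {}
job_queue: queue.Queue = queue.Queue()


def submit(job_id: str, chunks: list[str]) -> None:
    """Register a job as 'queued' and hand it to the worker."""
    jobs[job_id] = {
        "status": "queued",
        "chunks": chunks,
        "chunks_processed": 0,
        "chunks_total": len(chunks),
        "results": None,
    }
    job_queue.put(job_id)


def worker() -> None:
    """Process jobs until a None sentinel is received."""
    while True:
        job_id = job_queue.get()
        if job_id is None:
            job_queue.task_done()
            return
        job = jobs[job_id]
        job["status"] = "processing"
        try:
            # Placeholder: exact-match dedup that keeps first occurrences.
            seen: set[str] = set()
            unique: list[str] = []
            for chunk in job["chunks"]:
                if chunk not in seen:
                    seen.add(chunk)
                    unique.append(chunk)
                job["chunks_processed"] += 1
            job["results"] = unique
            job["status"] = "completed"
        except Exception as exc:
            # Record the failure instead of killing the worker thread.
            job["status"] = "failed"
            job["error"] = str(exc)
        finally:
            job_queue.task_done()
```

A caller would start the worker with `threading.Thread(target=worker, daemon=True).start()`, call `submit(...)`, and read `jobs[job_id]` to answer status requests.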

Acceptance Criteria

  • Process 10K+ chunks in a single batch
  • Progress tracking via polling or webhook
  • Results available for 24 hours
  • Graceful handling of failures
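
The 24-hour retention criterion could be met with a store that stamps each result with an expiry time and drops it lazily on read. This is a sketch under assumptions: the `ResultStore` class is hypothetical, the retention window is assumed to start when the results are stored, and a real deployment might rely on Redis key expiry or object-store lifecycle rules instead.

```python
import time
from typing import Any, Callable, Optional

RESULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the acceptance criteria


class ResultStore:
    """Hypothetical in-memory result store with lazy expiry."""

    def __init__(
        self,
        ttl_seconds: float = RESULT_TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items: dict[str, tuple[float, Any]] = {}

    def put(self, job_id: str, results: Any) -> None:
        # Assumption: the window starts when results are stored.
        self._items[job_id] = (self._clock() + self._ttl, results)

    def get(self, job_id: str) -> Optional[Any]:
        entry = self._items.get(job_id)
        if entry is None:
            return None
        expires_at, results = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report it as missing.
            del self._items[job_id]
            return None
        return results
```

Lazy expiry keeps the sketch short, but entries that are never read again stay in memory, so a periodic sweep would be needed in practice.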

Labels: enhancement (New feature or request)
