
Feature Request: Expose ingestion tuning parameters for faster backfills #750

@Evan-Kim2028

Summary

The DeepBook indexer currently uses hardcoded defaults for the sui-indexer-alt-framework ingestion configuration. Exposing these parameters via CLI arguments would allow users to tune backfill performance based on their hardware and network capabilities.

Problem

When running backfills (e.g., syncing 1 week or 1 month of historical data), the indexer uses the framework's default ingestion settings:

  • checkpoint_buffer_size: 5,000
  • ingest_concurrency: 200
  • retry_interval_ms: 200

These defaults are conservative. Users with capable hardware and good network connectivity cannot tune these values without modifying the source code.

Proposed Solution

Expose three additional CLI arguments in main.rs:

```rust
/// Buffer size for checkpoints between ingestion and processing.
/// Higher values use more memory but provide smoother throughput.
#[clap(env, long, default_value = "5000")]
checkpoint_buffer_size: usize,

/// Number of concurrent checkpoint fetches from the remote store.
/// Higher values improve ingestion speed but increase network load.
#[clap(env, long, default_value = "200")]
ingest_concurrency: usize,

/// Retry interval for missing checkpoints in milliseconds.
/// Lower values reduce latency but may cause unnecessary retries.
#[clap(env, long, default_value = "200")]
retry_interval_ms: u64,
```
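Because each field carries `#[clap(env, long)]`, every parameter could be set by CLI flag, by environment variable, or left to fall back to the hardcoded default. A minimal stdlib-only sketch of that resolution order (the function and names here are illustrative, not part of clap or the framework):

```rust
// Illustrative sketch of the precedence `#[clap(env, long)]` gives a field:
// explicit CLI flag wins, then the environment variable, then the default.
fn resolve(cli: Option<usize>, env: Option<&str>, default: usize) -> usize {
    cli.or_else(|| env.and_then(|v| v.parse().ok()))
        .unwrap_or(default)
}

fn main() {
    // No flag, no env var: the default applies (current behavior is preserved).
    assert_eq!(resolve(None, None, 5000), 5000);
    // Env var set, no flag: the env value is used.
    assert_eq!(resolve(None, Some("15000"), 5000), 15000);
    // An explicit flag overrides both.
    assert_eq!(resolve(Some(800), Some("15000"), 5000), 800);
    println!("ok");
}
```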

Then pass these to a custom IngestionConfig instead of Default::default():

```rust
let ingestion_config = IngestionConfig {
    checkpoint_buffer_size,
    ingest_concurrency,
    retry_interval_ms,
    ..Default::default()
};

let mut indexer = Indexer::new(
    store,
    indexer_args,
    client_args,
    ingestion_config,  // instead of Default::default()
    // ...
);
```

Sample Performance Results

Testing with more aggressive settings on a local machine showed a meaningful speedup:

| Setting | Default | Aggressive |
| --- | --- | --- |
| checkpoint_buffer_size | 5,000 | 15,000 |
| ingest_concurrency | 200 | 800 |
| retry_interval_ms | 200 | 100 |
| Observed speed | ~450 cps | ~670 cps |
| Speedup | 1.0x | ~1.5x |

Note: These are sample results from a single test. Actual performance will vary based on hardware, network conditions, and the checkpoint store's capacity. The optimal values and maximum achievable speedup are not yet determined.
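To make the throughput numbers concrete, the wall-clock impact on a long backfill is simple division. The 5,000,000-checkpoint range below is purely hypothetical and chosen only for illustration; substitute the real size of your backfill window:

```rust
// Rough wall-clock estimate for a backfill at a given checkpoint rate (cps).
fn backfill_hours(checkpoints: f64, cps: f64) -> f64 {
    checkpoints / cps / 3600.0
}

fn main() {
    let range = 5_000_000.0; // hypothetical number of checkpoints to backfill
    let default_h = backfill_hours(range, 450.0);
    let tuned_h = backfill_hours(range, 670.0);
    // ~450 cps -> ~3.1 h; ~670 cps -> ~2.1 h for this hypothetical range
    println!("default: {:.1} h, tuned: {:.1} h", default_h, tuned_h);
    assert!(default_h > tuned_h);
}
```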

Benefits

  1. Faster backfills - Users can tune for their specific hardware/network
  2. No breaking changes - Defaults remain the same as current behavior
  3. Minimal code change - ~20 lines of additional code
  4. Aligns with framework capabilities - These parameters are already supported by sui-indexer-alt-framework, just not exposed

Additional Context

Happy to submit a PR if this approach looks reasonable.
