
htslib in pysam.fetch on S3 Bucket #1670

@StephanHolgerD


Hi, I want to report a potentially problematic behaviour of pysam's `fetch` on AWS S3 bucket infrastructure. Running the following code on a BAM file in an S3 bucket creates HTTP range requests without a defined end byte.

Code

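import pysam  # bamfile_S3 / baifile_S3: S3 locations of the BAM and its .bai index
with pysam.AlignmentFile(bamfile_S3, filepath_index=baifile_S3) as f:
    for r in f.fetch(chrom, start, end):
        pass  # process each read r here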

Request: [screenshot of the S3 request issued by pysam]

This kind of open-ended request (a start byte but no end byte) results in high egress costs, because AWS logs the whole file from the start byte onward as delivered, even if the client stops reading once it reaches the end of the fetch coordinates.
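To make the cost difference concrete, here is a small arithmetic sketch. It assumes, as described above, that S3 counts the full requested range as delivered. An open-ended `Range: bytes=START-` header asks for everything from START to the end of the object, while a closed `bytes=START-END` header (END inclusive) asks only for the bytes in between. `range_header` and `bytes_served` are hypothetical helpers, and the 10 GiB file and 1 MiB region are made-up example numbers, not measurements.

```python
from typing import Optional

def range_header(start: int, end: Optional[int] = None) -> str:
    """Value for an HTTP Range header; end is inclusive, None means open-ended."""
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"

def bytes_served(file_size: int, start: int, end: Optional[int] = None) -> int:
    """Bytes covered by a single range request, clamped to the object size."""
    last = file_size - 1 if end is None else min(end, file_size - 1)
    return last - start + 1

GiB = 1024 ** 3
MiB = 1024 ** 2
size = 10 * GiB    # hypothetical BAM size
start = 2 * GiB    # hypothetical offset where the region's data begins

# Open-ended request: everything from start to the end of the file.
print(range_header(start))                      # bytes=2147483648-
print(bytes_served(size, start) / GiB)          # 8.0

# Closed request covering only ~1 MiB of region data.
end = start + MiB - 1
print(range_header(start, end))                 # bytes=2147483648-2148532223
print(bytes_served(size, start, end) / MiB)     # 1.0
```

Under these assumed numbers, the open-ended request is logged as 8 GiB even though only about 1 MiB is actually needed.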

By comparison, IGV's requests on S3 data specify the exact byte range, so only those bytes are logged and egress costs stay low:

Request: [screenshot of the S3 request issued by IGV]
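The BAI index already holds enough information to bound each request. Each index chunk stores a begin and an end virtual file offset; per the SAM/BAM specification, the upper 48 bits of a virtual offset are the compressed byte offset of a BGZF block and the lower 16 bits are the offset inside that block's uncompressed data. The sketch below uses hypothetical helpers (`coffset`, `chunk_byte_range`) and is not code taken from htslib or IGV. It turns one chunk into a closed byte range. Because the index does not record the compressed length of the final block, it pads the end by the maximum BGZF block size of 64 KiB.

```python
from typing import Tuple

# Largest possible compressed BGZF block, per the SAM/BAM specification.
MAX_BGZF_BLOCK = 65536

def coffset(voffset: int) -> int:
    """Compressed file offset of the BGZF block a virtual offset points into."""
    return voffset >> 16

def chunk_byte_range(chunk_beg: int, chunk_end: int) -> Tuple[int, int]:
    """Closed (inclusive) byte range covering one BAI chunk."""
    start = coffset(chunk_beg)
    block = coffset(chunk_end)
    if (chunk_end & 0xFFFF) == 0:
        # Chunk ends exactly at a block boundary: that block is not needed.
        end = block - 1
    else:
        # Final block's compressed length is unknown: assume the maximum.
        end = block + MAX_BGZF_BLOCK - 1
    return start, end

beg = (1000 << 16) | 25   # block at byte 1000, 25 bytes into its data
end = (5000 << 16) | 300  # chunk ends 300 bytes into the block at byte 5000
start, stop = chunk_byte_range(beg, end)
print(f"Range: bytes={start}-{stop}")  # Range: bytes=1000-70535
```

A real client would also merge the ranges of all chunks overlapping the region before issuing requests.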

I initially reported this here:

pysam-developers/pysam#1215
