Segger fails for data generated with Xenium Ranger v1: expected String type, got: binary #20

@erikla

A call to segger segment ends with the following error when run with sbatch on iris.

Call:

#!/bin/bash
#SBATCH --job-name="segger-s1"
#SBATCH --partition=A
#SBATCH --gpus=1
#SBATCH --nodes=1
#SBATCH --mem=100G
#SBATCH --time=20:00:00
micromamba activate segger
segger segment -i /projects/path_to_10x/outs/ -o /projects/path_to_10x/segger_out/

Traceback (most recent call last):
File "/home/erik/micromamba/envs/segger/bin/segger", line 6, in <module>
sys.exit(app())
^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/site-packages/cyclopts/core.py", line 1869, in __call__
result = _run_maybe_async_command(command, bound, resolved_backend)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/site-packages/cyclopts/_run.py", line 50, in _run_maybe_async_command
return command(*bound.args, **bound.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/erik/bin/github/segger/src/segger/cli/segment.py", line 308, in segment
datamodule = ISTDataModule(
^^^^^^^^^^^^^^
File "<string>", line 27, in __init__
File "/home/erik/bin/github/segger/src/segger/data/data_module.py", line 160, in __post_init__
self.load()
File "/home/erik/bin/github/segger/src/segger/data/data_module.py", line 172, in load
tx = self.tx = pp.transcripts
^^^^^^^^^^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/functools.py", line 1001, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/home/erik/bin/github/segger/src/segger/io/preprocessor.py", line 437, in transcripts
.collect()
^^^^^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/site-packages/polars/lazyframe/opt_flags.py", line 326, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/erik/micromamba/envs/segger/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2440, in collect
return wrap_df(ldf.collect(engine, callback))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: expected String type, got: binary

Resolved plan until failure:
---> FAILED HERE RESOLVING 'sink' <---
FILTER col("feature_name").str.contains(["NegControlProbe_|antisense_|NegControlCodeword*|BLANK_|DeprecatedCodeword_|UnassignedCodeword_*"]).not()
FROM
FILTER [(col("qv")) >= (20.0)]
FROM
Parquet SCAN [/projects/path_to_10x/outs/transcripts.parquet]
PROJECT /8 COLUMNS
ROW_INDEX: row_index
ESTIMATED ROWS: 1000000
