Windows absolute paths with drive letters incorrectly parsed as cloud provider schemes #960

@tuhlnaa

Description

Environment

  • OS: Windows 10
  • Hardware (GPU, or instance type): Local machine
  • streaming library version: 0.13.0

To reproduce

Steps to reproduce the behavior:

  1. On a Windows system, attempt to create an MDSWriter with an absolute Windows path:
from streaming import MDSWriter

columns = {
    'filename': 'str',
    'data': 'ndarray:float32'
}

# This fails on Windows with absolute paths
output_path = r"D:/test"

with MDSWriter(out=output_path, columns=columns) as writer:
    pass
  2. Observe the error:
ValueError: Invalid Cloud provider prefix: d.

Expected behavior

MDSWriter should accept Windows absolute paths (e.g., D:\path\to\dir or D:/path/to/dir) as valid local filesystem paths, similar to how it handles Unix absolute paths (e.g., /path/to/dir).

Root Cause

The issue is in streaming/base/storage/upload.py in the CloudUploader.get() method:

obj = urllib.parse.urlparse(out) if isinstance(out, str) else urllib.parse.urlparse(out[1])
provider_prefix = obj.scheme

When urllib.parse.urlparse() is called on a Windows path like D:/path/to/dir, it treats the drive letter as a URL scheme, because a letter followed by a colon matches URL scheme syntax:

>>> import urllib.parse
>>> urllib.parse.urlparse("D:/test")
ParseResult(scheme='d', netloc='', path='/test', params='', query='', fragment='')

The drive letter D: is parsed as scheme='d', which then fails the validation since 'd' is not in the UPLOADERS dictionary.
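
One possible way to avoid this is to check for a drive-letter prefix before falling back to urlparse(). The sketch below is only an illustration, not the library's code: the helper name provider_prefix and the regular expression are hypothetical, and it assumes that no real cloud provider scheme is a single letter.

```python
import re
import urllib.parse

# Hypothetical pattern (not part of the streaming library): one ASCII letter,
# a colon, then either end of string or a path separator, e.g. "D:", "D:/", "D:\".
_WINDOWS_DRIVE = re.compile(r'^[A-Za-z]:($|[\\/])')


def provider_prefix(out: str) -> str:
    """Return the URL scheme of `out`, or '' when `out` looks like a local path."""
    # Assumption: a single-letter "scheme" followed by a separator is a drive letter.
    if _WINDOWS_DRIVE.match(out):
        return ''
    return urllib.parse.urlparse(out).scheme


for candidate in ("D:/test", r"D:\path\to\dir", "/path/to/dir", "s3://bucket/dir"):
    print(f"{candidate!r} -> {provider_prefix(candidate)!r}")
```

With this check, both D:/test and D:\path\to\dir yield an empty prefix, the same as a Unix absolute path, while s3://bucket/dir still yields 's3'. Drive-relative forms such as D:dir are deliberately not covered by this pattern and would still be parsed as a scheme.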

Labels: bug (Something isn't working)