Environment
- OS: Windows 10
- Hardware (GPU, or instance type): Local machine
- streaming library version: 0.13.0
To reproduce
Steps to reproduce the behavior:
- On a Windows system, attempt to create an MDSWriter with an absolute Windows path:
from streaming import MDSWriter

columns = {
    'filename': 'str',
    'data': 'ndarray:float32'
}

# This fails on Windows with absolute paths
output_path = r"D:/test"
with MDSWriter(out=output_path, columns=columns) as writer:
    pass
- Observe the error:
ValueError: Invalid Cloud provider prefix: d.
Expected behavior
MDSWriter should accept Windows absolute paths (e.g., D:\path\to\dir or D:/path/to/dir) as valid local filesystem paths, similar to how it handles Unix absolute paths (e.g., /path/to/dir).
Root Cause
The issue is in streaming/base/storage/upload.py in the CloudUploader.get() method:
obj = urllib.parse.urlparse(out) if isinstance(out, str) else urllib.parse.urlparse(out[1])
provider_prefix = obj.scheme
When urllib.parse.urlparse() is called on a Windows path like D:/path/to/dir, it treats the drive letter as a URL scheme, because a single letter followed by a colon is syntactically a valid scheme:
>>> import urllib.parse
>>> urllib.parse.urlparse("D:/test")
ParseResult(scheme='d', netloc='', path='/test', params='', query='', fragment='')
The drive letter D: is parsed as scheme='d', which then fails validation because 'd' is not in the UPLOADERS dictionary.
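Possible fix (sketch)
One way to avoid this might be to check for a drive-letter prefix before trusting the parsed scheme. The snippet below is only a minimal sketch of that idea, not the library's actual code: get_provider_prefix and _WINDOWS_DRIVE are hypothetical names introduced here for illustration, and how the result would be wired into CloudUploader.get() is left open.

```python
import re
import urllib.parse

# Hypothetical pattern: a single drive letter, a colon, then a slash or
# backslash, e.g. "D:/" or "D:\".
_WINDOWS_DRIVE = re.compile(r'^[A-Za-z]:[\\/]')


def get_provider_prefix(out: str) -> str:
    """Return the URL scheme of `out`, or '' if it looks like a local path.

    Hypothetical helper, not part of the streaming library.
    """
    # Treat Windows absolute paths as local before urlparse can mistake
    # the drive letter for a scheme.
    if _WINDOWS_DRIVE.match(out):
        return ''
    return urllib.parse.urlparse(out).scheme


print(repr(get_provider_prefix("D:/test")))
print(repr(get_provider_prefix("s3://bucket/key")))
```

With this check, D:/test and D:\path\to\dir both yield an empty prefix, the same as /path/to/dir, while s3://bucket/key still yields 's3'. Drive-relative forms such as D:test are deliberately not matched by this sketch.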