@olgeni
Contributor

@olgeni olgeni commented May 10, 2025

The current dependency on pyannote significantly increases the installation time and complexity of speechmatics-python. Specifically, pyannote pulls in heavy dependencies (numpy, scipy, pandas, and others) which, when built from source, take a long time to build and require additional tooling, including a Fortran compiler \o/

├── speechmatics-python v3.0.4
│   ├── docopt v0.6.2
│   ├── httpx[http2] v0.28.1 (*)
│   ├── jiwer v3.1.0
│   │   ├── click v8.1.8
│   │   └── rapidfuzz v3.13.0
│   ├── more-itertools v10.7.0
│   ├── polling2 v0.5.0
│   ├── pyannote-core v5.0.0
│   │   ├── numpy v2.2.5
│   │   ├── scipy v1.15.3
│   │   │   └── numpy v2.2.5
│   │   ├── sortedcontainers v2.4.0
│   │   └── typing-extensions v4.13.2
│   ├── pyannote-database v5.1.3
│   │   ├── pandas v2.2.3
│   │   │   ├── numpy v2.2.5
│   │   │   ├── python-dateutil v2.9.0.post0
│   │   │   │   └── six v1.17.0
│   │   │   ├── pytz v2025.2
│   │   │   └── tzdata v2025.2
│   │   ├── pyannote-core v5.0.0 (*)
│   │   ├── pyyaml v6.0.2
│   │   └── typer v0.15.3
│   │       ├── click v8.1.8
│   │       ├── rich v14.0.0
│   │       │   ├── markdown-it-py v3.0.0
│   │       │   │   └── mdurl v0.1.2
│   │       │   └── pygments v2.19.1
│   │       ├── shellingham v1.5.4
│   │       └── typing-extensions v4.13.2
│   ├── regex v2024.11.6
│   ├── tabulate v0.9.0
│   ├── tenacity v8.2.3
│   ├── toml v0.10.2
│   └── websockets v14.2

Since the pyannote dependency is only used within the asr_metrics module and does not appear necessary for typical usage scenarios of the SDK, making it optional would greatly streamline default installations 😅

So, I tried to make it optional and edited the asr_metrics CLI to handle the case where it is missing (and made some tiny fixes along the way).
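For readers who want the general idea, here is a minimal sketch of handling a missing optional dependency. It is not the code from this PR: the helper name `load_optional` and the wording of the hint are made up for illustration, and only the `[metrics]` extra name comes from this thread.

```python
import importlib


def load_optional(module_name, extra="metrics", package="speechmatics-python"):
    """Try to import an optional module.

    Returns (module, None) on success, or (None, hint) when the module is
    not installed. The function name and hint text are hypothetical.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        # Tell the user which extra to install instead of showing a traceback.
        hint = (
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "{package}[{extra}]"'
        )
        return None, hint


# Example: a module name that is assumed not to exist on the system.
module, hint = load_optional("surely_missing_module_for_demo")
if module is None:
    print(hint)
```

A CLI entry point could call such a helper once at startup and exit with the hint, so the core package never imports the heavy modules unless they are actually needed.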

@olgeni olgeni force-pushed the optional-deps branch 3 times, most recently from 436601e to 12dc78b Compare May 15, 2025 10:39
@olgeni
Contributor Author

olgeni commented May 15, 2025

Should be better - also I had missed a 'pandas' on the first try 😅

@dumitrugutu
Contributor

Hi @olgeni, looks like your branch has some lint errors. Try rebasing, as these most likely come from the master branch.

BREAKING CHANGE: Metrics functionality now requires explicit installation

Previously, all metrics dependencies (pyannote, pandas, jiwer, etc.) were
installed by default. This change moves them to an optional '[metrics]' extra
to reduce the default installation footprint.

Changes:

  - Move metrics dependencies to requirements-metrics.txt
  - Configure extras_require in setup.py for optional installation
  - Add graceful error handling in CLI when dependencies are missing
  - Update README with installation instructions for metrics features

To use metrics functionality after this change:

  pip install speechmatics-python[metrics]

This significantly reduces installation size and time for users who only need
the core transcription features, while maintaining full functionality for those
who need metrics capabilities.
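
As a rough illustration of the setup.py change described above, the sketch below turns the text of a requirements file into an `extras_require` mapping. The helper name `parse_requirements` and the sample file contents are assumptions; the real contents of requirements-metrics.txt are not shown in this thread.

```python
def parse_requirements(text):
    """Return requirement strings from requirements-file text.

    Blank lines and '#' comments are dropped. This simple parser is a
    hypothetical helper and does not handle options such as '-r' or URLs
    that contain '#'.
    """
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            requirements.append(line)
    return requirements


# Illustrative contents only; package names are taken from the dependency
# tree in this thread, without version pins.
sample_metrics_requirements = """
# dependencies needed only by asr_metrics
pyannote-core
pyannote-database
pandas
jiwer
"""

extras = {"metrics": parse_requirements(sample_metrics_requirements)}
print(extras)

# In setup.py this mapping would be passed to setuptools, for example:
#   setup(..., extras_require=extras)
```

With a mapping like this, a plain install skips the listed packages and only the `[metrics]` extra pulls them in. Some shells (zsh, for example) require the brackets to be quoted: pip install "speechmatics-python[metrics]".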
@olgeni
Contributor Author

olgeni commented May 20, 2025

Some old, some new, should all be better now 💡

@dumitrugutu dumitrugutu merged commit bdb653a into speechmatics:master May 20, 2025
4 checks passed