
[Snyk] Upgrade deepspeech from 0.4.1 to 0.9.3#5

Open
Shadow0ps wants to merge 1 commit into master from
snyk-upgrade-a00ed70ce161553cf3dc2c73228efc29


@Shadow0ps

This PR was automatically created by Snyk using the credentials of a real user.


Snyk has created this PR to upgrade deepspeech from 0.4.1 to 0.9.3.

ℹ️ Keep your dependencies up-to-date. This makes it easier to fix existing vulnerabilities and to more quickly identify and fix newly disclosed vulnerabilities when they affect your project.


  • The recommended version is 70 versions ahead of your current version.
  • The recommended version was released 3 years ago, on 2020-12-10.
Release notes
Package name: deepspeech
  • 0.9.3 - 2020-12-10

    General

    This is the 0.9.3 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not backwards compatible with earlier versions. However, models exported for 0.7.X and 0.8.X should work with this release. This is a bugfix release and retains compatibility with the 0.9.0, 0.9.1 and 0.9.2 models. All model files included here are identical to the ones in the 0.9.0 release. As with previous releases, this release includes the source code:

    v0.9.3.tar.gz

    Under the MPL-2.0 license. And the acoustic models:

    deepspeech-0.9.3-models.pbmm
    deepspeech-0.9.3-models.tflite

    In addition we're releasing experimental Mandarin Chinese acoustic models trained on an internal corpus composed of 2000h of read speech:

    deepspeech-0.9.3-models-zh-CN.pbmm
    deepspeech-0.9.3-models-zh-CN.tflite

    all under the MPL-2.0 license.
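    The compatibility statement at the top of these notes can be turned into a quick check before upgrading. The following is a minimal sketch, not part of DeepSpeech: the function name and the decision to treat pre-release versions as unsupported are assumptions made for illustration, because the notes only mention models exported for 0.7.X, 0.8.X and 0.9.0 through 0.9.2.

```python
# Minor series whose exported models the 0.9.3 notes describe as compatible.
COMPATIBLE_MINOR_SERIES = {7, 8, 9}


def model_works_with_0_9_3(model_version: str) -> bool:
    """Return True if the notes say a model from this release should load in 0.9.3.

    Hypothetical helper: pre-release versions such as "0.9.0-alpha.3" are not
    covered by the notes, so they are treated conservatively as unsupported.
    """
    if "-" in model_version:
        return False
    major, minor, _patch = (int(part) for part in model_version.split("."))
    return major == 0 and minor in COMPATIBLE_MINOR_SERIES
```

    Under these assumptions a model exported for 0.4.1, the version this PR upgrades from, is reported as incompatible, so the model files would need to be replaced along with the package.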

    The model files with the ".pbmm" extension are memory mapped and thus memory efficient and fast to load. The model files with the ".tflite" extension are converted to use TensorFlow Lite, have post-training quantization enabled, and are more suitable for resource-constrained environments.

    The acoustic models were trained on American English with synthetic noise augmentation, and the .pbmm model achieves a 7.06% word error rate on the LibriSpeech clean test corpus.
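
    For readers unfamiliar with the metric, word error rate is usually defined as the word-level edit distance between the reference and the recognized transcript, divided by the number of reference words. The sketch below implements that standard definition. It is an illustration only and is not necessarily how the DeepSpeech evaluation scripts compute the reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    With this definition, one substituted word and one dropped word in a six-word reference give a rate of 2/6.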

    Note that the model currently performs best in low-noise environments with clear recordings and has a bias towards US male accents. This does not mean the model cannot be used outside of these conditions, but that accuracy may be lower. Some users may need to train the model further to meet their intended use-case.

    In addition we release the scorer:

    deepspeech-0.9.3-models.scorer

    which takes the place of the language model and trie in older releases and which is also under the MPL-2.0 license.

    There is also a corresponding scorer for the Mandarin Chinese model:

    deepspeech-0.9.3-models-zh-CN.scorer

    We also include example audio files:

    audio-0.9.3.tar.gz

    which can be used to test the engine, and checkpoint files for both the English and Mandarin models:

    deepspeech-0.9.3-checkpoint.tar.gz
    deepspeech-0.9.3-checkpoint-zh-CN.tar.gz

    which are under the MPL-2.0 license and can be used as the basis for further fine-tuning.
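
    The files listed above can be combined into a short transcription script. The following is a minimal sketch, assuming the deepspeech 0.9.3 Python package and NumPy are installed and that the model and scorer have been downloaded to the working directory. The helper names and the sample path in the usage note are hypothetical. Because the scorer replaces the separate language model and trie, code written against 0.4.1 will likely need to be adapted in a similar way.

```python
import wave

MODEL_PATH = "deepspeech-0.9.3-models.pbmm"
SCORER_PATH = "deepspeech-0.9.3-models.scorer"


def read_wav_pcm16(path):
    """Return (sample_rate, raw_bytes) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(wav_path, model_path=MODEL_PATH, scorer_path=SCORER_PATH):
    """Run speech-to-text on one WAV file and return the transcript."""
    # Imported lazily so read_wav_pcm16 stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    rate, pcm = read_wav_pcm16(wav_path)
    if rate != model.sampleRate():
        raise ValueError(
            f"audio is {rate} Hz but the model expects {model.sampleRate()} Hz"
        )
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

    A call such as transcribe("audio/sample.wav") would then return the recognized text, where the path stands in for any file extracted from the example audio archive.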

    Notable changes from the previous release

    • Add CI testing for hot word boosting on .NET bindings (#3416)
    • Improve error message on generate_scorer_package tooling (#3435)
    • Enable support for building static iOS framework (#3436)
    • Change Java binding package name from org.mozilla.deepspeech to org.deepspeech (#3454)
    • Expose Stream type on TypeScript binding (#3456)

    Training Regimen + Hyperparameters for fine-tuning

    The hyperparameters used to train the model are useful for fine tuning. Thus, we document them here along with the training regimen, hardware used (a server with 8 Quadro RTX 6000 GPUs each with 24GB of VRAM), and our use of cuDNN RNN.

    In contrast to some previous releases, training for this release occurred as a fine tuning of the previous 0.8.2 checkpoint, with data augmentation options enabled. The following hyperparameters were used for the fine tuning. See the 0.8.2 release notes for the hyperparameters used for the base model.

    • train_files Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed for use as training corpora.
    • dev_files LibriSpeech clean dev corpus.
    • test_files LibriSpeech clean test corpus
    • train_batch_size 128
    • dev_batch_size 128
    • test_batch_size 128
    • n_hidden 2048
    • learning_rate 0.0001
    • dropout_rate 0.40
    • epochs 200
    • augment pitch[pitch=1~0.1]
    • augment tempo[factor=1~0.1]
    • augment overlay[p=0.9,source=${noise},layers=1,snr=12~4] (where ${noise} is a dataset of Freesound.org background noise recordings)
    • augment overlay[p=0.1,source=${voices},layers=10~2,snr=12~4] (where ${voices} is a dataset of audiobook snippets extracted from Librivox)
    • augment resample[p=0.2,rate=12000~4000]
    • augment codec[p=0.2,bitrate=32000~16000]
    • augment reverb[p=0.2,decay=0.7~0.15,delay=10~8]
    • augment volume[p=0.2,dbfs=-10~10]
    • cache_for_epochs 10
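
    The augmentation entries above share a compact syntax: a name followed by bracketed key=value pairs, where a value written as center~radius is understood here to mean a value drawn uniformly from center minus radius to center plus radius. The sketch below parses that syntax for illustration. It is not DeepSpeech's own parser, the function names are hypothetical, and it does not handle any other value forms the training tool may accept.

```python
import random
import re

SPEC_PATTERN = re.compile(r"^(?P<name>\w+)\[(?P<params>.*)\]$")


def parse_value(text):
    """Parse "0.2" into a float and "12~4" into a (center, radius) pair.

    Values that are not numeric, such as a source path, are returned as-is.
    """
    center, separator, radius = text.partition("~")
    try:
        if separator:
            return (float(center), float(radius))
        return float(center)
    except ValueError:
        return text


def parse_augmentation(spec):
    """Split e.g. "reverb[p=0.2,delay=10~8]" into a name and a parameter dict."""
    match = SPEC_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized augmentation spec: {spec!r}")
    params = {}
    for item in match.group("params").split(","):
        key, _, value = item.partition("=")
        params[key.strip()] = parse_value(value.strip())
    return match.group("name"), params


def sample(value, rng=random):
    """Draw a concrete value from a parsed parameter."""
    if isinstance(value, tuple):
        center, radius = value
        return rng.uniform(center - radius, center + radius)
    return value
```

    For example, parsing the reverb entry yields a delay of (10.0, 8.0), and sampling it gives a value between 2 and 18.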

    The weights with the best validation loss were selected at the end of 200 epochs using --noearly_stop.

    The optimal lm_alpha and lm_beta values with respect to the LibriSpeech clean dev corpus remain unchanged from the previous release:

    • lm_alpha 0.931289039105002
    • lm_beta 1.1834137581510284

    For the Mandarin Chinese model, the following values are recommended:

    • lm_alpha 0.6940122363709647
    • lm_beta 4.777924224113021
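
    As a rough guide to what these two values control, lm_alpha is commonly described as the weight given to the language model score and lm_beta as a per-word bonus during beam search decoding. The sketch below shows that commonly cited form and how the recommended values could be applied through the Python binding's setScorerAlphaBeta method. The dictionary keys and helper names are hypothetical, and the scoring function is a simplification rather than the decoder's actual implementation.

```python
# Recommended values copied from the release notes; the language keys are
# labels chosen for this sketch.
SCORER_WEIGHTS = {
    "en-US": {"lm_alpha": 0.931289039105002, "lm_beta": 1.1834137581510284},
    "zh-CN": {"lm_alpha": 0.6940122363709647, "lm_beta": 4.777924224113021},
}


def combined_score(acoustic_logp, lm_logp, word_count, lm_alpha, lm_beta):
    """Simplified candidate score: acoustic + alpha * LM + beta * word count."""
    return acoustic_logp + lm_alpha * lm_logp + lm_beta * word_count


def apply_weights(model, language):
    """Set the scorer weights on a deepspeech Model that has a scorer enabled."""
    weights = SCORER_WEIGHTS[language]
    model.setScorerAlphaBeta(weights["lm_alpha"], weights["lm_beta"])
```

    A larger lm_beta, as recommended for the Mandarin model, rewards candidates with more words in this simplified view.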

    Bindings

    This release also includes a Python-based command-line tool, deepspeech, installed through:

    pip install deepspeech
    

    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

    pip install deepspeech-gpu

    On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

    pip install deepspeech-tflite
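
    Since this PR pins a specific version, the three package variants above can be mapped to requirement lines. The helper below is a small sketch with hypothetical names. It only formats a pinned requirement string and does not check whether a given variant is published for a particular platform.

```python
PACKAGE_BY_VARIANT = {
    "cpu": "deepspeech",
    "gpu": "deepspeech-gpu",
    "tflite": "deepspeech-tflite",
}


def requirement_line(variant="cpu", version="0.9.3"):
    """Return a pinned requirements.txt entry for the chosen package variant."""
    try:
        package = PACKAGE_BY_VARIANT[variant]
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None
    return f"{package}=={version}"
```

    For instance, requirement_line("gpu") produces the entry for the GPU-specific package at the version this PR targets.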

    It also exposes bindings for the following languages:

    • Python (Versions 3.5, 3.6, 3.7, 3.8 and 3.9) installed via

      pip install deepspeech

      Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

      pip install deepspeech-gpu

      On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

      pip install deepspeech-tflite
    • NodeJS (Versions 10.x, 11.x, 12.x, 13.x, 14.x and 15.x) installed via

      npm install deepspeech
      

      Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

      npm install deepspeech-gpu
      

      On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

      npm install deepspeech-tflite
    • ElectronJS versions 5.0, 6.0, 6.1, 7.0, 7.1, 8.0, 9.0, 9.1, 9.2, 10.0, 10.1, and 11.0 are also supported

    • C, which requires that the appropriate shared objects be installed from native_client.tar.xz (see the section in the main README that describes native_client.tar.xz installation).

    • .NET which is installed by following the instructions on the NuGet package page.

    In addition, there are third-party bindings that are supported by external developers, for example:

    • Rust which is installed by following the instructions on the external Rust repo.
    • Go which is installed by following the instructions on the external Go repo.
    • V which is installed by following the instructions on the external Vlang repo.

    Supported Platforms

    • Windows 8.1, 10, and Server 2012 R2, 64-bit (requires at least AVX support and the Visual C++ 2015 Update 3 Redistributable (64-bit) at runtime).

    • OS X 10.10, 10.11, 10.12, 10.13, 10.14, and 10.15

    • Linux x86 64-bit with a modern CPU (at least AVX/FMA)

    • Linux x86 64-bit with a modern CPU (at least AVX/FMA) + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)

    • Raspbian Buster on Raspberry Pi 3, Pi 4

    • Linux/ARM64 built against Debian/ARMbian Buster and tested on LePotato boards

    • Java Android (7.0-11.0) bindings (+ demo app). Tested on Google Pixel 2, Sony Xperia Z Premium, and Nokia 1.3, with the TF Lite model only.

    • iOS with Swift bindings (experimental). Tested on iPhone Xs.

    • TFLite Delegation API is available as a preview: do not expect released models to work out of the box, but feedback and PRs are welcome.
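
    On Linux x86 hosts, the AVX/FMA requirement listed above can be checked before installing by reading the CPU flags from /proc/cpuinfo. The following is a minimal sketch with hypothetical helper names. It assumes the usual "flags" line format of /proc/cpuinfo on x86 and does not apply to the ARM, Android or iOS targets.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first "flags" line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_cpu_features(cpuinfo_text, required=("avx", "fma")):
    """List the required CPU features that do not appear in the flags."""
    flags = cpu_flags(cpuinfo_text)
    return [feature for feature in required if feature not in flags]
```

    Passing the contents of /proc/cpuinfo to missing_cpu_features returns an empty list when both features are reported by the CPU.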

    Documentation

    Documentation is available on deepspeech.readthedocs.io.

    Contact/Getting Help

    1. FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
    2. Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
    3. Matrix - If your question is not addressed by either the FAQ or the Discourse Forums, you can contact us on the #machinelearning:mozilla.org channel on Mozilla Matrix, where people can try to help.
    4. Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.

    Contributors to 0.9.3 release

    • Alexandre Lissy
    • Catalin Voss
    • Olaf Thiele
    • Reuben Morais
  • 0.9.2 - 2020-12-03
  • 0.9.1 - 2020-11-05
  • 0.9.0 - 2020-11-02
  • 0.9.0-alpha.12 - 2020-10-30
    No content.
  • 0.9.0-alpha.11 - 2020-10-09
    No content.
  • 0.9.0-alpha.10 - 2020-09-25

    Merge pull request #3337 from lissyx/bump-0.9.0a10

    Bump VERSION to 0.9.0-alpha.10

  • 0.9.0-alpha.9 - 2020-09-21
  • 0.9.0-alpha.8 - 2020-09-10
  • 0.9.0-alpha.3 - 2020-07-15
  • 0.9.0-alpha.2 - 2020-07-07
  • 0.9.0-alpha.1 - 2020-07-06
  • 0.9.0-alpha.0 - 2020-06-24
  • 0.8.2 - 2020-08-22
  • 0.8.1 - 2020-08-11
  • 0.8.0 - 2020-07-30
  • 0.8.0-alpha.8 - 2020-07-27
  • 0.8.0-alpha.7 - 2020-07-15
  • 0.8.0-alpha.6 - 2020-07-04
  • 0.8.0-alpha.5 - 2020-07-03
  • 0.8.0-alpha.4 - 2020-06-23
  • 0.8.0-alpha.3 - 2020-06-09
  • 0.8.0-alpha.2 - 2020-05-27
  • 0.8.0-alpha.1 - 2020-05-26
  • 0.8.0-alpha.0 - 2020-05-26
  • 0.7.4 - 2020-06-18
  • 0.7.3 - 2020-06-04
  • 0.7.1 - 2020-05-12
  • 0.7.1-alpha.2 - 2020-05-07
  • 0.7.1-alpha.1 - 2020-05-04
  • 0.7.1-alpha.0 - 2020-05-01
  • 0.7.0 - 2020-04-24
  • 0.7.0-alpha.4 - 2020-04-24
  • 0.7.0-alpha.3 - 2020-03-25
  • 0.7.0-alpha.2 - 2020-02-17
  • 0.7.0-alpha.1 - 2020-02-03
  • 0.7.0-alpha.0 - 2020-01-31
  • 0.6.1 - 2020-01-10
  • 0.6.1-alpha.0 - 2019-12-13
  • 0.6.0 - 2019-12-03
  • 0.6.0-alpha.15 - 2019-11-14
  • 0.6.0-alpha.14 - 2019-11-07
  • 0.6.0-alpha.13 - 2019-11-05
  • 0.6.0-alpha.12 - 2019-11-05
  • 0.6.0-alpha.11 - 2019-10-26
  • 0.6.0-alpha.10 - 2019-10-17
  • 0.6.0-alpha.9 - 2019-10-11
  • 0.6.0-alpha.8 - 2019-09-27
  • 0.6.0-alpha.7 - 2019-09-24
  • 0.6.0-alpha.6 - 2019-09-19
  • 0.6.0-alpha.5 - 2019-08-22
  • 0.6.0-alpha.4 - 2019-07-12
  • 0.6.0-alpha.3 - 2019-07-11
  • 0.6.0-alpha.2 - 2019-07-05
  • 0.6.0-alpha.1 - 2019-06-25
  • 0.6.0-alpha.0 - 2019-06-20
  • 0.5.1 - 2019-06-20
  • 0.5.0 - 2019-06-11
  • 0.5.0-alpha.11 - 2019-05-31
  • 0.5.0-alpha.10 - 2019-05-22
  • 0.5.0-alpha.9 - 2019-05-22
  • 0.5.0-alpha.8 - 2019-05-10
  • 0.5.0-alpha.7 - 2019-04-26
  • 0.5.0-alpha.6 - 2019-04-25
  • 0.5.0-alpha.5 - 2019-04-08
  • 0.5.0-alpha.4 - 2019-03-20
  • 0.5.0-alpha.3 - 2019-03-20
  • 0.5.0-alpha.2 - 2019-03-13
  • 0.5.0-alpha.1 - 2019-01-24
  • 0.5.0-alpha.0 - 2019-01-23
  • 0.4.1 - 2019-01-10
from deepspeech GitHub release notes

Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open upgrade PRs.

For more information:

🧐 View latest project report

🛠 Adjust upgrade PR settings

🔕 Ignore this dependency or unsubscribe from future upgrade PRs

