Web audio --WebSocket--> FastAPI Server.
Use HTTPS when the page is opened from another host: getUserMedia is only available in secure contexts (HTTPS or localhost).
uvicorn src.main:app --host=0.0.0.0 --reload --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem
Deprecated (without TLS):
uvicorn src.main:app --reload
The API is based on the manipulation of a MediaStream object representing a flux of audio- or video-related data. See an example in Get the video.
A MediaStream consists of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. The channel represents the smallest unit of a media stream, such as an audio signal associated with a given speaker, like left or right in a stereo audio track.
MediaStream objects have a single input and a single output. A MediaStream object generated by getUserMedia() is called local, and has as its source input one of the user's cameras or microphones. A non-local MediaStream may represent a media element, like <video> or <audio>, a stream originating over the network and obtained via the WebRTC RTCPeerConnection API, or a stream created using the Web Audio API MediaStreamAudioSourceNode.
The output of the MediaStream object is linked to a consumer. It can be a media element, like <audio> or <video>, the WebRTC RTCPeerConnection API or a Web Audio API MediaStreamAudioSourceNode.
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API
- navigator.mediaDevices.getUserMedia: reads the microphone stream.
- context.createScriptProcessor: processes the audio buffer, though it is deprecated.
const handleSuccess = function (stream) {
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);

  source.connect(processor);
  processor.connect(context.destination);

  processor.onaudioprocess = function (e) {
    // Do something with the data, e.g. convert it to WAV
    console.log(e.inputBuffer);
  };
};

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(handleSuccess);
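The "convert it to WAV" step above starts with turning the Float32 samples into 16-bit PCM. A minimal sketch of that conversion follows. The function name `floatTo16BitPCM` is hypothetical and not part of any API. It assumes mono samples in the range [-1, 1], as returned by `getChannelData()`.

```javascript
// Hypothetical helper: convert Web Audio samples (floats in [-1, 1])
// to 16-bit signed PCM, the sample format used by plain WAV files.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: values outside [-1, 1] would otherwise wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // The negative range is one step wider than the positive range.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5]));
console.log(Array.from(pcm)); // [ 0, 32767, -32768, 16383 ]
```

Inside `onaudioprocess` this could be called as `floatTo16BitPCM(e.inputBuffer.getChannelData(0))`, and the resulting `.buffer` is an ArrayBuffer that a WebSocket can send as binary data.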
Because the sampleRate constraint passed to getUserMedia has no effect, resample with one of the following methods:
// `sourceAudioBuffer` is an AudioBuffer instance of the source audio
// at the original sample rate.
const DESIRED_SAMPLE_RATE = 16000;
// The length is a whole number of sample-frames, so round up.
const offlineCtx = new OfflineAudioContext(
  sourceAudioBuffer.numberOfChannels,
  Math.ceil(sourceAudioBuffer.duration * DESIRED_SAMPLE_RATE),
  DESIRED_SAMPLE_RATE
);
const cloneBuffer = offlineCtx.createBuffer(
  sourceAudioBuffer.numberOfChannels,
  sourceAudioBuffer.length,
  sourceAudioBuffer.sampleRate
);
// Copy the source data into the offline AudioBuffer
for (let channel = 0; channel < sourceAudioBuffer.numberOfChannels; channel++) {
  cloneBuffer.copyToChannel(sourceAudioBuffer.getChannelData(channel), channel);
}
// Play it from the beginning.
const source = offlineCtx.createBufferSource();
source.buffer = cloneBuffer;
source.connect(offlineCtx.destination);
offlineCtx.oncomplete = function (e) {
  // `resampledAudioBuffer` contains an AudioBuffer resampled at 16000 Hz.
  // Use resampledAudioBuffer.getChannelData(x) to get a Float32Array for channel x.
  const resampledAudioBuffer = e.renderedBuffer;
  console.log(resampledAudioBuffer);
};
// Schedule playback before rendering starts.
source.start(0);
offlineCtx.startRendering();
https://stackoverflow.com/a/55427982/974526
navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    const context = new AudioContext();
    const bufSize = 4096;
    const microphone = context.createMediaStreamSource(stream);
    const processor = context.createScriptProcessor(bufSize, 1, 1);
    // Resampler comes from the library linked below.
    const res = new Resampler(context.sampleRate, 16000, 1, bufSize);
    const bufferArray = [];

    processor.onaudioprocess = (event) => {
      console.log('onaudioprocess');
      // const right = event.inputBuffer.getChannelData(1);
      const outBuf = res.resample(event.inputBuffer.getChannelData(0));
      bufferArray.push.apply(bufferArray, outBuf);
    };

    // onaudioprocess only fires once the node is connected.
    microphone.connect(processor);
    processor.connect(context.destination);
  });
https://github.com/felix307253927/resampler
Even when navigator.mediaDevices.getUserMedia is called with the MediaTrackConstraints below (mediaStreamConstraints), the stream still has a sample rate of 48000, because the Chrome browser I use only supports a sampleRate of 48000.
const mediaStreamConstraints = {
  audio: {
    channelCount: 1,
    sampleRate: 16000,
    sampleSize: 16
  }
};
// Set the constraints at the beginning.
navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
  .then((stream) => {
    const track = stream.getAudioTracks()[0];
    // The audio track constraints can also be updated here:
    // track.applyConstraints(mediaStreamConstraints.audio)
    console.log(track.getCapabilities());
    // The audio is recorded as a Blob,
    // and the binary data are sent via socketio to a nodejs server
    // that stores the blob as a file (e.g. audio/inp/audiofile.webm)
  })
  .catch((err) => serverlog(`ERROR mediaDevices.getUserMedia: ${err}`));
To check which capabilities the track actually supports:
let stream = await navigator.mediaDevices.getUserMedia({audio: true});
let track = stream.getAudioTracks()[0];
console.log(track.getCapabilities());
output:
{autoGainControl: Array(2), channelCount: {…}, deviceId: "default", echoCancellation: Array(2), groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5", …}
autoGainControl: (2) [true, false]
channelCount: {max: 2, min: 1}
deviceId: "default"
echoCancellation: (2) [true, false]
groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5"
latency: {max: 0.01, min: 0.01}
noiseSuppression: (2) [true, false]
sampleRate: {max: 48000, min: 48000}
sampleSize: {max: 16, min: 16}
__proto__: Object
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API/Constraints
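The capabilities output above can be checked in code before deciding whether resampling is needed. A minimal sketch follows. The helper name `supportsSampleRate` is hypothetical, and the object shape is assumed to match the output shown above.

```javascript
// Hypothetical helper: `capabilities` is the object returned by
// track.getCapabilities(); `desiredRate` is in Hz.
function supportsSampleRate(capabilities, desiredRate) {
  const range = capabilities.sampleRate;
  // Some browsers may not report this property; return undefined
  // ("unknown") in that case rather than guessing.
  if (!range || typeof range.min !== 'number' || typeof range.max !== 'number') {
    return undefined;
  }
  return desiredRate >= range.min && desiredRate <= range.max;
}

// Values copied from the Chrome output above.
const reported = {
  sampleRate: { max: 48000, min: 48000 },
  sampleSize: { max: 16, min: 16 }
};
console.log(supportsSampleRate(reported, 16000)); // false
console.log(supportsSampleRate(reported, 48000)); // true
```

With the capabilities shown above, 16000 falls outside the reported range, which is consistent with the stream staying at 48000.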
The legacy ScriptProcessorNode was asynchronous and required thread hops (between UI thread and user thread), which could produce an unstable audio output. The AudioWorklet object provides a new synchronous JavaScript execution context which allows developers to programmatically control audio without additional latency and higher stability in the output audio.
You can see example code in action along with other examples at Google Chrome Labs.
https://blog.chromium.org/2018/03/chrome-66-beta-css-typed-object-model.html
At the time of writing, Safari does not support AudioWorklet.
https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet
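As a rough sketch of what an AudioWorklet replacement for the ScriptProcessorNode callback could look like, the processor below collects mono input into fixed-size chunks and posts each chunk to the main thread. The class name `ChunkingProcessor`, the registered name `chunking-processor` and the chunk size of 4096 are assumptions made for illustration. The fallback base class only exists so the buffering logic can also be run outside a worklet.

```javascript
// Outside an AudioWorkletGlobalScope (e.g. under Node.js) the base class
// is missing, so fall back to an empty class to exercise the logic alone.
const BaseProcessor = typeof AudioWorkletProcessor === 'undefined'
  ? class {}
  : AudioWorkletProcessor;

// Hypothetical processor: gathers mono input into fixed-size chunks.
class ChunkingProcessor extends BaseProcessor {
  constructor(options) {
    super(options);
    this.chunkSize = 4096; // assumed chunk length in sample-frames
    this.chunk = new Float32Array(this.chunkSize);
    this.filled = 0;
  }

  // Called on the audio rendering thread, one render quantum per call.
  process(inputs) {
    const channel = inputs[0] && inputs[0][0];
    if (!channel) return true; // nothing connected yet; keep the node alive
    for (let i = 0; i < channel.length; i++) {
      this.chunk[this.filled++] = channel[i];
      if (this.filled === this.chunkSize) this.flush();
    }
    return true;
  }

  // Send a copy of the full chunk to the main thread and start over.
  flush() {
    const full = this.chunk.slice();
    this.filled = 0;
    if (this.port) this.port.postMessage(full);
  }
}

// registerProcessor only exists inside the worklet scope.
if (typeof registerProcessor === 'function') {
  registerProcessor('chunking-processor', ChunkingProcessor);
}
```

On the main thread, the file would be loaded with `context.audioWorklet.addModule(...)`, and the node created with `new AudioWorkletNode(context, 'chunking-processor')`. Chunks then arrive through the node's `port.onmessage`.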
The Web Audio API provides a powerful and versatile system for controlling audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning) and much more.
Brief history of audio in the browser:
Flash audio playback -> <audio> element -> Web Audio API (processing can run off the main thread)
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
- Use the native Web Audio API.
- Use recorder.js, but it is not being actively maintained. (The buffer is not available while streaming, only after stop.)
- Use RecordRTC.js, which is actively maintained and supports almost all browsers. (The buffer is not available while streaming, only after stop.)
Audio glitches are caused by an interruption of the normal continuous audio stream, resulting in loud clicks and pops. It is considered to be a catastrophic failure of a multi-media system and MUST be avoided. It can be caused by problems with the threads responsible for delivering the audio stream to the hardware, such as scheduling latencies caused by threads not having the proper priority and time-constraints. It can also be caused by the audio DSP trying to do more work than is possible in real-time given the CPU’s speed.
The ScriptProcessorNode is constructed with a bufferSize which MUST be one of the following values: 256, 512, 1024, 2048, 4096, 8192, 16384. This value controls how frequently the onaudioprocess event is dispatched and how many sample-frames need to be processed each call. onaudioprocess events are only dispatched if the ScriptProcessorNode has at least one input or one output connected. Lower numbers for bufferSize will result in a lower (better) latency. Higher numbers will be necessary to avoid audio breakup and glitches.
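To make the latency trade-off concrete: each onaudioprocess call delivers bufferSize sample-frames, so the time between calls is roughly bufferSize / sampleRate. The sketch below computes that interval. The names `VALID_BUFFER_SIZES` and `callbackIntervalMs` are hypothetical helpers, not part of the Web Audio API.

```javascript
// Buffer sizes listed in the paragraph above.
const VALID_BUFFER_SIZES = [256, 512, 1024, 2048, 4096, 8192, 16384];

// Hypothetical helper: approximate time between onaudioprocess events,
// in milliseconds, for a given bufferSize and context sample rate.
function callbackIntervalMs(bufferSize, sampleRate) {
  if (!VALID_BUFFER_SIZES.includes(bufferSize)) {
    throw new RangeError(
      `bufferSize must be one of ${VALID_BUFFER_SIZES.join(', ')}`
    );
  }
  return (bufferSize / sampleRate) * 1000;
}

for (const size of [1024, 4096]) {
  // At 48000 Hz: about 21.33 ms for 1024 and about 85.33 ms for 4096.
  console.log(size, callbackIntervalMs(size, 48000).toFixed(2), 'ms');
}
```

A larger buffer means fewer, longer gaps between callbacks, which is why it costs latency but leaves more headroom against glitches.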
Use mkcert to make certificates.
mkcert: A simple zero-config tool to make locally trusted development certificates with any names you'd like.
mkcert -key-file key.pem -cert-file cert.pem localhost <host ip>
There are several ways to downsample audio in the browser:
- OfflineAudioContext (native code with built-in downsampling), currently used.
- A Web Worker running a self-implemented downsampling method, written in JavaScript or WebAssembly.
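For the self-implemented option, a minimal sketch of a downsampler that could run inside a Web Worker is shown below. It averages each group of input samples that maps onto one output sample. This is an assumption chosen for brevity: box averaging is only a crude low-pass filter, not a proper anti-aliasing resampler, and the function name `downsample` is hypothetical.

```javascript
// Hypothetical helper: reduce `input` (Float32Array, mono) from
// `inputRate` to `outputRate` by averaging neighbouring samples.
function downsample(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  if (outputRate === inputRate) return Float32Array.from(input);

  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);

  for (let i = 0; i < outLength; i++) {
    // Input window [start, end) that collapses into output sample i.
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// 48000 -> 16000 averages every three samples into one.
const demo = downsample(new Float32Array([0, 3, 6, 1, 1, 1, 2, 4, 6]), 48000, 16000);
console.log(Array.from(demo)); // [ 3, 1, 4 ]
```

Because the function only touches typed arrays, it has no dependency on the Web Audio API and can be moved into a worker unchanged.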
- WebSocket WSS with a self-signed certificate doesn't work on iOS Safari.
- Sometimes audio in Chrome on Ubuntu 18.04 may become discontinuous (https://stackoverflow.com/questions/54794052/how-can-i-prevent-breakup-choppiness-glitches-when-using-an-audioworklet-to-stre)