Title: | Voice Activity Detection using the 'webrtc' Toolkit |
---|---|
Description: | Voice Activity Detection using the 'webrtc' toolkit. Identify the locations in audio files where there is an active voice. This is done based on a Gaussian Mixture Model implemented in the 'webrtc' framework. |
Authors: | Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), The WebRTC project authors [cph] (Code in src/webrtc), David Reid [cph] (Code in src/dr_libs) |
Maintainer: | Jan Wijffels <[email protected]> |
License: | MPL-2.0 |
Version: | 0.2 |
Built: | 2024-11-03 05:57:19 UTC |
Source: | https://github.com/bnosac/audio.vadwebrtc |
Postprocess the Voice Activity Detection result by collapsing sequences of voiced/non-voiced segments:
first, all non-voiced segments of short duration (default < 1 second) are considered voiced;
next, all voiced segments shorter than a number of seconds (default < 1 second) are considered non-voiced.
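The two-step collapsing described above can be sketched in base R on a toy table of segments. This is an illustration only, not the package's implementation: the helper names `collapse_runs` and `smooth_segments` are hypothetical, the segment values are made up, and merging adjacent segments between the two steps is an assumption. The threshold names `silence_min` and `voiced_min` follow the arguments used in the examples of this page.

```r
# Merge adjacent segments that carry the same has_voice label (hypothetical helper)
collapse_runs <- function(seg) {
  runs  <- rle(seg$has_voice)
  last  <- cumsum(runs$lengths)
  first <- last - runs$lengths + 1
  data.frame(
    vad_segment = seq_along(runs$values),
    start       = seg$start[first],
    end         = seg$end[last],
    duration    = seg$end[last] - seg$start[first],
    has_voice   = runs$values
  )
}

# Two-step smoothing (hypothetical helper, thresholds in seconds)
smooth_segments <- function(seg, silence_min = 1, voiced_min = 1) {
  seg <- collapse_runs(seg)
  # Step 1: short non-voiced segments are treated as voiced
  seg$has_voice[!seg$has_voice & seg$duration < silence_min] <- TRUE
  seg <- collapse_runs(seg)
  # Step 2: short voiced segments are treated as non-voiced
  seg$has_voice[seg$has_voice & seg$duration < voiced_min] <- FALSE
  collapse_runs(seg)
}

# Made-up segments: a 0.4 s pause inside speech and a 0.3 s blip inside silence
segments <- data.frame(
  start     = c(0.0, 2.0, 2.4, 5.0, 8.0, 8.3),
  end       = c(2.0, 2.4, 5.0, 8.0, 8.3, 10.0),
  has_voice = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)
result <- smooth_segments(segments, silence_min = 1, voiced_min = 1)
print(result)
```

With these toy values the short pause is absorbed into the surrounding speech and the short blip is absorbed into the surrounding silence, leaving one voiced and one non-voiced segment.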
is.voiced(x, channel = 0, units = "seconds", ...)
x |
an object of class VAD as returned by |
channel |
integer with the channel, showing the voiced section of that channel only. Only used for segments extracted with |
units |
character string with the units to use for the output and thresholds used in the function - either 'seconds' or 'milliseconds' |
... |
further arguments passed on to the function |
A data.frame with columns vad_segment, start, end, duration, has_voice indicating where in the audio voice is detected
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad <- VAD(file, mode = "normal", milliseconds = 30)
vad$vad_segments
voiced <- is.voiced(vad, silence_min = 0.2, voiced_min = 1)
voiced
voiced <- is.voiced(vad, silence_min = 200, units = "milliseconds")
voiced
Detect the location of active voice in audio. The Voice Activity Detection is implemented using a Gaussian Mixture Model from the "webrtc" framework. It works with .wav audio files with a sample rate of 8, 16 or 32 kHz and can be applied over a window of either 10, 20 or 30 milliseconds.
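To make the relation between the sample rates and window lengths mentioned above concrete, the following base R sketch computes how many samples fall in one window. The helper name `frame_samples` is hypothetical and not part of the package; it only applies the arithmetic samples = sample rate × milliseconds / 1000.

```r
# Number of samples in one analysis window (hypothetical helper)
frame_samples <- function(sample_rate, milliseconds) {
  stopifnot(sample_rate %in% c(8000, 16000, 32000),
            milliseconds %in% c(10, 20, 30))
  sample_rate * milliseconds / 1000
}

# All supported combinations of sample rate and window length
grid <- expand.grid(sample_rate = c(8000, 16000, 32000),
                    milliseconds = c(10, 20, 30))
grid$samples_per_frame <- frame_samples(grid$sample_rate, grid$milliseconds)
print(grid)
```

For example, a 30 millisecond window at 16 kHz covers 480 samples.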
VAD( file, mode = c("normal", "lowbitrate", "aggressive", "veryaggressive"), milliseconds = 10L, type = "webrtc" )
file |
the path to an audio file, which should contain 16-bit mono PCM samples (pcm_s16le codec) with a sampling rate of either 8 kHz, 16 kHz or 32 kHz |
mode |
character string with the type of voice detection, either 'normal', 'lowbitrate', 'aggressive' or 'veryaggressive' where 'veryaggressive' means more silences are detected |
milliseconds |
integer with the length in milliseconds of the frames over which the VAD signal is computed. Can only be 10, 20 or 30. Defaults to 10. |
type |
character string with the type of VAD model. Only 'webrtc' currently. |
an object of class VAD
which is a list with elements
file: the path to the file
sample_rate: the sample rate of the audio file in Hz
channels: the number of channels in the audio - as the algorithm requires the audio to be mono this should only be 1
samples: the number of samples in the data
bitsPerSample: the number of bits per sample
bytesPerSample: the number of bytes per sample
type: the type of VAD model - currently only 'webrtc-gmm'
mode: the provided VAD mode
milliseconds: the provided milliseconds - either by 10, 20 or 30 ms frames
frame_length: the frame length corresponding to the provided milliseconds
vad: a data.frame with columns millisecond, has_voice and vad_segment indicating if the audio contains an active voice signal at that millisecond
vad_segments: a data.frame with columns vad_segment, start, end and has_voice where the start/end values are in seconds
vad_stats: a list with elements n_segments, n_segments_has_voice, n_segments_has_no_voice, seconds_has_voice, seconds_has_no_voice, pct_has_voice indicating the number of segments with voice and the duration of the voice/non-voice in the audio
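As an illustration of how summary statistics like those in vad_stats relate to a table shaped like vad_segments, here is a base R sketch on made-up segments. It is not the package's code: the helper name `segment_stats` is hypothetical, and reporting pct_has_voice as a proportion between 0 and 1 is an assumption, since the scale is not stated here.

```r
# Summarise voiced / non-voiced durations from a segments table (hypothetical helper)
segment_stats <- function(seg) {
  duration <- seg$end - seg$start
  voiced   <- sum(duration[seg$has_voice])
  silent   <- sum(duration[!seg$has_voice])
  list(
    n_segments              = nrow(seg),
    n_segments_has_voice    = sum(seg$has_voice),
    n_segments_has_no_voice = sum(!seg$has_voice),
    seconds_has_voice       = voiced,
    seconds_has_no_voice    = silent,
    # Assumption: expressed as a proportion of the total duration
    pct_has_voice           = voiced / (voiced + silent)
  )
}

# Made-up segments with start/end in seconds
segments <- data.frame(
  vad_segment = 1:3,
  start       = c(0.0, 1.5, 4.0),
  end         = c(1.5, 4.0, 6.0),
  has_voice   = c(FALSE, TRUE, FALSE)
)
stats <- segment_stats(segments)
str(stats)
```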
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad <- VAD(file, mode = "normal", milliseconds = 30)
vad
vad <- VAD(file, mode = "lowbitrate", milliseconds = 20)
vad
vad <- VAD(file, mode = "aggressive", milliseconds = 20)
vad
vad <- VAD(file, mode = "veryaggressive", milliseconds = 20)
vad
vad <- VAD(file, mode = "normal", milliseconds = 10)
vad
vad$vad_segments
## Not run:
library(av)
x <- read_audio_bin(file)
plot(seq_along(x) / 16000, x, type = "l")
abline(v = vad$vad_segments$start, col = "red", lwd = 2)
abline(v = vad$vad_segments$end, col = "blue", lwd = 2)
##
## If you have audio which is not in mono or another sample rate
## consider using R package av to convert to the desired format
av_media_info(file)
av_audio_convert(file, output = "audio_pcm_16khz.wav", format = "wav", channels = 1, sample_rate = 16000)
vad <- VAD("audio_pcm_16khz.wav", mode = "normal")
## End(Not run)
file <- system.file(package = "audio.vadwebrtc", "extdata", "leak-test.wav")
vad <- VAD(file, mode = "normal")
vad
vad$vad_segments
vad$vad_stats
Voice Activity Detection per channel.
Transforms the audio file to a wav file with the provided sample_rate
and performs the voice activity detection per channel.
VAD_channel(file, sample_rate = 16000, channels = c("default", "all"), ...)
file |
the path to an audio file |
sample_rate |
integer with the |
channels |
character string, either 'default' or 'all', indicating whether to do the voice activity detection for each channel independently ('default') or for each channel independently as well as for all channels combined ('all') |
... |
further arguments passed on to |
an object of class webrtc-gmm-bychannel
which is a list with elements
file: the path to the file
duration_secs: the duration of the audio in seconds
sample_rate: the sample rate of the audio file in Hz
channels: the number of channels in the audio
samples: the number of samples in the data
bitsPerSample: the number of bits per sample
bytesPerSample: the number of bytes per sample
type: the type of VAD model - currently only 'webrtc-gmm'
mode: the provided VAD mode
milliseconds: the provided milliseconds - either by 10, 20 or 30 ms frames
frame_length: the frame length corresponding to the provided milliseconds
vad_segments: a data.frame with columns channel, vad_segment, start, end and has_voice where the start/end values are in seconds
vad_stats: a list with elements channel, n_segments, n_segments_has_voice, n_segments_has_no_voice, seconds_has_voice, seconds_has_no_voice, pct_has_voice indicating the number of segments with voice and the duration of the voice/non-voice in the audio
Channel 0 means all audio combined in 1 channel.
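To show how the channel column can be used, the following base R sketch works on a made-up table with the columns documented for vad_segments. The values are invented for illustration and do not come from the package; only the convention that channel 0 holds the combined audio is taken from the text above.

```r
# Made-up per-channel segments: channel 0 is the combined audio,
# channels 1 and 2 are the individual channels of a stereo file
segments <- data.frame(
  channel     = c(0, 0, 1, 1, 2, 2),
  vad_segment = c(1, 2, 1, 2, 1, 2),
  start       = c(0, 1, 0, 2, 0, 1),
  end         = c(1, 4, 2, 4, 1, 4),
  has_voice   = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Voiced segments of the first individual channel only
channel_1_voiced <- segments[segments$channel == 1 & segments$has_voice, ]
print(channel_1_voiced)

# Total voiced seconds per channel
voiced <- segments[segments$has_voice, ]
voiced_seconds <- tapply(voiced$end - voiced$start, voiced$channel, sum)
print(voiced_seconds)
```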
library(audio)
library(av)
file <- system.file(package = "audio.vadwebrtc", "extdata", "stereo.mp3")
vad <- VAD_channel(file, sample_rate = 32000, mode = "normal", milliseconds = 10, channels = "all")
vad
vad$vad_segments
voiced <- is.voiced(vad, channel = 0, silence_min = 0.2, voiced_min = 1)
voiced
voiced <- is.voiced(vad, channel = 1, silence_min = 0.2, voiced_min = 1)
voiced
voiced <- is.voiced(vad, channel = 2, silence_min = 0.2, voiced_min = 1)
voiced