Audio Classifier Tutorial

Prerequisite Knowledge

Sampling Frequency:# of samples per second eg. if the sampling frequency is 44100 hertz, a recording with a duration of 60 seconds will contain 2,646,000samples.

Dataset (UrbanSound8K)

10 Folders in total
9 Folders to train
1 Folder to test

Formatting the Data

Create wrapper class called UrbanSoundDataset that is a subclass of torch.utils.data.Dataset:

Fields (stores ____ of audio files when an object is instantiated)

File names
Labels
Folder number

Methods: __getitem__

Used to load and format data
Uses torchaudio.load()
- Converts .wav files to tensors
- Returns tuple (tensor, sampling frequency of audio file - 44100Hz)
Uses torchaudio.transforms.DownmixMono()
- Dataset has audio in 2 channels, this makes it into 1 channel
Network takes input size of 32,000 samples (only 0.7 seconds), but most audio files in dataset have 100,000 samples)
- Downsample to 8000Hz (32,000 samples now spans 4 seconds)
  - Achieved by taking every 5th sample of original audio tensor
    Not every audio tensor long enough to handle downsampling (these tensors will be padded with 0s)
  - Minimum length that won't require padding is 160,000 samples (32,000*5)

PreviousPyTorch NextConvolutional Neural Networks

Last updated 3 years ago

Was this helpful?