We need to choose an arbitrary length of time over which to sample the data. We will be using 1/10 second, so your model should be able to make a prediction from a clip of that length. Try not to go above 1 second.
Calculate Number of Sample Chunks
We decided to chunk the audio files into sections that are 1/10 second long.
We want to calculate how many chunks there are in total. Note: We multiply by 2 to get a larger sample size and ensure we have enough data.
sample_size = 2 * int(df.length.sum() / 0.10)
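As a quick check of the arithmetic, here is a minimal sketch with made-up clip lengths. The `lengths` list is hypothetical and only stands in for `df.length`.

```python
# Hypothetical clip lengths in seconds, standing in for df.length
lengths = [2.5, 4.0, 1.5]
chunk_seconds = 0.10  # 1/10 second per chunk

total_seconds = sum(lengths)
# Number of 1/10-second chunks that fit in the data, doubled as described above
sample_size = 2 * int(total_seconds / chunk_seconds)
print(total_seconds, sample_size)
```

Because int() truncates the result of a floating-point division, the count can land one chunk below the exact value, which does not matter for this purpose.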
Calculate Probability Distribution of Categories
This distribution is used to randomly select a category each time we draw a sample.
categories = list(np.unique(df.label))
categories_dist = df.groupby(["label"])["length"].mean()
# Normalizing means
prob_dist = categories_dist / categories_dist.sum()
# What we want to use the probability distribution for
# np.random.choice needs probabilities that sum to 1 (which is why we normalized)
rand_category = np.random.choice(categories, p=prob_dist)
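The same idea can be sketched in plain Python so it runs without pandas or NumPy. The `clips` data is made up, and `random.choices` is used here as a standard-library stand-in for `np.random.choice(..., p=...)`; it is not what the project code calls.

```python
import random
from collections import defaultdict

# Hypothetical (label, length in seconds) pairs standing in for df
clips = [("dog", 2.0), ("dog", 4.0), ("cat", 1.0), ("siren", 6.0)]

# Mean clip length per label, like df.groupby(["label"])["length"].mean()
lengths_by_label = defaultdict(list)
for label, length in clips:
    lengths_by_label[label].append(length)
categories = sorted(lengths_by_label)
mean_length = {c: sum(lengths_by_label[c]) / len(lengths_by_label[c]) for c in categories}

# Normalize the means so they sum to 1
total = sum(mean_length.values())
prob_dist = [mean_length[c] / total for c in categories]

# Weighted draw: labels with longer average clips are picked more often
rand_category = random.choices(categories, weights=prob_dist, k=1)[0]
print(dict(zip(categories, prob_dist)), rand_category)
```

Labels whose clips are longer on average receive a higher probability, so they are drawn more often.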
Building Model Config Class
Lets us easily change how the model is built, e.g. the type of network (this project supports convolutional and recurrent).
SAMPLE_SECTION_VALUE = <length in seconds of each chunk, e.g. 0.1 for 1/10 second>
class ModelConfig:
    # The optional parameters are set to conventional values
    def __init__(self, mode="conv", nfilt=26, nfeat=13, nfft=512, rate=16000):
        self.mode = mode  # You can configure mode to be "conv" or "recurrent"
        self.nfilt = nfilt  # Number of mel filters
        self.nfeat = nfeat  # Number of MFCC coefficients kept per frame
        self.nfft = nfft  # FFT size
        self.rate = rate  # Sample rate in Hz
        self.step = int(rate * SAMPLE_SECTION_VALUE)  # Number of audio samples in one chunk
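To see how `step` relates to the (13, 9) feature shape noted in build_features, here is a rough sketch. It assumes a chunk length of 0.1 seconds and that the MFCCs are computed with a 25 ms window and a 10 ms hop (the defaults of python_speech_features, which is an assumption about the library in use).

```python
import math

rate = 16000            # Samples per second (ModelConfig default)
section_seconds = 0.1   # Assumed chunk length (1/10 second)
step = int(rate * section_seconds)  # Audio samples in one chunk

# Assumed framing: 25 ms window, 10 ms hop
frame_len = round(0.025 * rate)
frame_step = round(0.01 * rate)
# Number of frames when the trailing partial frame is zero-padded
num_frames = 1 + math.ceil((step - frame_len) / frame_step)
print(step, frame_len, frame_step, num_frames)
```

Under these assumptions each chunk yields 9 frames of nfeat=13 coefficients, which matches the (13, 9) shape after transposing.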
Formatting X and y Labels into numpy arrays
Review: the shape of a 2-D numpy array is (# of rows, # of columns).
These values are fed into the model (remember that X is the input and y is the output, since we are doing supervised learning).
Parameter of np.amin (used below): A matrix
Returns: Minimum value of the matrix (np.amax returns the maximum in the same way)
# Imports used by build_features (the mfcc and to_categorical sources are assumed)
import os
import numpy as np
from python_speech_features import mfcc
from scipy.io import wavfile
from tensorflow.keras.utils import to_categorical
from tqdm import tqdm

def build_features():
    # Initialize these lists to append each processed sample to them
    X = []
    y = []
    # Set to the highest and lowest floats possible
    # We want initial values for min and max to start the tracking in the for loop
    _min, _max = float("inf"), -float("inf")
    for _ in tqdm(range(sample_size)):
        # Get a random category with the probabilities we calculated (so that we also account for wav file lengths)
        rand_category = np.random.choice(categories, p=prob_dist)
        # Get a random file that matches the class we just retrieved
        file = np.random.choice(df[df.label == rand_category].index)
        rate, wav = wavfile.read(os.path.join("clean", file))
        # Skip the wav file if it is not long enough to cover one section
        if wav.shape[0] <= config.step:
            continue
        # Subtract config.step to ensure we do not run out of data near the end of a file
        # In the next step we add config.step, so we need at least config.step more samples after the index
        random_idx = np.random.randint(0, wav.shape[0] - config.step)
        # Slice out the section to process: from the random index to the index plus config.step
        sample = wav[random_idx:random_idx + config.step]
        # Calculate the MFCCs (the heat-map-like images used for classification)
        # Arguments are assumed from ModelConfig and the python_speech_features signature
        X_sample = mfcc(
            sample, rate, numcep=config.nfeat, nfilt=config.nfilt, nfft=config.nfft
        ).T  # Transpose so that each sample has shape (nfeat, number of frames)
        _min = min(np.amin(X_sample), _min)  # If we have a new minimum, update
        _max = max(np.amax(X_sample), _max)  # If we have a new maximum, update
        X.append(X_sample if config.mode == "conv" else X_sample.T)
        # Append the category to y, mapped back to its index
        # y must hold numbers so it can be converted into an array later (same goes for X)
        y.append(categories.index(rand_category))
    # In conv mode the dimensions of X_array are (length_of_X, 13, 9)
    X_array, y_array = np.array(X), np.array(y)
    # Normalize array values to the range [0, 1]
    X = (X_array - _min) / (_max - _min)
    if config.mode == "conv":
        # Add a trailing channel axis:
        # each X_reshaped[i] has shape (13, 9, 1)
        # len(X_reshaped) = length_of_X
        X_reshaped = X.reshape(X.shape[0], X.shape[1], X.shape[2], 1)
    elif config.mode == "recurrent":
        X_reshaped = X.reshape(X.shape[0], X.shape[1], X.shape[2])
    # One-hot encode the label indices (which is y)
    # Dimensions are (length_of_X, 10)
    y = to_categorical(y_array, num_classes=10)
    return X_reshaped, y
One-Hot Encoding
Process by which categorical variables are converted into a numeric form that ML algorithms can use directly
Transforms categorical labels into vectors
Each vector contains only 0s and 1s
The length of each vector = # of categories (so # of columns in the encoded array)
The number of vectors = # of labeled samples (so # of rows in the encoded array)
All elements in a vector are 0 except the one at its category's index, which is 1
e.g. If we have these categories: [cat, dog, lizard], then cat's vector would be [1, 0, 0]
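A minimal sketch of the encoding in plain Python follows. The `one_hot` helper is hypothetical and written only for illustration; the project code itself relies on `to_categorical`.

```python
def one_hot(labels, categories):
    """Return one row per label, with a 1 at the label's category index."""
    encoded = []
    for label in labels:
        row = [0] * len(categories)
        row[categories.index(label)] = 1
        encoded.append(row)
    return encoded


categories = ["cat", "dog", "lizard"]
encoded = one_hot(["cat", "lizard", "dog", "cat"], categories)
print(encoded)
```

Each row has as many columns as there are categories, and the number of rows equals the number of labels passed in.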