Model Preparation
We need to choose an arbitrary length of time to sample the data with. We will be using 1/10 second, so your model should be able to make a prediction from a window of that length. Try not to go above 1 second.
Calculate Number of Sample Chunks
The audio files are chunked into sections that are 1/10 second long
We want to calculate how many chunks there are in total. Note: We multiply by 2 to get a larger sample size and ensure we have enough data.
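To make the arithmetic concrete, here is a small sketch on a hypothetical DataFrame. The labels and clip lengths are made up for illustration; only the `length` column (in seconds) and the 1/10 second chunk size come from these notes.

```python
import pandas as pd

# Hypothetical clip lengths in seconds; the real df comes from the project's dataset.
df = pd.DataFrame({
    "label": ["cat", "dog", "cat"],
    "length": [1.5, 2.0, 0.5],
})

chunk_seconds = 0.10                # 1/10 second per chunk
total_seconds = df["length"].sum()  # 4.0 seconds of audio in total

# Number of 1/10-second chunks that fit in the audio, doubled for a larger sample.
sample_size = 2 * int(total_seconds / chunk_seconds)  # 80 for this toy data
```

Because `int()` truncates, a total length that is not a clean multiple of the chunk size simply drops the partial chunk at the end.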
sample_size = 2 * int(df.length.sum() / 0.10)
Calculate Probability Distribution of Categories
For randomly selecting categories when going through samples
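As a quick illustration of the idea before the project code, here is a self-contained sketch on a hypothetical four-row DataFrame (the labels and lengths are invented). It shows that dividing the per-label mean lengths by their sum gives weights that add up to 1, which is what the `p` argument of a NumPy random choice expects. The seeded `default_rng` generator is an assumption made here for reproducibility; it is not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: labels and clip lengths (seconds) are made up for illustration.
df = pd.DataFrame({
    "label": ["cat", "cat", "dog", "lizard"],
    "length": [1.0, 3.0, 2.0, 4.0],
})

categories = list(np.unique(df.label))
mean_length = df.groupby(["label"])["length"].mean()  # cat 2.0, dog 2.0, lizard 4.0

# Dividing by the total makes the weights sum to 1.
prob_dist = mean_length / mean_length.sum()           # cat 0.25, dog 0.25, lizard 0.5

# Draw one category; categories with longer average clips are picked more often.
rng = np.random.default_rng(0)
rand_category = rng.choice(categories, p=prob_dist.to_numpy())
```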
categories = list(np.unique(df.label))
categories_dist = df.groupby(["label"])["length"].mean()
# Normalizing means
prob_dist = categories_dist / categories_dist.sum()
# Example of what the probability distribution is used for
# np.random.choice needs probabilities that sum to 1 (which is why we normalized)
rand_category = np.random.choice(categories, p=prob_dist)
Building Model Config Class
Makes it easy to adjust the model, e.g. to change the type of network we are building (this project supports convolutional and recurrent networks)
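A minimal sketch of what such a config class could look like is shown below. Every name and default here (`Config`, `mode`, `step_seconds`, `sample_rate`, `step`) is a hypothetical choice for illustration, not the project's actual class; the only details taken from these notes are the two network types and the 1/10 second window.

```python
class Config:
    """Hypothetical container for model settings; names and defaults are illustrative."""

    def __init__(self, mode="conv", step_seconds=0.1, sample_rate=16000):
        # Only the two network types mentioned in these notes are accepted.
        if mode not in ("conv", "recurrent"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        self.step_seconds = step_seconds  # length of one chunk in seconds
        self.sample_rate = sample_rate    # assumed audio sample rate in Hz
        # Number of raw audio samples contained in one chunk.
        self.step = int(sample_rate * step_seconds)


# Switching network type is then a one-argument change.
config = Config(mode="recurrent")
```

Keeping these settings in one object means the rest of the pipeline can read `config.mode` or `config.step` instead of repeating hard-coded values.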
Formatting X and y Labels into numpy arrays
Review:
Getting the shape of a 2-D numpy array returns (# of rows, # of columns)
The X and y arrays are fed into the model (remember, X is the input and y is the output, since we are doing supervised learning)
np.amin(ndarray)
Parameter: A matrix
Returns: Minimum value of matrix
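The two review items can be checked on a small array. The values below are arbitrary and only serve to show what `shape` and `np.amin` return.

```python
import numpy as np

# Arbitrary 2 x 3 matrix used only for demonstration.
X = np.array([[3.0, -1.0, 4.0],
              [1.0, 5.0, 9.0]])

rows, cols = X.shape   # (2, 3): 2 rows, 3 columns
smallest = np.amin(X)  # -1.0, the minimum over the whole matrix
```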
One-Hot Encoding
Process by which categorical variables are converted into a numeric form that ML algorithms can use, which helps them do a better job in prediction
Transforms categorical labels into vectors
Each vector contains only 0s and 1s
The length of each vector = # of categories (so # of columns in the array)
The number of vectors = # of labeled data points (so # of rows in the array)
All elements in a vector are 0 except the one at its category's position, which is 1
e.g. If we have these categories:
[cat, dog, lizard], then cat's vector would be [1, 0, 0]
Used as the target format for categorical cross-entropy
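The encoding can be sketched in plain NumPy using the [cat, dog, lizard] categories from the example. The `labels` list is hypothetical, and indexing into an identity matrix is just one simple way to build the vectors; Keras users may prefer its `to_categorical` utility for the same job.

```python
import numpy as np

categories = ["cat", "dog", "lizard"]
labels = ["dog", "cat", "lizard", "cat"]  # hypothetical labeled data

# Map each label to the position of its category.
indices = np.array([categories.index(label) for label in labels])

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing with `indices` picks the right vector for every label.
one_hot = np.eye(len(categories), dtype=int)[indices]

# one_hot has one row per label and one column per category:
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [1 0 0]]
```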