Model Preparation

We need to pick an arbitrary window of time to sample the data with. We will use 1/10 of a second, so your model should be able to make a prediction from a 0.1-second clip. Try not to go above 1 second.
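A minimal sketch of drawing one 0.1-second window at a random position in a clip. It assumes the audio is already loaded as a 1-D NumPy array with a known sample rate. `random_chunk` is a hypothetical helper, not code from this project.

```python
import numpy as np

def random_chunk(signal, rate, window_sec=0.1, rng=None):
    """Return a randomly positioned window_sec-long slice of a 1-D signal (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    step = int(rate * window_sec)  # samples per window, e.g. 1600 at 16 kHz
    if signal.shape[0] < step:
        raise ValueError("signal is shorter than one window")
    # integers() excludes the upper bound, so +1 allows the window to end exactly at the last sample
    start = rng.integers(0, signal.shape[0] - step + 1)
    return signal[start:start + step]

rate = 16000
signal = np.arange(rate * 2, dtype=np.float32)  # stand-in for 2 seconds of audio
chunk = random_chunk(signal, rate, rng=np.random.default_rng(0))
```

A shorter window gives the model less audio per prediction but lets it respond sooner. That trade-off is why the window should stay at or under 1 second.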

Calculate Number of Sample Chunks

  • We decided to chunk the audio files into sections that are 1/10 of a second long

  • We want to calculate the total number of chunks. Note: we multiply by 2 to get a larger sample size, so that we have enough data.

sample_size = 2 * int(df.length.sum() / 0.10)
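The same calculation, written as a standalone function over plain Python lists instead of the `df` DataFrame. `count_chunks` and its parameter names are hypothetical. Like the original line, it uses `int()`, which truncates. Floating-point division can land just below a whole number, so the count may occasionally come out one window short.

```python
def count_chunks(lengths_sec, window_sec=0.1, oversample=2):
    """Number of window_sec windows across all clips, times an oversampling factor."""
    total_sec = sum(lengths_sec)
    return oversample * int(total_sec / window_sec)

# Two hypothetical clips of 1 s and 2 s, counted in 0.5 s windows
n = count_chunks([1.0, 2.0], window_sec=0.5)
```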

Calculate Probability Distribution of Categories

  • Used to randomly select a category each time we draw a sample

import numpy as np

# np.unique and groupby both sort the labels, so categories and prob_dist line up
categories = list(np.unique(df.label))
categories_dist = df.groupby(["label"])["length"].mean()

# Normalize the mean lengths so they sum to 1
prob_dist = categories_dist / categories_dist.sum()

# Use the distribution to pick a random category
# np.random.choice expects probabilities that sum to 1 (which is why we normalized)
rand_category = np.random.choice(categories, p=prob_dist)
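The steps above can be checked without pandas on a small made-up dataset. This sketch assumes one label and one length (in seconds) per audio file. It uses NumPy's `Generator.choice`, which takes the same `p` argument as `np.random.choice`.

```python
import numpy as np

# Hypothetical toy metadata: one label and one length (seconds) per audio file
labels = np.array(["cat", "dog", "cat", "lizard", "dog"])
lengths = np.array([2.0, 1.0, 4.0, 3.0, 3.0])

# Sorted unique categories, matching the order pandas groupby would produce
categories = list(np.unique(labels))

# Mean clip length per category, in the same order as `categories`
mean_lengths = np.array([lengths[labels == c].mean() for c in categories])

# Normalize so the weights sum to 1
prob_dist = mean_lengths / mean_lengths.sum()

rng = np.random.default_rng(seed=0)
rand_category = rng.choice(categories, p=prob_dist)       # a single weighted draw
draws = rng.choice(categories, size=1000, p=prob_dist)    # many draws, to see the weighting
```

Here the mean lengths are 3.0 s (cat), 2.0 s (dog) and 3.0 s (lizard). Cat and lizard are therefore each drawn with probability 0.375, and dog with probability 0.25.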

Building Model Config Class

  • Lets us easily change the model, e.g. switch the type of network we are building (this project supports convolutional and recurrent networks)
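A minimal sketch of such a config class, written as a dataclass. The field names and the mode strings "conv" and "recurrent" are hypothetical, not this project's actual names. The class stores the network type and derives the 0.1-second window size in samples. It also returns an input shape: a convolutional layer typically expects a trailing channel axis, and a recurrent layer does not.

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """Hypothetical model settings; names are illustrative only."""
    mode: str = "conv"         # "conv" (convolutional) or "recurrent"
    rate: int = 16000          # sample rate in Hz (assumed)
    window_sec: float = 0.1    # length of each sampled chunk
    step: int = field(init=False, default=0)  # samples per chunk, derived below

    def __post_init__(self):
        if self.mode not in ("conv", "recurrent"):
            raise ValueError(f"unsupported mode: {self.mode!r}")
        self.step = int(round(self.rate * self.window_sec))

    def input_shape(self, n_rows, n_cols):
        """Shape of one input sample for the selected network type."""
        if self.mode == "conv":
            return (n_rows, n_cols, 1)
        return (n_rows, n_cols)

config = Config(mode="recurrent")
```

Switching network types then means changing one argument, e.g. `Config(mode="conv")`, and the rest of the code can branch on `config.mode`.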

Formatting X and y into NumPy Arrays

  • Review: the shape of a 2-D NumPy array is (# of rows, # of columns)

  • These arrays are fed into the model (remember, X is the input and y is the output, since we are doing supervised learning)

  • np.amin(a)

    • Parameter: an array (of any shape)

    • Returns: the minimum value in the array (or the minima along an axis, if axis is given)
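A sketch of turning per-sample feature matrices into X and y arrays, with `np.amin` in use. The features here are random stand-ins with a hypothetical 9 x 13 shape. Min-max scaling with the global `np.amin` and `np.amax` is shown as one common use of these functions, assumed here rather than taken from the project.

```python
import numpy as np

# np.amin reduces the whole array by default, or one axis when axis= is given
m = np.array([[3, 1],
              [2, 5]])
overall_min = np.amin(m)          # smallest element overall
col_min = np.amin(m, axis=0)      # minimum of each column

rng = np.random.default_rng(seed=0)

# Hypothetical features: 4 samples, each a 9 x 13 matrix
X_list = [rng.normal(size=(9, 13)) for _ in range(4)]
y_list = [0, 2, 1, 0]             # hypothetical integer category indices

X = np.array(X_list)              # shape (4, 9, 13): (samples, rows, columns)
y = np.array(y_list)              # shape (4,): one label per sample

# Min-max scale every value into [0, 1] using the global extremes
x_min, x_max = np.amin(X), np.amax(X)
X = (X - x_min) / (x_max - x_min)

# Convolutional layers typically expect a trailing channel axis
X_conv = X.reshape(X.shape[0], X.shape[1], X.shape[2], 1)
```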

One-Hot Encoding

  • The process of converting categorical variables into numeric vectors that ML algorithms can work with

  • Transforms categorical labels into vectors

    • Each vector contains only 0s and 1s

    • The length of each vector = # of categories (so # of columns in the array)

    • Stacking one vector per labeled sample gives an array whose # of rows = # of labeled samples

    • Every element is 0 except the one at the position of the sample's category, which is 1

    • e.g. If we have the categories [cat, dog, lizard], then cat's vector is [1, 0, 0]

  • Needed for the categorical cross-entropy loss, which compares each one-hot label with the model's predicted probabilities
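A NumPy sketch of one-hot encoding and of the categorical cross-entropy it feeds. Indexing rows of an identity matrix produces the one-hot vectors. If the model is built with Keras, `to_categorical` in `keras.utils` performs the same label conversion. The loss function below is a plain reimplementation for illustration, not the Keras code.

```python
import numpy as np

categories = ["cat", "dog", "lizard"]
labels = ["dog", "cat", "lizard", "cat"]    # hypothetical labeled samples

# Map each label to its integer index in `categories`
y_idx = np.array([categories.index(lbl) for lbl in labels])

# Row i of the identity matrix is the one-hot vector for category i
y_onehot = np.eye(len(categories))[y_idx]   # shape (4, 3): rows = samples, columns = categories

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over samples of -sum(y_true * log(y_pred)); clipping avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One "cat" sample predicted as 70% cat, 20% dog, 10% lizard
loss = categorical_cross_entropy(np.array([[1.0, 0.0, 0.0]]),
                                 np.array([[0.7, 0.2, 0.1]]))
```

Because every entry of the one-hot label is 0 except the true category, the loss for a sample reduces to -log of the probability the model assigned to the correct class.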
