Building CNN

Continuing from last time, we have the input array, X, and its corresponding labels in the array, y.

X: (num_sample, 13, 9, 1) - each sample is a 13 x 9 matrix with 1 channel

y: (num_sample, 10) - 10 is the number of classes we have to classify

Preparing input shape

X.shape[0] is just the number of samples, so it is not part of the input shape. We take the dimensions at index 1 and 2 and append 1 as the channel dimension, since the Conv2D layers of a CNN expect one.

input_shape = (X.shape[1], X.shape[2], 1)
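
As a quick sanity check, the same line can be run on a stand-in array. This is only a sketch: the zero-filled X below is a hypothetical placeholder for the real feature array, with 5 samples chosen arbitrarily.

```python
import numpy as np

# Hypothetical stand-in for the real features: 5 samples, each 13 x 9 with 1 channel
X = np.zeros((5, 13, 9, 1), dtype=np.float32)

# Drop the sample axis; keep height, width, and a single channel
input_shape = (X.shape[1], X.shape[2], 1)

# Because X already carries the channel axis, slicing gives the same tuple
same_shape = X.shape[1:]

print(input_shape)
```

Since X already has a trailing channel axis here, X.shape[1:] would work equally well; writing the 1 explicitly just makes the channel count obvious.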

Flatten y

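Flattening here means turning the one-hot encoded 2D array into a 1D array of class indexes (no columns)
np.argmax(ndarray, axis=0/1) returns the index of the maximum value along the given axis
With axis=1, go row by row in y and return the index where the 1 is (since the rest are 0, 1 is the max under one-hot encoding)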

# Dimensions: (21817, )
y_flat = np.argmax(y, axis=1)
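
To see what the axis argument does, here is a small sketch on a toy one-hot array. The 4 x 3 array is made up for illustration and is not the real label data.

```python
import numpy as np

# Toy one-hot labels: 4 samples, 3 classes (illustrative only)
y = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# axis=1: one class index per row (this is the "flattened" label array)
y_flat = np.argmax(y, axis=1)

# axis=0, for contrast: the row index of the first max in each column
first_row_per_class = np.argmax(y, axis=0)

print(y_flat)               # [2 0 1 2]
print(first_row_per_class)  # [1 2 0]
```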

Get Class Weight

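Weights the loss so that classes with less data (e.g. bass drum) are not overlooked during gradient descent
Reduces bias toward the most frequent classes
compute_class_weight(class_weight="balanced", classes=array of the unique class indexes, y=flattened array of the label indexes)
Used in model.fit(), which expects a dictionary mapping each class index to its weight

from sklearn.utils.class_weight import compute_class_weight

# Recent scikit-learn versions require keyword arguments here
classes = np.unique(y_flat)
weights = compute_class_weight(
    class_weight="balanced",
    classes=classes,
    y=y_flat
)
# Keras expects a dict: class index -> weight
class_weight = dict(zip(classes.tolist(), weights))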
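
For intuition, the "balanced" heuristic is documented by scikit-learn as n_samples / (n_classes * np.bincount(y)). The sketch below reproduces that formula with NumPy alone on made-up labels, so the numbers are illustrative rather than taken from the real dataset.

```python
import numpy as np

# Toy labels: class 0 is common, classes 1 and 2 are rare (illustrative only)
y_flat = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])

classes = np.unique(y_flat)
counts = np.bincount(y_flat)  # samples per class: [6, 2, 2]

# "balanced" heuristic: n_samples / (n_classes * samples_in_class)
weights = len(y_flat) / (len(classes) * counts)

# Dictionary form: class index -> weight
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)
```

The rare classes end up with a weight three times larger than the common one, which is the effect described above: errors on under-represented classes count for more in the loss.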

Create CNN

Sequential Model
- Used to easily "add" consecutive layers in Keras
- In more complicated models, we would have to tell the model which layers are connected to which
Add 4 Convolutional Layers
- Start with 32 filters and go up by powers of 2 to 128 (filter counts are typically powers of 2)
  - Progressively increase the number of filters in hopes of learning more (the more layers you pass the input through, the more specific the details your model can learn)
- Strides are (1, 1) since our input matrix size is only (13, 9, 1), which is quite small
  - Bigger input matrices would be better with (2, 2) or even (5, 5)
- padding="same" keeps each layer's output the same height and width as its input (with a stride of 1)
Max pooling
- Since our input matrix size is quite small, there is no need to pool down extensively
Dropout Layer
- May consider putting this more frequently between layers
- BARE MINIMUM: Build it before the flatten layer
Flatten Layer
- Flattens the data to 1 dimension
Dense Layers
- Add 3 dense layers whose sizes decrease (128, then 64) until the last one matches the number of classes to classify (10)
- Activation function is softmax for output layer
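
To make the shape bookkeeping concrete, here is a rough sketch in plain Python that traces the (13, 9, 1) input through the layers described above. The helper functions are hypothetical and only mimic the shape rules; they assume stride-1 "same" convolutions and a (2, 2) max pool with Keras' default "valid" padding.

```python
def conv_same_stride1(height, width, filters):
    # padding="same" with stride 1 keeps height and width;
    # the channel count becomes the number of filters
    return height, width, filters


def max_pool_valid(height, width, channels, pool=2):
    # "valid" pooling with stride equal to the pool size uses floor division
    return height // pool, width // pool, channels


shape = (13, 9, 1)
for filters in (32, 32, 64, 128):
    shape = conv_same_stride1(shape[0], shape[1], filters)
after_conv = shape

after_pool = max_pool_valid(*after_conv)

# Flatten multiplies all remaining dimensions together
flat_units = after_pool[0] * after_pool[1] * after_pool[2]

print(after_conv, after_pool, flat_units)
```

A single (2, 2) pool already shrinks the 13 x 9 map to 6 x 4, which is why there is little room for further pooling on an input this small.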

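# Assumes TensorFlow's bundled Keras; adjust the imports if you use standalone Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

# Sequential Model
model = Sequential()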

# Convolutional Layers
model.add(Conv2D(32, (3, 3), activation="relu", strides=(1, 1), padding="same", input_shape=input_shape))
model.add(Conv2D(32, (3, 3), activation="relu", strides=(1, 1), padding="same"))
model.add(Conv2D(64, (3, 3), activation="relu", strides=(1, 1), padding="same"))
model.add(Conv2D(128, (3, 3), activation="relu", strides=(1, 1), padding="same"))

# Max Pooling
model.add(MaxPool2D((2, 2)))

# Dropout Layer
model.add(Dropout(0.5))

# Flatten Layer
model.add(Flatten())

# Dense Layers
model.add(Dense(128, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))

# Compile Model
model.compile(loss="categorical_crossentropy", optimizer="Adam", metrics=["acc"])

# Model Summary
model.summary()

# Training CNN
model.fit(X, y, epochs=10, batch_size=32, shuffle=True, class_weight=class_weight)

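Model compilation
- categorical_crossentropy is a good loss for multiclass classification with one-hot encoded labels
- Adam is the go-to optimizer (it adapts the step size per parameter and has momentum built in, so you do not have to specify momentum yourself)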
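
To show what the loss measures, here is a minimal NumPy sketch of softmax followed by categorical cross-entropy for a single sample. The function names and logit values are made up for illustration; this is not how Keras implements the loss internally.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); only the true class contributes for one-hot labels
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))


y_true = np.array([0.0, 1.0, 0.0])  # one-hot label: class 1

confident = softmax(np.array([0.5, 4.0, 0.1]))  # strongly favors class 1
unsure = softmax(np.array([1.0, 1.0, 1.0]))     # uniform over 3 classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

A uniform guess over 3 classes costs ln(3), and the loss shrinks toward 0 as the predicted probability of the true class approaches 1.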
Model summary
- Note the large numbers and see if you can reduce them (especially Param #, since unnecessary parameters cost extra processing power while the model is being fitted)
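
As a rough cross-check of the Param # column, the counts can be worked out by hand. The sketch below assumes the layer sizes from the code above (3 x 3 kernels, biases enabled, and a flattened size of 6 * 4 * 128 after one (2, 2) max pool); the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per (input channel, filter) pair, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    # One weight per (input, output) pair, plus one bias per output
    return in_units * out_units + out_units


conv = [
    conv2d_params(3, 3, 1, 32),
    conv2d_params(3, 3, 32, 32),
    conv2d_params(3, 3, 32, 64),
    conv2d_params(3, 3, 64, 128),
]

flat_units = 6 * 4 * 128  # pooled feature map, flattened
dense = [
    dense_params(flat_units, 128),
    dense_params(128, 64),
    dense_params(64, 10),
]

total = sum(conv) + sum(dense)
print(conv, dense, total)
```

Under these assumptions the first dense layer accounts for most of the parameters, so shrinking the feature map before the flatten layer (or the first dense layer itself) is the most direct way to bring the total down.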

Training Results
