
Daily Data Science Tip #11

Why do we shuffle the dataset?

Feb 4, 2021

When a neural network is trained on unshuffled data, for example data sorted by class, its early weight updates are driven entirely by the class it sees first, so it learns features closely correlated with that class. This makes it harder for the optimisation algorithm to find a solution that works well across the entire dataset.
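To make this concrete, here is a minimal sketch using only Python's standard library. The toy label list, the batch size, and the helper name `batch_classes` are assumptions made for illustration; they do not come from any particular framework. The sketch compares which classes appear in each mini-batch before and after shuffling.

```python
import random

# Hypothetical toy dataset: 6 samples each of classes 0, 1 and 2, sorted by class.
labels = [0] * 6 + [1] * 6 + [2] * 6


def batch_classes(labels, batch_size):
    """Return the set of classes present in each consecutive mini-batch."""
    return [
        set(labels[i:i + batch_size])
        for i in range(0, len(labels), batch_size)
    ]


# Unshuffled: every batch holds a single class, so each update sees one class only.
unshuffled = batch_classes(labels, batch_size=6)

# Shuffle a copy with a fixed seed so the run is reproducible.
shuffled_labels = labels[:]
random.Random(0).shuffle(shuffled_labels)
shuffled = batch_classes(shuffled_labels, batch_size=6)

print("unshuffled batches:", unshuffled)
print("shuffled batches:  ", shuffled)
```

In the unshuffled case, the first batch contains only class 0, so the first gradient step carries no information about the other classes. After shuffling, batches typically mix classes.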

By shuffling the dataset, we ensure two key things:

1. There is enough variance in the order the data is presented that each data point in the training data has an independent effect on the network.

2. Our validation partition is obtained from the training data; if we fail to shuffle the dataset before splitting, the validation set may not be representative of the training data.
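The second point can be sketched in a few lines of standard-library Python. The toy dataset, the `split` helper, and the choice to hold out the last `n_val` samples are all assumptions made for illustration, not a specific library's behaviour.

```python
import random

# Hypothetical toy dataset sorted by class: (sample_id, label) pairs,
# 10 samples for each of the classes 0, 1 and 2.
data = [(i, label) for label in (0, 1, 2) for i in range(10)]


def split(samples, n_val):
    """Hold out the last n_val samples as the validation set."""
    return samples[:-n_val], samples[-n_val:]


# Without shuffling, the validation set is just the tail of the sorted data.
_, val_sorted = split(data, n_val=6)
val_sorted_classes = {label for _, label in val_sorted}

# Shuffle a copy first (fixed seed for reproducibility), then split.
shuffled = data[:]
random.Random(42).shuffle(shuffled)
train, val = split(shuffled, n_val=6)
val_classes = {label for _, label in val}

print("classes in unshuffled validation set:", val_sorted_classes)
print("classes in shuffled validation set:  ", val_classes)
```

Without shuffling, the held-out tail contains only class 2, so validation scores would say nothing about classes 0 and 1. Shuffling first makes a representative validation set likely, though a purely random split does not guarantee that class proportions match exactly.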


Written by Richmond Alake
