What's the difference between a test set and a training set?

To understand the difference, it helps to know three related concepts:

a. training set

b. validation set

c. test set

Before you apply a learning algorithm to a data-set, you typically split the data-set into the three sets above.

a. training set: usually around 60% of your original data-set. It contains examples whose target values are already known (pre-classified), together with their predictor variables. It is used to fit the model's parameters.

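b. validation set: usually around 20%. It is used to check what the model has learned so far: the model's predictions are compared against the known targets of pre-classified data that was not used for training. The validation data-set provides an unbiased evaluation of a model fit on the training data-set while you tune the model or choose between candidate models. Cross-validation, a term from statistics, is a related technique that repeats this kind of split several times.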

c. test set: usually around 20%. Here we apply our chosen model to data it has never seen, so we can get an idea of how it is going to perform. It is not good to use the same data for both training and testing, since that would not tell us how well the model generalizes or whether over-fitting has happened. Hence we need to keep the sets separate.
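The point about not testing on the training data can be illustrated with a deliberately bad model. This is only a sketch: `MemorizingModel` is a hypothetical class invented here that stores its training examples in a dictionary, and the toy data (integers labeled by parity) is made up for illustration.

```python
import random
from collections import Counter


class MemorizingModel:
    """Hypothetical model that memorizes training pairs instead of learning a rule."""

    def fit(self, inputs, labels):
        self.table = dict(zip(inputs, labels))
        # Fallback for inputs never seen in training: the most common training label.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, inputs, labels):
    """Fraction of inputs whose predicted label matches the known label."""
    correct = sum(model.predict(x) == y for x, y in zip(inputs, labels))
    return correct / len(inputs)


# Toy data (an assumption for illustration): the label is the parity of the input.
data = [(x, x % 2) for x in range(100)]
random.Random(0).shuffle(data)

# Simple two-way split, enough to show the effect.
train, test = data[:80], data[80:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

model = MemorizingModel().fit(train_x, train_y)
train_accuracy = accuracy(model, train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)

print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on held-out test data: {test_accuracy:.2f}")
```

The training score is perfect because every training input is in the lookup table, while the score on the held-out data is no better than a coin flip. Only the separate test data reveals that the model has not generalized.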

Splits do not have to be exactly 60-20-20; 70-15-15 is also common.
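As a rough sketch of how such a three-way split might be done, the snippet below uses only the Python standard library. The function name `split_dataset`, its parameters, and the stand-in data are assumptions made for illustration, not part of any particular library.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into (train, validation, test) lists.

    The test set receives whatever remains after the first two fractions.
    """
    if not 0 < train_frac < 1 or not 0 < val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


rows = list(range(100))  # stand-in for 100 labeled examples
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

Shuffling before slicing matters when the original data is ordered (for example, sorted by class), since otherwise one of the sets could end up with only one kind of example. Calling `split_dataset(rows, train_frac=0.7, val_frac=0.15)` would give the 70-15-15 variant.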
