What is the Gist of “Oversampling and undersampling in data analysis”?

If you have an imbalanced dataset, “you can change the dataset that you use to build your predictive model to have more balanced data.

“This change is called sampling your dataset and there are two main methods that you can use to even-up the classes:

  1. “You can add copies of instances from the under-represented class called over-sampling (or more formally sampling with replacement), or
  2. “You can delete instances from the over-represented class, called under-sampling.

“…These approaches are often very easy to implement and fast to run. They are an excellent starting point.
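To make the two methods above concrete, here is a minimal sketch in plain Python. It is not taken from the quoted article: the function names `random_oversample` and `random_undersample`, the toy data, and the choice to balance every class to the largest (or smallest) class size are my own assumptions for illustration.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Add randomly chosen copies (sampling with replacement) of rows from
    smaller classes until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original row, then pad with duplicates.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Randomly delete rows from larger classes until every class is as
    small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # sample() draws without replacement, so no row is kept twice.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy dataset: 8 rows of class 0 and 2 rows of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(labels))        # class counts before resampling
print(Counter(over_labels))   # class counts after over-sampling
print(Counter(under_labels))  # class counts after under-sampling
```

In this sketch, over-sampling keeps every original row and only adds duplicates, while under-sampling throws rows away, so which one fits better will likely depend on how much data you have to begin with.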

“You can learn a little more in the Wikipedia article titled ‘Oversampling and undersampling in data analysis.’

“Some Rules of Thumb”

Note that the author of the article lists several other methods for handling imbalanced datasets, if you are interested.


Disclaimer:

I am not a professional in this field, nor do I claim to know all of the jargon that is typically used in this field. I am not summarizing my sources; I simply read from a variety of websites until I feel like I understand enough about a topic to move on to what I actually wanted to learn. If I am inaccurate in what I say or you know a better, simpler way to explain a concept, I would be happy to hear from you :).

Published by

George Evans

BS in Physics with a Minor in Mathematics.
