You are building a binary classification model by using a supplied training set.
The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
A . Penalize the classification
B . Resample the dataset using undersampling or oversampling
C . Normalize the training feature set
D . Generate synthetic samples in the minority class
E . Use accuracy as the evaluation metric of the model
Answer: ABD
Explanation:
A: Try Penalized Models
You can use the same algorithms but give them a different perspective on the problem.
Penalized classification imposes an additional cost on the model for making classification mistakes on the minority class during training. These penalties can bias the model to pay more attention to the minority class.
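As a minimal sketch of the penalty idea, the snippet below weights each class inversely to its frequency and scales a binary cross-entropy loss by that weight. The function names, the weighting formula n / (k * count), and the 90/10 toy labels are illustrative assumptions, not something the question or its reference prescribes.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def weighted_log_loss(labels, probs, weights):
    """Mean binary cross-entropy, each example scaled by its class weight."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# An equally confident mistake costs more on the minority class.
minority_miss = weighted_log_loss([1], [0.1], weights)
majority_miss = weighted_log_loss([0], [0.9], weights)
print(weights, minority_miss > majority_miss)
```

Many libraries expose the same idea through a class-weight or cost parameter, so in practice the penalty is usually set by configuration rather than written by hand.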
B: You can change the dataset that you use to build your predictive model to have more balanced data.
This change is called sampling your dataset, and there are two main methods that you can use to even up the classes:
– Consider testing under-sampling when you have a lot of data (tens or hundreds of thousands of instances or more)
– Consider testing over-sampling when you don’t have a lot of data (tens of thousands of records or fewer)
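The two sampling methods above can be sketched with the standard library alone. This is an illustrative version under simple assumptions: rows are plain lists, the helper names are hypothetical, under-sampling drops random rows until every class matches the smallest one, and over-sampling duplicates random rows until every class matches the largest one.

```python
import random
from collections import Counter

def split_by_class(rows, labels):
    """Group rows by their class label."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row)
    return groups

def undersample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    groups = split_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    out = [(r, y) for y, g in groups.items() for r in rng.sample(g, target)]
    rng.shuffle(out)
    return [r for r, _ in out], [y for _, y in out]

def oversample(rows, labels, seed=0):
    """Randomly duplicate rows (with replacement) up to the largest class size."""
    rng = random.Random(seed)
    groups = split_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for y, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        out.extend((r, y) for r in g + extra)
    rng.shuffle(out)
    return [r for r, _ in out], [y for _, y in out]

# Hypothetical toy data: 90 majority rows and 10 minority rows.
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

_, under_labels = undersample(rows, labels)
_, over_labels = oversample(rows, labels)
print(Counter(under_labels), Counter(over_labels))
```

Resampling should be applied to the training split only, so that the evaluation data keeps the original class distribution.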
D: Try Generating Synthetic Samples
A simple way to generate synthetic samples is to randomly sample the attributes from instances in the minority class.
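The simple approach described above might look like the following sketch, which builds each synthetic row by drawing every attribute independently from the values observed in the minority class. The function name and toy data are hypothetical. Because attributes are drawn independently, any relationship between them is not preserved.

```python
import random

def synthesize_minority(minority_rows, n_new, seed=0):
    """Create n_new rows by sampling each attribute from the minority class."""
    rng = random.Random(seed)
    n_features = len(minority_rows[0])
    # Collect the observed values of each attribute (column) separately.
    columns = [[row[j] for row in minority_rows] for j in range(n_features)]
    # Each synthetic row picks one observed value per attribute at random.
    return [[rng.choice(col) for col in columns] for _ in range(n_new)]

# Hypothetical minority-class instances with two attributes each.
minority = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
synthetic = synthesize_minority(minority, n_new=5)
print(synthetic)
```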
Reference: https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/