Machine Learning in the Real World 🧠 🤖
ML for real-world applications is much more than designing fancy networks and fine-tuning parameters.
In fact, you will spend most of your time curating a good dataset.
Let's go through the process together 👇
#RepostFriday
Collect Data 💽
We need to represent the real world as accurately as possible. If some situations are underrepresented, we introduce Sampling Bias.
Sampling Bias is nasty because the test set usually comes from the same biased sample: we'll see high test accuracy, but our model will perform badly when deployed.
👇
Traffic Lights 🚦
Let's build a model to recognize traffic lights for a self-driving car. We need to collect data for different:
▪️ Lighting conditions
▪️ Weather conditions
▪️ Distances and viewpoints
▪️ Strange variants
And if we sample only vertical lights 🚦, we won't detect horizontal ones 🚥 🤷‍♂️
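A quick way to spot gaps like this is to count how often each condition shows up in the dataset. Here is a minimal sketch, assuming every image comes with a small metadata dict. The keys (`lighting`, `orientation`), the expected values, and the 10% threshold are hypothetical choices for illustration, not a standard.

```python
from collections import Counter

def find_underrepresented(samples, key, expected, min_share=0.1):
    """Return the expected values of `key` that make up less than `min_share` of the data."""
    if not samples:
        return list(expected)
    counts = Counter(sample[key] for sample in samples)
    total = len(samples)
    # Counter returns 0 for values that never appear, so missing cases are flagged too.
    return [value for value in expected if counts[value] / total < min_share]

# Hypothetical metadata: one dict per collected image.
samples = (
    [{"lighting": "day", "orientation": "vertical"}] * 94
    + [{"lighting": "night", "orientation": "vertical"}] * 6
)

print(find_underrepresented(samples, "lighting", ["day", "night", "dusk"]))
# ['night', 'dusk']
print(find_underrepresented(samples, "orientation", ["vertical", "horizontal"]))
# ['horizontal']
```

Listing the expected values explicitly matters here: a case that was never collected has no samples to count, so it would otherwise go unnoticed.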
👇
Data Cleaning 🧹
Now we need to clean out all corrupted and irrelevant samples. We need to remove:
▪️ Overexposed or underexposed images
▪️ Images in irrelevant situations
▪️ Faulty images
Leaving them in the dataset will hurt our model's performance!
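Part of this cleanup can be automated. The sketch below assumes images are NumPy `uint8` arrays and uses mean brightness as a rough exposure check. The function names and the thresholds (0.1 and 0.9) are assumptions for illustration and would need tuning on real data. Irrelevant situations usually still need a human (or another model) to review.

```python
import numpy as np

def is_faulty(image):
    """Flag empty images and images where every pixel has the same value."""
    return bool(image.size == 0 or image.min() == image.max())

def is_badly_exposed(image, low=0.1, high=0.9):
    """Flag images whose mean brightness (scaled to 0..1) falls outside [low, high]."""
    brightness = image.astype(np.float32).mean() / 255.0
    return bool(brightness < low or brightness > high)

def clean(images):
    """Keep only images that pass both checks."""
    return [img for img in images if not is_faulty(img) and not is_badly_exposed(img)]

# Synthetic examples standing in for real camera frames.
rng = np.random.default_rng(0)
normal = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
dark = rng.integers(0, 10, size=(64, 64, 3), dtype=np.uint8)
bright = rng.integers(246, 256, size=(64, 64, 3), dtype=np.uint8)
blank = np.zeros((64, 64, 3), dtype=np.uint8)

kept = clean([normal, dark, bright, blank])
print(len(kept))  # only the normally exposed frame survives
```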
👇
Preprocess Data ⚙️
Most ML models like their data nicely normalized and properly scaled. Bad normalization can also lead to worse performance (I have a nice story for another time...)
▪️ Crop and resize all images
▪️ Normalize all values (usually to zero mean and unit standard deviation)
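The two steps above can be sketched with plain NumPy. This is a minimal illustration, assuming images are `uint8` arrays of shape (height, width, channels). The helper names, the 64x64 target size, and the nearest-neighbor resize are assumptions made to keep the example short; a real pipeline would likely use an image library with better interpolation.

```python
import numpy as np

def center_crop(image):
    """Crop the largest centered square out of the image."""
    h, w = image.shape[:2]
    size = min(h, w)
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def resize_nearest(image, size):
    """Resize to (size, size) by picking the nearest source pixel."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]

def normalize(batch, mean=None, std=None, eps=1e-7):
    """Scale each channel to zero mean and unit standard deviation."""
    batch = batch.astype(np.float32)
    if mean is None:
        mean = batch.mean(axis=(0, 1, 2))
    if std is None:
        std = batch.std(axis=(0, 1, 2))
    return (batch - mean) / (std + eps), mean, std

# Synthetic frames standing in for real camera images.
rng = np.random.default_rng(0)
raw = [rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8) for _ in range(8)]

batch = np.stack([resize_nearest(center_crop(img), 64) for img in raw])
normalized, mean, std = normalize(batch)
print(normalized.shape)  # (8, 64, 64, 3)
```

`normalize` returns the mean and standard deviation so that statistics computed on the training set can be passed back in for validation, test, and deployment data, which keeps the scaling consistent across all of them.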
👇