Saved by @ankitsrihbti

Vladimir Haltakov
@haltakov 4 years ago 629 views

This is Karma. Karma is not a machine learning classifier 🐕‍🦺

Karma is a real dog trained to detect drugs. However, he would fail the simplest tests we apply in ML...

Let me take you through this story from the eyes of an ML engineer.

https://t.co/WAXRUlTvSI

Thread 🧵

Story TLDR 🔖

The story is about police dogs trained to sniff drugs. The problem is that the dogs often signal drugs even if there are none. Then innocent people land in jail for days.

The cops even joke about the “probable cause on four legs”.

Let's see why that is 👇

1. Sampling Bias 🤏

Drugs were found in 64% of the cars Karma identified, which the police praised as very good. After all, most people don't carry drugs in their cars, so 64% seems solid.

There was a sampling problem though... 👇

The cars were not sampled at random! The police only did the sniff test if there was a serious suspicion that something was wrong.

The chance there are drugs in the car is much higher in this case!
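To make the sampling effect concrete, here is a minimal sketch using Bayes' rule. The hit rate, false alarm rate, and both priors are hypothetical numbers chosen for illustration; none of them come from the story. The point is only that the same dog looks very different depending on how often the searched cars really contain drugs.

```python
def alert_precision(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Share of alerts that are correct, computed with Bayes' rule."""
    true_alerts = hit_rate * prior                   # cars with drugs that trigger an alert
    false_alerts = false_alarm_rate * (1.0 - prior)  # clean cars that trigger an alert
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical dog: alerts on 90% of cars with drugs and on 50% of clean cars.
HIT_RATE = 0.9
FALSE_ALARM_RATE = 0.5

# Hypothetical priors: cars stopped on suspicion vs. cars sampled at random.
suspicious = alert_precision(0.50, HIT_RATE, FALSE_ALARM_RATE)
random_cars = alert_precision(0.05, HIT_RATE, FALSE_ALARM_RATE)

print(f"suspicious stops: {suspicious:.0%}")   # 64%
print(f"random cars:      {random_cars:.0%}")  # 9%
```

Under these assumed numbers, a dog that false-alarms on half of all clean cars still reaches 64% when only suspicious cars are tested.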

2. Evaluation Metrics 🔍

The police referred to a 2014 study from Poland measuring the efficacy of sniffer dogs. The problem was that every test actually contained drugs!

This means there was no chance to measure false positives from the dogs! Only recall, not precision 🤦‍♂️
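A small sketch of why such a test design hides false positives. The "dog" here is a hypothetical worst case that alerts on every sample, and the test sets are made up for illustration. On an all-positive test it looks perfect, and only a test with drug-free samples exposes it.

```python
def precision_recall(labels, alerts):
    """Return (precision, recall, false_positives) for boolean labels and alerts."""
    tp = sum(1 for y, a in zip(labels, alerts) if y and a)
    fp = sum(1 for y, a in zip(labels, alerts) if not y and a)
    fn = sum(1 for y, a in zip(labels, alerts) if y and not a)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall, fp


def always_alert(n):
    """Hypothetical dog that signals drugs on every single sample."""
    return [True] * n


# Test design where every sample contains drugs: no false positive can occur.
all_positive = [True] * 10
p1, r1, fp1 = precision_recall(all_positive, always_alert(10))

# Same dog, but half of the samples are drug-free.
mixed = [True] * 10 + [False] * 10
p2, r2, fp2 = precision_recall(mixed, always_alert(20))

print(p1, r1, fp1)  # 1.0 1.0 0
print(p2, r2, fp2)  # 0.5 1.0 10
```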

3. Leaking Training Data 🚰

Another study found that the dogs learned to recognize the emotions of their handlers during tests. They felt that their human wanted them to find drugs in the specific test scenario, so they did.

The trainer leaked the ground truth during testing.

4. Overfitting ➿

Similar to the one above, in many cases, the dog saw that their handler wanted to find drugs in a car during a traffic stop. So it would raise an alarm.

The dog was rewarded before the car was actually searched! It found an easy signal giving it a reward.

Summary 🏁

It is fascinating how many problems there are with the sniffer dogs that are well known to machine learning engineers (and of course mathematicians). Some of them are even common sense...

Avoid these problems not only when training your model, but also in life 😃

If you liked this thread and want to read more about self-driving cars and machine learning, follow me @haltakov!

More from Vladimir Haltakov

Vladimir Haltakov
@haltakov

Let's talk about a common problem in ML - imbalanced data ⚖️

Imagine we want to detect all pixels belonging to a traffic light from a self-driving car's camera. We train a model with 99.88% performance. Pretty cool, right?

Actually, this model is useless ❌

Let me explain 👇

The problem is that the data is severely imbalanced - the ratio between background pixels and traffic light pixels is 800:1.

If we don't take any measures, our model will learn to classify each pixel as background giving us 99.88% accuracy. But it's useless!
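The 99.88% figure can be checked with a few lines of arithmetic. This sketch reads the 800:1 ratio as 800 background pixels for every traffic light pixel and assumes a model that predicts "background" everywhere.

```python
BACKGROUND_PER_LIGHT = 800  # ratio from the thread, read as background : traffic light

background = BACKGROUND_PER_LIGHT
traffic_light = 1

# Predicting "background" for every pixel gets all background pixels right
# and every traffic light pixel wrong.
accuracy = background / (background + traffic_light)
recall_traffic_light = 0 / traffic_light

print(f"accuracy: {accuracy:.2%}")                          # 99.88%
print(f"traffic light recall: {recall_traffic_light:.0%}")  # 0%
```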

What can we do? 👇

Let me tell you about 4 ways of dealing with imbalanced data:

▪️ Choose the right evaluation metric
▪️ Undersampling your dataset
▪️ Oversampling your dataset
▪️ Adapting the loss

Let's dive in 👇

1️⃣ Evaluation metrics

Looking at the overall accuracy is a very bad idea when dealing with imbalanced data. There are other measures that are much better suited:
▪️ Precision
▪️ Recall
▪️ F1 score

I wrote a whole thread on

How to evaluate your ML model? 📏

Your accuracy is 97%, so this is pretty good, right? Right? No! ❌

Just looking at the model accuracy is not enough. Let me tell you about some other metrics:
▪️ Recall
▪️ Precision
▪️ F1 score
▪️ Confusion matrix

Let's start 👇
— Vladimir Haltakov (@haltakov) August 31, 2021

2️⃣ Undersampling

The idea is to throw away samples of the overrepresented classes.

One way to do this is to randomly throw away samples. However, ideally, we want to make sure we are only throwing away samples that look similar.
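The random variant mentioned above can be sketched in a few lines. This is only the simple baseline, not the similarity-aware strategy; the function name, the class names, and the toy class sizes are assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the smallest class decides how many samples each class keeps.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: 800 background samples and 5 traffic light samples.
labels = ["background"] * 800 + ["traffic_light"] * 5
samples = list(range(len(labels)))
balanced = random_undersample(samples, labels)

print(len(balanced))  # 10
```

The trade-off is that most of the majority class is thrown away blindly, which is why a strategy that only discards near-duplicate samples is preferable.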

Here is a strategy to achieve that 👇

Vladimir Haltakov
@haltakov

Dataset bias strikes again! 😬

This is a great example of dataset bias! This ML method classifies skin lesions as being malignant or not.

Can you see the problem from the examples? 🤔

Let me shortly explain 👇

The dataset used to train the model contains images of both malignant and benign lesions. The problem is that the malignant images often contain rulers printed on the skin. They are used by doctors to assess the lesion over time.

Guess what the model learned... 👇

The ML model learned to detect the rulers... 🤷‍♂️

"In our dataset, images with rulers were more likely to be malignant; thus the algorithm inadvertently “learned” that rulers are

Vladimir Haltakov
@haltakov

Machine Learning Paper Reviews 🔎📜

Check out this thread for short reviews of some interesting Machine Learning and Computer Vision papers. I explain the basic ideas and main takeaways of each paper in a Twitter thread.

👇 I'm adding new reviews all the time! 👇

AlexNet - the paper that started the deep learning revolution in Computer Vision!

It's finally time for some paper review! 📜🔍🧐

I promised the other day to start posting threads with summaries of papers that had a big impact on the field of ML and CV.

Here is the first one - the AlexNet paper! pic.twitter.com/QNLPIMZSIa
— Vladimir Haltakov (@haltakov) September 28, 2020

DenseNet - reducing the size and complexity of CNNs by adding dense connections between layers.

ML paper review time - DenseNet! 🕸️

This paper won the Best Paper Award at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR) - the best conference for computer vision problems.

It introduces a new CNN architecture where the layers are densely connected. pic.twitter.com/DuHytaoXia
— Vladimir Haltakov (@haltakov) October 15, 2020

Playing for data - generating synthetic GT from a video game (GTA V) and using it to improve semantic segmentation models.

Time for another ML paper review - generating synthetic ground truth data from video games! 🎮

I love this paper, because it pushes the boundaries of creating realistic synthetic ground truth data and shows that you can use it for training and improve your model.

Details 👇 pic.twitter.com/fBgORYG8Lz
— Vladimir Haltakov (@haltakov) October 5, 2020

Transformers for image recognition - a new paper with the potential to replace convolutions with a transformer.

Another paper review, but a little different this time... 🤷‍♂️

The paper is not published yet, but is submitted for review at ICLR 2021. It is getting a lot of attention from the CV/ML community, though, and many speculate that it is the end of CNNs... 👇 https://t.co/bh6wUxYfxu pic.twitter.com/dZGBYB8A5U
— Vladimir Haltakov (@haltakov) October 5, 2020

haltakov.eth 🌍 🇺🇦...
@haltakov

Machine Learning in the Real World 🧠 🤖

ML for real-world applications is much more than designing fancy networks and fine-tuning parameters.

In fact, you will spend most of your time curating a good dataset.

Let's go through the process together 👇

#RepostFriday

Collect Data 💽

We need to represent the real world as accurately as possible. If some situations are underrepresented we are introducing Sampling Bias.

Sampling Bias is nasty because we'll have high test accuracy, but our model will perform badly when deployed.

👇

Traffic Lights 🚦

Let's build a model to recognize traffic lights for a self-driving car. We need to collect data for different:

▪️ Lighting conditions
▪️ Weather conditions
▪️ Distances and viewpoints
▪️ Strange variants

And if we sample only 🚦 we won't detect 🚥 🤷‍♂️

👇

Data Cleaning 🧹

Now we need to clean all corrupted and irrelevant samples. We need to remove:

▪️ Overexposed or underexposed images
▪️ Images in irrelevant situations
▪️ Faulty images

Leaving them in the dataset will hurt our model's performance!

👇

Preprocess Data ⚙️

Most ML models like their data nicely normalized and properly scaled. Bad normalization can also lead to worse performance (I have a nice story for another time...)

▪️ Crop and resize all images
▪️ Normalize all values (usually 0 mean and 1 std. dev.)
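The normalization step can be sketched as follows. This is a minimal standard-library version operating on a flat list of pixel values; the function name and the toy image are assumptions, and a real pipeline would typically use an array library and statistics computed over the training set.

```python
from statistics import fmean, pstdev


def normalize(pixels):
    """Shift and scale values to zero mean and unit standard deviation."""
    mean = fmean(pixels)
    std = pstdev(pixels)  # population standard deviation of the given values
    if std == 0:
        # A constant image has no contrast; map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]


# Toy "image": five 8-bit intensity values.
image = [0, 64, 128, 192, 255]
normalized = normalize(image)

print([round(v, 2) for v in normalized])
```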

👇

Vladimir Haltakov
@haltakov

AI Job Interviews - another good example of bias in ML 🤦‍♂️

Two journalists tested some AI tools for assessing job candidates. Even when the candidate read a Wiki article in German instead of answering questions in English, the AI systems gave them good scores 🤷‍♂️

Let's unpack 👇

The Setup 🔬

The journalists created a fake job posting on two AI interview platforms. They specified the traits of the ideal candidate and provided the questions that need to be answered during the interview.

Then they started experimenting... 👇

The Positive Test ✅

One of them did a fake interview giving all the right answers and predictably got very high scores - 8.5 out of 9 👍

Then she tried something different... 👇

The Negative Test ❌

In a second interview, instead of answering the questions in English, she just read the article on psychometrics from the German Wikipedia 😁

One system gave her a score of 6 out of 9, while the other determined she is a 73% match for the job.

Oops... 👇

What happened? 🔍

Interestingly, one of the systems generated a transcript which was obviously meaningless.

This means that the machine learning model behind the tool likely captured nuances of the intonation of the speaker instead of the meaning of the actual words.

👇

More from All

அன்பெழில்...
@anbezhil12

#ஆதித்தியஹ்ருதயம் (Aditya Hridayam) stotram
According to the Valmiki Ramayana, this hymn was taught by the Tamil sage Agastya to Lord Rama, who was born in the solar dynasty. Saints and sages have long said that reciting the Aditya Hridayam daily brings great benefit.

Agastya had come with the devas to watch the war between Rama and Ravana. He approached Lord Rama, who was exhausted from battle and looked worried, and said: "Rama, best among men, there is a secret mantra stated in the Vedas; whoever recites it in battle can defeat all enemies. I will teach it to you, so listen." Then he taught it.

The first two shlokas describe the situation. The third shloka is Agastya addressing Lord Rama. The fourth through the thirtieth shlokas form the text called Aditya Hridayam. The thirty-first shloka describes the Sun, pleased by this hymn, blessing Rama.

Fifth shloka:
Sarva mangala mangalyam sarva papa pranashanam
Chinta shoka prashamanam ayur vardhanam uttamam
Meaning: This hymn called Aditya Hridayam is the best among auspicious things. It removes sins, worries, and confusion, lengthens life, and is most excellent. It bestows the grace of the Lord who dwells in the heart.
The link to the full shloka with its meaning is here: https://t.co/Q3qm1TfPmk
The Sun is most important to the functioning of the world. It is by the Sun's energy that living beings, crops

Lakshminarayan G
@narayanagl

The Chanting of following names of Bhagwan Vishnu, immensely helps us in overcoming the obstacles in our daily life.
These names are from Sri Vishnu Sahasranama!
I. "Om Vashatkaaraaya Namaha" : For Success in Business.
II. "Om Aksharaaya Namaha" : For Success in Studies.

III. "Om Bhuthabhavanaya Namaha" : For Good Health.
IV. "Om Paramaathmane Namaha" : For Self Confidence.

Src: VAK magazine from Chilkur Balaji Temple, Hyderabad - Sri @csranga
#SanatanaDharma #SanatanaSanskriti #Sattology

One needs to chant the following slokas 28 times to get rid of certain problems in life.
Om Hrushikesaya Namah - For Overcoming Bad habits
Om Vashatkaaraya Namah - For Success in Business, Interviews, visa interviews, building relationships

Om Srimate Namah - For Handsome appearance & wealth
Om Aksharaya Namah - For Education & better financial strength
Om Paramatmane Namah - For self employed people, for promotions and success in games.
Om Putatmane Namah - To remove mental stress & for mental peace

Om Sarmane Namah - For Job Satisfaction
Om Bhutadaye Namah - To mend a soured friendship or any personal relationship
Om Dhatre Namah - For childless couples
Om Vidhatre Namah - For pregnant women to chant for healthy babies.
Chant the following 108 times :

WORLD OF SANATAN DHARM...
@world_sanatan

WHY BABUR NEVER TRIED TO ATTACK SRI KRISHNADEVARAYA ?

A 🧵 you must read

It is very well known that Babur considered Krishna Dev Raya as the strongest ruler of the entire subcontinent, and Vijayanagar empire the strongest during that time in India.

1/10

The Vijayanagar empire reached its peak during 1509-1529 around the reign of Krishna Dev Raya.

Vijayanagar’s famed elephant brigade 👇

2/10

At his command there were over 50000 elite troops with a regiment of Portuguese gunners and 3200 cavalry with 600 elephants.

3/10

Along with that, more than four hundred thousand (four lakh) peasant levies and irregular troops made this empire's army one of the largest in all of South Asia.

A typical Vijayanagar levy soldier 👇

4/10

During Babur's invasion, Krishna Dev Raya ruled supreme in all of the Deccan.

In fact, in an all-out brawl, Krishna Dev Raya would have beaten Babur fair and square.

Babur could muster at most 50000 troops with 50 cannons.

While Krishna Dev Raya had these numbers in Hampi alone

5/10

You May Also Like

Aditya Todmal
@AdityaTodmal

A THREAD ON @SarangSood

Decoded his way of analysis/logics for everyone to easily understand.

Have covered:
1. Analysis of volatility, how to foresee/signs.
2. Workbook
3. When to sell options
4. Diff category of days
5. How movement of option prices tell us what will happen

1. Keeps following volatility super closely.

Makes 7-8 different strategies to give him a sense of what's going on.

Whichever gives highest profit he trades in.

I am quite different from your style. I follow the market's volatility very closely. I have mock positions in 7-8 different strategies which allows me to stay connected. Whichever gives best profit is usually the one i trade in.
— Sarang Sood (@SarangSood) August 13, 2019

2. Theta falls when market moves.
It falls where the market is headed, not at our original position.

Anilji most of the time these days Theta only falls when market moves. So the Theta actually falls where market has moved to, not where our position was in the first place. By shifting we can come close to capturing the Theta fall but not always.
— Sarang Sood (@SarangSood) June 24, 2019

3. If you're an options seller then sell only when volatility is dropping, there is a high probability of you making the right trade and getting profit as a result

He believes in following the market operator: if the market mover sells volatility, Sarang Sir joins him.

This week has been great so far. The main aim is to be in the right side of the volatility, rest the market will reward.
— Sarang Sood (@SarangSood) July 3, 2019

4. Theta decay vs Fall in vega

Sell when Vega is falling rather than for theta decay. You won't be trapped and higher probability of making profit.

There is a difference between theta decay & fall in vega. Decay is certain but there is no guaranteed profit as delta moves can increase cost. Fall in vega on the other hand is backed by a powerful force that sells options and gives handsome returns. Our job is to identify them.
— Sarang Sood (@SarangSood) February 12, 2020

Formula 1
@F1

After 17 years and more than 300 races, it could be @alo_oficial's last F1 race on Sunday 😭 He's given us a lot over the years... including some superb GIFs!

Here we have a rare example of a GIF you can actually hear. TOMA. 💪

The Spanish Samurai

Never give up. 💪 #GraciasFernando

Winning is everything. #GraciasFernando

Dr. Jane Clare Jones
@janeclarejones

So, on the subject of bonkers hyperbolic pretzeling over the Bell judgement, Grace 'destroy books I don't like & make inappropriate jokes about sterilising teenage girls' Lavery has some thoughts.

Tell me why my feminism is wrong Grace.

Oh

Well, if anyone thought the Bell judgment was going to make TRAs reconsider making massively overblown claims with no evidence backed up with nothing but a thick wadge of emotional blackmail.... HAHAHA, no one thought that.

A high court in the UK made a delimited judgment about teenager's ability to consent to puberty blockers. This puts all trans people everywhere in the world at risk.

Because if any human anywhere has any thoughts that deviate in any way from the rote line dictated by

the trans rights movement, this puts all trans people everywhere in mortal danger.

Let's be honest Grace. It doesn't put trans people at risk. It puts trans ideology at risk. Because trans ideology depends on the idea of innate gender identity, and the trans child is the

necessary material evidence of the ontology of gender identity.

That is, children are being medicalised to provide evidence to underwrite adults identities.

Nothing to see here.

Jaya_Upadhyaya
@Jayalko1

SRI BANKEY BIHARI JI, VRINDAVAN (UP)

Very famous Krishna Mandir where Bankey Bihari Ji is worshipped and looked after as a child.
Bankey means “Bent at three places”.
Thakur Ji was worshipped in Nidhivan till 1863. This mandir was constructed in 1864 by Goswamis.
@LostTemple7

Legend says that Swami Haridas used to worship Krishna in Nidhivan.
Once, his curious disciples entered the forest but they were almost blinded by bright, intense light. Upon knowing this, Swamiji himself went there and after his requests, Krishna appeared with his consort.

Krishna then left behind this charming black murti, which we see now, before disappearing.
A curtain is drawn before the murti every 15-30 seconds. It is said that if one stares long enough into the eyes of Shri Banke Bihari, the person would lose his self-consciousness.

Bankey Bihari Ji sports a flute only once a year on the eve of Sharad Purnima.
Darshan of His lotus feet can be done only on Akshay Tritiya every year.
There are no loud bells here and
no early morning Darshan can be done so that child Krishna’s sleep is not disturbed.
