

Vladimir Haltakov
@haltakov 5 years, 1 month ago 763 views


This is Karma. Karma is not a machine learning classifier 🐕‍🦺

Karma is a real dog trained to detect drugs. However, he would fail the simplest tests we apply in ML...

Let me take you through this story from the eyes of an ML engineer.

https://t.co/WAXRUlTvSI

Thread 🧵

Story TLDR 🔖

The story is about police dogs trained to sniff drugs. The problem is that the dogs often signal drugs even if there are none. Then innocent people land in jail for days.

The cops even joke about the “probable cause on four legs”.

Let's see why that is 👇

1. Sampling Bias 🤏

Drugs were found in 64% of the cars Karma identified, which was praised by the police as very good. After all, most people don't carry drugs in their cars, so 64% seems solid.

There was a sampling problem though... 👇

The cars were not sampled at random! The police only did the sniff test if there was a serious suspicion that something was wrong.

The chance there are drugs in the car is much higher in this case!
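To put rough numbers on this, here is a minimal sketch using Bayes' rule. Only the 64% figure comes from the story; the dog's hit rate, its false-alarm rate, and both base rates are made-up assumptions chosen for illustration.

```python
def precision(prior, sensitivity, false_alarm_rate):
    """Fraction of alerts that are correct, computed with Bayes' rule."""
    true_alerts = sensitivity * prior
    false_alerts = false_alarm_rate * (1.0 - prior)
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical dog: alerts on 90% of cars with drugs and on 50% of cars without.
SENSITIVITY = 0.9
FALSE_ALARM_RATE = 0.5

# Hypothetical base rates: cars picked at random vs. cars the police already suspect.
random_cars = precision(0.05, SENSITIVITY, FALSE_ALARM_RATE)
suspected_cars = precision(0.50, SENSITIVITY, FALSE_ALARM_RATE)

print(f"random cars:    {random_cars:.1%}")     # random cars:    8.7%
print(f"suspected cars: {suspected_cars:.1%}")  # suspected cars: 64.3%
```

Under these assumptions, a dog that raises a false alarm on every second clean car still reaches about 64% once half of the tested cars really contain drugs.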

2. Evaluation Metrics 🔍

The police referred to a 2014 study from Poland measuring the efficacy of sniffer dogs. The problem was that every test actually contained drugs!

This means there was no chance to measure false positives from the dogs! Only recall, not precision 🤦‍♂️
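A small sketch of why such a test design cannot say anything about false alarms. The counts below are invented for illustration and are not taken from the study.

```python
def recall(tp, fn):
    """Share of drug samples the dog alerted on."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of alerts that were correct; None if the dog never alerted."""
    alerts = tp + fp
    return tp / alerts if alerts else None


def false_positive_rate(fp, tn):
    """Share of drug-free samples the dog alerted on; None if there were none."""
    negatives = fp + tn
    return fp / negatives if negatives else None


# Hypothetical trial in which every one of the 100 samples contains drugs.
tp, fn, fp, tn = 88, 12, 0, 0

print("recall:", recall(tp, fn))                            # recall: 0.88
print("false positive rate:", false_positive_rate(fp, tn))  # false positive rate: None
print("precision:", precision(tp, fp))                      # precision: 1.0
```

The precision of 1.0 is meaningless here: with no drug-free samples, a false alarm is impossible by construction.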

3. Leaking Training Data 🚰

Another study found that the dogs learned to recognize the emotions of their handlers during tests. They felt that their human wanted them to find drugs in the specific test scenario, so they did.

The trainer leaked the ground truth during testing.
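The effect can be caricatured in a few lines. This is a toy simulation, not a model of the actual study: the hypothetical `dog_alerts` function ignores smell entirely and just follows the handler's cue.

```python
import random


def dog_alerts(handler_expects_drugs):
    """Hypothetical dog that ignores smell and simply follows the handler's cue."""
    return handler_expects_drugs


def accuracy(trials):
    """Share of trials in which the alert matches the ground truth."""
    correct = sum(dog_alerts(cue) == has_drugs for has_drugs, cue in trials)
    return correct / len(trials)


rng = random.Random(0)
truth = [rng.random() < 0.5 for _ in range(10_000)]

# Unblinded test: the handler knows the answer, so the cue equals the ground truth.
unblinded = [(has_drugs, has_drugs) for has_drugs in truth]

# Blinded test: the handler does not know the answer, so the cue is an independent guess.
blinded = [(has_drugs, rng.random() < 0.5) for has_drugs in truth]

print(f"unblinded accuracy: {accuracy(unblinded):.1%}")  # unblinded accuracy: 100.0%
print(f"blinded accuracy:   {accuracy(blinded):.1%}")    # close to chance
```

The same dog looks perfect when the ground truth leaks through the handler and drops to roughly coin-flip level once the handler is blinded.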

4. Overfitting ➿

Similar to the one above, in many cases, the dog saw that their handler wanted to find drugs in a car during a traffic stop. So it would raise an alarm.

The dog was rewarded before the car was actually searched! It found an easy signal giving it a reward.

Summary 🏁

It is fascinating how many problems there are with the sniffer dogs that are well known to machine learning engineers (and of course mathematicians). Some of them are even common sense...

Avoid these problems not only when training your model, but also in life 😃

If you liked this thread and want to read more about self-driving cars and machine learning, follow me @haltakov!

More from Vladimir Haltakov

Vladimir Haltakov
@haltakov

Machine Learning Paper Reviews 🔎📜

Check out this thread for short reviews of some interesting Machine Learning and Computer Vision papers. I explain the basic ideas and main takeaways of each paper in a Twitter thread.

👇 I'm adding new reviews all the time! 👇

AlexNet - the paper that started the deep learning revolution in Computer Vision!

It's finally time for some paper review! 📜🔍🧐

I promised the other day to start posting threads with summaries of papers that had a big impact on the field of ML and CV.

Here is the first one - the AlexNet paper! pic.twitter.com/QNLPIMZSIa
— Vladimir Haltakov (@haltakov) September 28, 2020

DenseNet - reducing the size and complexity of CNNs by adding dense connections between layers.

ML paper review time - DenseNet! 🕸️

This paper won the Best Paper Award at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR) - the best conference for computer vision problems.

It introduces a new CNN architecture where the layers are densely connected. pic.twitter.com/DuHytaoXia
— Vladimir Haltakov (@haltakov) October 15, 2020

Playing for data - generating synthetic GT from a video game (GTA V) and using it to improve semantic segmentation models.

Time for another ML paper review - generating synthetic ground truth data from video games! 🎮

I love this paper, because it pushes the boundaries of creating realistic synthetic ground truth data and shows that you can use it for training and improve your model.

Details 👇 pic.twitter.com/fBgORYG8Lz
— Vladimir Haltakov (@haltakov) October 5, 2020

Transformers for image recognition - a new paper with the potential to replace convolutions with a transformer.

Another paper review, but a little different this time... 🤷‍♂️

The paper is not published yet, but is submitted for review at ICLR 2021. It is getting a lot of attention from the CV/ML community, though, and many speculate that it is the end of CNNs... 👇 https://t.co/bh6wUxYfxu pic.twitter.com/dZGBYB8A5U
— Vladimir Haltakov (@haltakov) October 5, 2020

Vladimir Haltakov
@haltakov

AI Job Interviews - another good example of bias in ML 🤦‍♂️

Two journalists tested some AI tools for assessing job candidates. Even when the candidate read a Wiki article in German instead of answering questions in English, the AI systems gave them good scores 🤷‍♂️

Let's unpack 👇

The Setup 🔬

The journalists created a fake job posting on two AI interview platforms. They specified the traits of the ideal candidate and provided the questions that need to be answered during the interview.

Then they started experimenting... 👇

The Positive Test ✅

One of them did a fake interview giving all the right answers and predictably got very high scores - 8.5 out of 9 👍

Then she tried something different... 👇

The Negative Test ❌

In a second interview, instead of answering the questions in English, she just read the article on psychometrics from the German Wikipedia 😁

One system gave her a score of 6 out of 9, while the other determined she is a 73% match for the job.

Oops... 👇

What happened? 🔍

Interestingly, one of the systems generated a transcript which was obviously meaningless.

This means that the machine learning model behind the tool likely captured nuances of the intonation of the speaker instead of the meaning of the actual words.

👇


Vladimir Haltakov
@haltakov

Let's talk about a common problem in ML - imbalanced data ⚖️

Imagine we want to detect all pixels belonging to a traffic light from a self-driving car's camera. We train a model with 99.88% performance. Pretty cool, right?

Actually, this model is useless ❌

Let me explain 👇

The problem is that the data is severely imbalanced - the ratio between background pixels and traffic light pixels is 800:1.

If we don't take any measures, our model will learn to classify each pixel as background giving us 99.88% accuracy. But it's useless!
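The arithmetic behind that number, as a minimal sketch assuming 800 background pixels for every traffic light pixel:

```python
background_pixels = 800
traffic_light_pixels = 1
total_pixels = background_pixels + traffic_light_pixels

# An "always background" model is right on every background pixel
# and wrong on every traffic light pixel.
correct = background_pixels
detected_traffic_light_pixels = 0

accuracy = correct / total_pixels
recall = detected_traffic_light_pixels / traffic_light_pixels

print(f"accuracy: {accuracy:.2%}")  # accuracy: 99.88%
print(f"recall:   {recall:.2%}")    # recall:   0.00%
```

The accuracy looks great, but the recall for the class we actually care about is zero.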

What can we do? 👇

Let me tell you about 4 ways of dealing with imbalanced data:

▪️ Choose the right evaluation metric
▪️ Undersampling your dataset
▪️ Oversampling your dataset
▪️ Adapting the loss

Let's dive in 👇

1️⃣ Evaluation metrics

Looking at the overall accuracy is a very bad idea when dealing with imbalanced data. There are other measures that are much better suited:
▪️ Precision
▪️ Recall
▪️ F1 score

I wrote a whole thread on

How to evaluate your ML model? 📏

Your accuracy is 97%, so this is pretty good, right? Right? No! ❌

Just looking at the model accuracy is not enough. Let me tell you about some other metrics:
▪️ Recall
▪️ Precision
▪️ F1 score
▪️ Confusion matrix

Let's start 👇
— Vladimir Haltakov (@haltakov) August 31, 2021

2️⃣ Undersampling

The idea is to throw away samples of the overrepresented classes.

One way to do this is to randomly throw away samples. However, ideally, we want to make sure we are only throwing away samples that look similar.
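The random variant could look roughly like the sketch below. The helper name `random_undersample` and the toy data are assumptions made for illustration; the function shrinks every class to the size of the smallest one.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class is as small as the rarest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Keep the same number of randomly chosen samples from each class.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return balanced


# Toy data: 10 "light" samples and 990 "background" samples.
samples = list(range(1000))
labels = ["light" if i < 10 else "background" for i in samples]

balanced = random_undersample(samples, labels)
print(len(balanced))  # 20
```

This picks the samples to discard blindly and does nothing to make sure they look similar to the ones that are kept.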

Here is a strategy to achieve that 👇

Vladimir Haltakov
@haltakov

Dataset bias strikes again! 😬

This is a great example of dataset bias! This ML method classifies skin lesions as being malignant or not.

Can you see the problem from the examples? 🤔

Let me briefly explain 👇

The dataset used to train the model contains images of both malignant and benign lesions. The problem is that the malignant images often contain rulers printed on the skin. They are used by doctors to assess the lesion over time.

Guess what the model learned... 👇

The ML model learned to detect the rulers... 🤷‍♂️

"In our dataset, images with rulers were more likely to be malignant; thus the algorithm inadvertently “learned” that rulers are


haltakov.eth 🌍 🇺🇦...
@haltakov

Machine Learning in the Real World 🧠 🤖

ML for real-world applications is much more than designing fancy networks and fine-tuning parameters.

In fact, you will spend most of your time curating a good dataset.

Let's go through the process together 👇

#RepostFriday

Collect Data 💽

We need to represent the real world as accurately as possible. If some situations are underrepresented we are introducing Sampling Bias.

Sampling Bias is nasty because we'll have high test accuracy, but our model will perform badly when deployed.

👇

Traffic Lights 🚦

Let's build a model to recognize traffic lights for a self-driving car. We need to collect data for different:

▪️ Lighting conditions
▪️ Weather conditions
▪️ Distances and viewpoints
▪️ Strange variants

And if we sample only 🚦 we won't detect 🚥 🤷‍♂️

👇

Data Cleaning 🧹

Now we need to clean all corrupted and irrelevant samples. We need to remove:

▪️ Overexposed or underexposed images
▪️ Images in irrelevant situations
▪️ Faulty images

Leaving them in the dataset will hurt our model's performance!

👇

Preprocess Data ⚙️

Most ML models like their data nicely normalized and properly scaled. Bad normalization can also lead to worse performance (I have a nice story for another time...)

▪️ Crop and resize all images
▪️ Normalize all values (usually 0 mean and 1 std. dev.)
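As a minimal sketch of the normalization step, using only the Python standard library; the `standardize` helper and the pixel values are assumptions for illustration. In a real pipeline the mean and standard deviation would typically be computed once on the training set and reused for all other data.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Shift and scale values to zero mean and unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    if std == 0:
        # Constant input: there is no spread to scale, so only center it.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Toy example: a handful of 8-bit pixel intensities.
pixels = [0, 64, 128, 192, 255]
normalized = standardize(pixels)

print(round(fmean(normalized), 6), round(pstdev(normalized), 6))
```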

👇

More from All

The Chartians
@chartians

Took me 5 years to get the best Chartink scanners for Stock Market, but you’ll get it in 5 minutes here ⏰

Do Share the above tweet 👆

These are going to be very simple yet effective pure price action based scanners, no fancy indicators nothing - hope you liked it.

https://t.co/JU0MJIbpRV

52 Week High
One of the classic scanners, where you will get strong stocks to bet on.

https://t.co/V69th0jwBr

Hourly Breakout
This scanner will give you short-term breakout bets, such as hourly or 2-hour breakouts.

Volume shocker
Volume spurt in a stock with massive X times


श्रीमाली हितेश अवस्थी...
@HiteshAwasthi89

#Thread
12 Most Powerful Hanuman Mantras

Hanuman ji is the Hindu God of strength, valor, agility, and wisdom. He is considered as the incarnation (avatar) of bhagwan Shiva. Hanuman ji is the symbol of devotion and dedication. He is the greatest devotee of Prabhu Shri Ram. 1/n

Hanuman ji is known by several names such as Bajrangabali, Mahabali, Maruti Nandan, Pawanputra, Mahaveer, and Anjaneya. Hanuman ji is the son of Bhagwan Vayu, the wind god.

God Hanuman played a prominent part in rescuing Mata Sita from the bondage of the demon king of Lanka, Ravana.

One can worship Hanuman ji by chanting Hanuman Mantras. There are many different types of Hanuman Mantras, each fulfilling a specific purpose. Hanuman Mantras help to beget the blessings of Mahabali Hanuman for success. 3/n

Let’s explore these powerful Hanuman Mantras one by one. Here’s the list for you:

1. Hanuman Moola Mantra
!! Om Hanumate Namah !!
ऊँ हनुमते नम:

You can chant the “Hanuman Moola Mantra” if you usually face problems and obstacles in your life. 4/n
#hanumanjanmotsav

However, the Hanuman Moola Mantra is also a very powerful success mantra.

It’s a Kaarya Siddhi Mantra.

It is highly advised that all those who usually face obstacles and problems in their life should chant the Hanuman Moola Mantra. 5/n
#hanumanjanmotsav


libertyandjusticeforal...
@Mark923to25

1. #Qanon told us that Saudi Arabia is a central piece of the puzzle in taking out the Deep State...we are watching the Deep State attempt to remove Crown Prince MBS and disrupt Pres Trump's administration's ties with MBS and the "new" Saudi/US coalition. Early in his admin,

2....@Potus sent Jared Kushner (JK) to SA on an undisclosed trip and told King Salman that business as usual was over, and if he wanted Pres Trump's support, Prince Alwaleed would have to be divested of $ and power; Alwaleed is one of top Deep State $ men and was tied to HRC.

3. @Potus knew that before he could begin cleansing the US and the rest of the world of the DS, he would have to cut one of the financial heads off the snake (Alwaleed). After JK met w/ King Salman, Alwaleed was arrested and divested of billions; MBS became leader of SA.

4. SA loves Pres Trump because he's a man of his word; SA could not trust Hussein or Clintons because they use SA for oil and leverage; SA has many factions, but essentially you have the new and upcoming SA that is attempting to modernize through Crown Prince MBS, and then you

5...have the old guard represented by Alwaleed who was one of the Deep State financial heads and loyal to the DS more than SA; SA knew Hussein was financing Iran to hold western world hostage, and SA was caught in middle of Deep State blackmail; this is why @Potus was celebrated


You May Also Like

Will Buxton
@wbuxtonofficial

The Haas / Force India argument is fascinating, and the decision of the stewards may still have consequences. The first key finding is that the stewards ruled Racing Point Force India to be a valid constructor. That means that it builds its own cars.

Haas had argued that the cars had been designed by Force India, but as Haas knows only too well, "outsourcing" is legal. The stewards decided that "there is no regulatory support for the argument that Outsourcing of Listed Parts cannot come from a former or excluded team."

In determining this, the stewards also came to the following conclusion: "In relation to the submission by the Racing Point Force India F1 Team that it is not a new team, the Stewards decide that the Racing Point Force India F1 Team is indeed a new team."

This will have a big knock-on effect for Haas in its continued arguments over the rights of RPFI as relates to its eligibility and rights under the complex payments structure. Haas has argued all along that RPFI should be treated as a new team and tread the same path that it did.

TLDR? Haas lost the fight but may have won the war. In having its argument that Racing Point hadn't designed their own car thrown out, they gained clarification that RPFI was a new team.

Anu Satheesh
@AnuSatheesh5

Dhanwanthari Temple, Nelluvai
Thrissur District, Kerala.🚩🚩

ॐ नमो भगवते वासुदेवाय धन्वन्तरये अमृतकलश हस्ताय
सर्वामय विनाशनाय त्रैलोक्यनाथाय श्री महाविष्णवे नमः
#temples

Dhanwanthary Moorthy temple in Nelluvai is one among the ancient temples in Kerala.

Dhanwanthari is the Deva of Ayurveda. It is said that the murthy at the temple was installed by Ashwani Devas and it was the same idol worshipped by Sri. Vasudevar in the Dwapara Yuga. The Dhanvanthari Moorthi is facing towards the west
@SriramKannan77

'Smrata-Matrarti-Nasanah', Srimad Bhagavatam states that the one who remembers the name of Dhanwantari will be cured and protected from all the diseases. Temple is known for its Oushada Prasada with Ayurvedic medicinal values, called Mukkudi.

Mukkudi Nivedyam is a very special offering to Dhanwantari. It is prepared by using dry ginger, pepper, pomegranate peel, cumin seed, turmeric, rock salt and other medicinal items. It is offered during 'Usha- Pooja', and is later on distributed to the devotees present there.


Chris Herd
@chris_herd

I've spoken to 1,500+ people about remote work in the last 9 months

A few predictions of what is likely to emerge before 2030

[ a thread ] 💻🏠🌍

🚜 Rural Living: World-class people will move to smaller cities, have a lower cost of living & higher quality of life

These regions must innovate quickly to attract that wealth. Better schools, faster internet connections are a must

⏰Asynchronous Work: Offices are instantaneous gratification distraction factories where synchronous work makes it impossible to get stuff done

Tools that enable asynchronous work are the most important thing globally remote teams need. A lot of startups will try to tackle this

⚽️Hobby Renaissance: Remote working will lead to a rise in people participating in hobbies and activities which link them to people in their local community

This will lead to deeper, more meaningful relationships which overcome societal issues of loneliness and isolation

🌍Diversity & Inclusion: The most diverse and inclusive teams in history will emerge rapidly

Companies who embrace it have a first-mover advantage to attract great talent globally. Companies who don't will lose their best people to their biggest competitors

The '60s at 60
@the_60s_at_60

"The MAD Primer of Bigots, Extremists and Other Loose Ends," from September 1969.

Seen a couple of these panels make the rounds from time to time, so here's the complete set of 10 (something to amuse and/or offend almost everyone).

Chapter 1: The Super Patriot

Chapter 2: The Ku Klux Klansman

Chapter 3: The American Student

Chapter 4: The Right-Wing Extremist

Mo Rajabifard
@morajabi

👨‍💻 Last resume I sent to a startup one year ago, sharing with you to get ideas:

- Forget what you don't have, make your strength bold
- Pick one work experience and explain what you did in detail w/ bullet points
- Write it towards the role you apply for
- Give social proof

/thread

"But I got no work experience..."

Make an open source lib, make a small side project for yourself, do freelance work, ask friends to work with them, no friends? Find friends on Github, and Twitter.

Bonus points:

- Show you care about the company: I used the company's brand font and gradient in the resume for my name and "Thank You" note.
- Don't list 15 things and libraries you worked with, pick the ones most related to the role you're applying for.
-🙅‍♂️"copy cover letter"

"I got no friends, no work"

One practical way is to reach out to conferences and offer to make their website for free. But make sure to do it well. You'll get:

- a project for portfolio
- new friends
- work experience
- learnt new stuff
- new thing for Twitter bio

If you don't even have the skills yet, why not try your luck with @LambdaSchool? No? @freeCodeCamp. Still not? Pick something from here and learn https://t.co/7NPS1zbLTi
You'll feel very overwhelmed, no escape, just acknowledge it and keep pushing.