So you think you know distillation; it's easy, right?

We thought so too with @XiaohuaZhai @__kolesnikov__ @_arohan_ and the amazing @royaleerieme and Larisa Markeeva.

Until we didn't. But now we do again. Hop on for a ride (+the best ever ResNet50?)

🧵👇 https://t.co/3SlkXVZcG3

This is not a fancy novel method. It's plain old distillation.

But we investigate it thoroughly, for model compression, via the lens of *function matching*.

We highlight two crucial principles that are often missed: Consistency and Patience. Only both jointly give good results!

0. Intuition: We want the student to replicate _the whole function_ represented by the teacher, everywhere that we expect data in input space.

This is a much stronger view than the commonly used "teacher generates better/more informative labels for the data". See pic above.
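
To make "match the whole function" concrete, here is a minimal sketch in plain Python. It assumes the mismatch between teacher and student is measured as the KL divergence between their softmax outputs, which is a common choice for distillation but is not stated in this thread. The names `log_softmax` and `distillation_kl` and the `temperature` parameter are illustrative, not taken from the authors' code.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]

def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one input (assumed loss, see lead-in)."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # Sum over classes of p_teacher * (log p_teacher - log p_student).
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))

# Identical outputs give zero loss; any disagreement gives a positive loss.
same = distillation_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_kl([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Under this view the target is the teacher's full output distribution on every input the student sees, not a hard label attached to the original image.
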
1. Consistency: to achieve this, teacher and student need to see the same view (crop) of the image. For example, this means no pre-computed teacher logits! We can generate many more views via mixup.

Other approaches may look good early, but eventually fall behind consistency.
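
A rough sketch of what "consistent" means in code, under loud simplifying assumptions: images are 1-D lists of floats, the crop is a random window, and the mixup weight is drawn uniformly. All function names here are hypothetical. The only point is that the augmented view is built once and the very same view goes to both teacher and student, so teacher outputs cannot be pre-computed on a different crop.

```python
import random

def random_crop(image, size, rng):
    """Take a random contiguous window from a 1-D toy 'image'."""
    start = rng.randrange(len(image) - size + 1)
    return image[start:start + size]

def mixup(view_a, view_b, lam):
    """Convex combination of two equally sized views."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(view_a, view_b)]

def consistent_view(image_a, image_b, size, rng):
    """Build ONE augmented input shared by teacher and student."""
    lam = rng.random()  # assumed uniform mixing weight
    return mixup(random_crop(image_a, size, rng),
                 random_crop(image_b, size, rng), lam)

def teacher_student_outputs(teacher, student, image_a, image_b, size, rng):
    """Run both models on the same view; never re-augment in between."""
    view = consistent_view(image_a, image_b, size, rng)
    return teacher(view), student(view)

# Toy usage: with identical "models" the outputs agree exactly,
# because both received the same crop and the same mixup weight.
rng = random.Random(0)
img_a = [float(i) for i in range(10)]
img_b = [float(-i) for i in range(10)]
t_out, s_out = teacher_student_outputs(sum, sum, img_a, img_b, 4, rng)
```

An inconsistent setup would call the augmentation twice, once per model, so the student would be asked to match teacher outputs for an input it never saw.
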
2. Patience: The function matching task is HARD! We need to train *a lot* longer than typical, and we were actually not able to reach saturation yet. Overfitting does not happen: when function matching, an "overfit" student is great! (Note: w/ pre-computed teacher, we overfit)

2b. Excessively long training may mean the optimizer is struggling. We try advanced optimization via Shampoo, and get 4x faster convergence.

We believe this setting is a great test-bed for optimizer research: No concern of overfitting, and reducing training error means generalizing better!

3. By distilling a couple of large BiT R152x2 models into a ResNet-50, we get a ResNet-50 on ImageNet that gets 82.8% at 224px resolution, and 80.5% at 160px! 😎

No "tricks", just plain distillation, patiently matching functions.

4. Importantly, this simple strategy works on many datasets of various sizes, down to only 1020 training images, where anything else we tried overfit horribly.

Be patient, be consistent, that's it. Eventually, you'll reach or outperform your teacher!

2c. We can't stress patience enough. Multiple strategies, for example initializing the student with a pre-trained model (shown here), look promising at first, but eventually plateau and are outperformed by patient, consistent function matching.

5. We have a lot more content. MobileNet students, distilling on "random other" data (shown below), very thorough baselines, a teacher ensemble, and... BiT download statistics!

PS: we are working on releasing a bunch of the models, including the best ones, ... but we're also on vacation. Watch https://t.co/Age8NXgS1D and stay tuned, we're aiming for next week!

More from All

https://t.co/6cRR2B3jBE
Viruses and other pathogens are often studied as stand-alone entities, even though, in nature, they mostly live in multispecies associations called biofilms, both externally and within the host.

https://t.co/FBfXhUrH5d


Microorganisms in biofilms are enclosed by an extracellular matrix that confers protection and improves survival. Previous studies have shown that viruses can secondarily colonize preexisting biofilms, and viral biofilms have also been described.


...we raise the perspective that CoVs can persistently infect bats due to their association with biofilm structures. This phenomenon potentially provides an optimal environment for nonpathogenic & well-adapted viruses to interact with the host, as well as for viral recombination.


Biofilms can also enhance virion viability in extracellular environments, such as on fomites and in aquatic sediments, allowing viral persistence and dissemination.
॥ॐ॥
अस्य श्री गायत्री ध्यान श्लोक:
(gAyatri dhyAna shlOka)
• This shloka, for meditating on the personified form of वेदमाता गायत्री, was given by Bhagwaan Brahma to Sage yAgnavalkya (याज्ञवल्क्य).

• 14th shloka of गायत्री कवचम् which is taken from वशिष्ठ संहिता, goes as follows..


• मुक्ता-विद्रुम-हेम-नील धवलच्छायैर्मुखस्त्रीक्षणै:।
muktA vidruma hEma nIla dhavalachhAyaiH mukhaistrlkShaNaiH.

• युक्तामिन्दुकला-निबद्धमुकुटां तत्वार्थवर्णात्मिकाम्॥
yuktAmindukalA nibaddha makutAm tatvArtha varNAtmikam.

• गायत्रीं वरदाभयाङ्कुश कशां शुभ्रं कपालं गदाम्।
gAyatrIm vardAbhayANkusha kashAm shubhram kapAlam gadAm.

• शंखं चक्रमथारविन्दयुगलं हस्तैर्वहन्ती भजै॥
shankham chakramathArvinda yugalam hastairvahantIm bhajE.

This shloka describes the form of वेदमाता गायत्री.

• It says, "She has five faces which shine with the colours of a Pearl 'मुक्ता', Coral 'विद्रुम', Gold 'हेम्', Sapphire 'नील्', & a Diamond 'धवलम्'.

• These five faces are symbolic of the five primordial elements called 'पञ्चमहाभूत:', which make up the entire existence.

• These are the elements of SPACE, FIRE, WIND, EARTH & WATER.

• All these five faces shine with three eyes 'त्रिक्षणै:'.

You May Also Like

And here they are...

THE WINNERS OF THE 24 HOUR STARTUP CHALLENGE

Remember, this money is just fun. If you launched a product (or even attempted a launch) - you did something worth MUCH more than $1,000.

#24hrstartup

The winners 👇

#10

Lattes For Change - Skip a latte and save a life.

https://t.co/M75RAirZzs

@frantzfries built a platform where you can see what skipping your morning latte could do for the world.

A great product for a great cause.

Congrats Chris on winning $250!


#9

Instaland - Create amazing landing pages for your followers.

https://t.co/5KkveJTAsy

A team project! @bpmct and @BaileyPumfleet built a tool for social media influencers to create simple "swipe up" landing pages for followers.

Really impressive for 24 hours. Congrats!


#8

SayHenlo - Chat without distractions

https://t.co/og0B7gmkW6

Built by @DaltonEdwards, it's a platform for combatting conversation overload. This product was also coded exclusively from an iPad 😲

Dalton is a beast. I'm so excited he placed in the top 10.


#7

CoderStory - Learn to code from developers across the globe!

https://t.co/86Ay6nF4AY

Built by @jesswallaceuk, the project is focused on highlighting the experience of developers and people learning to code.

I wish this existed when I learned to code! Congrats on $250!!