1/2
Ram— 3 times
Ayodhya— 4 times
in a 140-character tweet by the Indian President. This must have blinded and deafened taaliberals who read it. 🤪
Let this chant deafen them tonight—
||जय श्री राम|| 🚩🚩🚩
Let me explain how the name #Ram is about POWER. 🙏 @LostTemple7 https://t.co/7KmktsZ7bj
Without Ram, Ayodhya is not Ayodhya at all. Ayodhya is wherever Ram is. Lord Ram resides in this city forever. That is why this place is Ayodhya in the true sense.
— President of India (@rashtrapatibhvn) August 29, 2021
Introducing Voltron: Language-Driven Representation Learning for Robotics!
Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z
🧵👇(1 / 12)
Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.
Yet the rich natural language annotations accompanying each video are an underused part of these datasets. (2/12)
The Voltron framework offers a simple way to use language supervision to shape representation learning, building off of prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).
The secret is *balance* (3/12)
Starting with a masked autoencoder over frames from these video clips, make a choice:
1) Condition on language and improve our ability to reconstruct the scene.
2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
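The choice above can be sketched as a per-step switch between the two objectives. This is only an illustrative sketch, not the Voltron implementation: the names `pick_objective`, `training_loss`, and `alpha` are hypothetical, the loss values are placeholder numbers, and the way the two losses are combined is an assumption that the thread does not spell out.

```python
import random


def pick_objective(alpha: float, rng: random.Random) -> str:
    """Pick 'condition' with probability alpha, otherwise 'generate'.

    alpha is a hypothetical knob: 1.0 means always condition on language,
    0.0 means always generate language.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return "condition" if rng.random() < alpha else "generate"


def training_loss(objective: str, recon_loss: float, caption_loss: float) -> float:
    """Combine placeholder loss values for the chosen objective.

    Assumption: when conditioning, language is an input and only the
    reconstruction is scored; when generating, language is a target and
    its loss is added to the reconstruction loss.
    """
    if objective == "condition":
        return recon_loss
    if objective == "generate":
        return recon_loss + caption_loss
    raise ValueError(f"unknown objective: {objective!r}")


# Example: an even split between the two objectives over a few steps.
rng = random.Random(0)
for step in range(4):
    objective = pick_objective(alpha=0.5, rng=rng)
    loss = training_loss(objective, recon_loss=0.5, caption_loss=0.25)
    print(step, objective, loss)
```

In this sketch, moving `alpha` toward 1.0 spends more steps on language-conditioned reconstruction, and moving it toward 0.0 spends more on language generation.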
By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.
Why is the ability to shape this balance important? (5/12)