Sweet Puttu and Madurai Chokkanathar

Puttu Chokkanathar Temple is a Shiva temple in Madurai, famous for its Puttu. There is a beautiful story about why Chokkanathar is called Puttu Chokkanathar.

Once, the Pandya King ordered that one person from every house in Madurai come to put sand on the banks of the river Vaigai to control a flood. An old lady who sold Puttu had no one she could send for the work. Shiva, disguised as a young man, approached the old lady with a deal: he would do the work for her, and in return she would give him Puttu every day.

The lady agreed and gave him Puttu. After eating it, instead of working, the young man went to sleep. The King, who was supervising the work, beat him with a stick.

But everyone, including the King, felt the pain on their own backs.


How can we use language supervision to learn better visual representations for robotics?

Introducing Voltron: Language-Driven Representation Learning for Robotics!

Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z

🧵👇 (1/12)


Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.

Yet an underused part of these datasets is the rich natural-language annotations accompanying each video. (2/12)

The Voltron framework offers a simple way to use language supervision to shape representation learning, building off of prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).

The secret is *balance* (3/12)

Starting with a masked autoencoder over frames from these video clips, make a choice:

1) Condition on language and improve our ability to reconstruct the scene.

2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
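The choice between the two objectives can be pictured as a per-example coin flip. Below is a minimal Python sketch, assuming a single mixing probability `alpha` (a hypothetical name chosen for illustration, not necessarily how Voltron exposes it) that sets how often the language-generation objective is picked over language-conditioned reconstruction. The losses are placeholder scalars standing in for real model outputs.

```python
import random
from collections import Counter


def sample_objective(alpha: float, rng: random.Random) -> str:
    """Pick one objective for a single training example.

    alpha is a hypothetical knob: the probability of choosing the
    language *generation* objective. With probability 1 - alpha the
    example uses language-*conditioned* reconstruction instead.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # rng.random() is in [0, 1), so alpha=0 never generates and alpha=1 always does.
    return "generate" if rng.random() < alpha else "condition"


def training_loss(recon_loss: float, lm_loss: float, objective: str) -> float:
    """Return the (placeholder) loss for the chosen objective."""
    if objective == "condition":
        return recon_loss  # reconstruct the masked frame, given language
    if objective == "generate":
        return lm_loss  # describe the scene from the visual representation
    raise ValueError(f"unknown objective: {objective}")


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(sample_objective(0.25, rng) for _ in range(10_000))
    print(counts)  # roughly a quarter of the draws should be "generate"
```

Sliding `alpha` toward 1 leans the training signal toward describing scenes; sliding it toward 0 leans it toward reconstructing them. This is one simple way to realize the kind of balance the thread describes, not a claim about Voltron's actual training loop.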

By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.

Why is the ability to shape this balance important? (5/12)
