The bookmark has be...". Cheers! ✌ https://t.co/KCna1S3XKf
Hi! You can find it here. Original tweet by @AchuthArora: "@johnnyxbrown @buzz_chronicles @rattibha @SaveToBookmarks save as Workouts". Enjoy 👌
More from All
How can we use language supervision to learn better visual representations for robotics?
Introducing Voltron: Language-Driven Representation Learning for Robotics!
Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z
🧵👇(1 / 12)
Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.
Yet an underused part of these datasets is the rich natural language annotation accompanying each video. (2/12)
The Voltron framework offers a simple way to use language supervision to shape representation learning, building on prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).
The secret is *balance* (3/12)
Starting with a masked autoencoder over frames from these video clips, make a choice:
1) Condition on language and improve our ability to reconstruct the scene.
2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.
Why is the ability to shape this balance important? (5/12)
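One way to picture the trade-off between *conditioning* and *generation* is as a per-step coin flip between the two objectives. The sketch below is only an illustration under stated assumptions: the mixing weight `alpha`, the function name `sample_objectives`, and the idea of choosing one objective per training step are hypothetical and not taken from the thread or the Voltron code.

```python
import random


def sample_objectives(alpha, num_steps, seed=0):
    """Pick one objective per training step (hypothetical helper).

    alpha is an assumed mixing weight: the probability of choosing the
    language-conditioned reconstruction objective ("condition") rather
    than the language generation objective ("generate").
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rng = random.Random(seed)
    return [
        "condition" if rng.random() < alpha else "generate"
        for _ in range(num_steps)
    ]


# An even split between the two objectives (assumed value, for illustration).
schedule = sample_objectives(alpha=0.5, num_steps=1000)
fraction_conditioning = schedule.count("condition") / len(schedule)
```

Under this assumption, moving `alpha` toward 1 spends more steps on reconstructing the scene given language, while moving it toward 0 spends more steps on describing what is happening.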
You May Also Like
Facebook originally a CIA program called "LifeLog".
LifeLog, via DARPA, terminated on Feb 4th, 2004.
Facebook was launched on Feb 4th, 2004.
Many of the LifeLog team became execs at FB.
Zuckerberg is a figurehead.
CIA allowed Cambridge to help Trump win
https://t.co/enzOXDCogV
Pentagon Kills LifeLog
Project: Lifelog
— Robert Horan (@Robby12692) December 13, 2018
Started by DARPA in 1999, the goal of Lifelog was to create a database on civilians without their knowledge, and track everything they do.
The project "ended" on Feb 4th, 2004.
Facebook began the exact same day.
The CIA funneled tens of millions into Facebook. pic.twitter.com/r7hwF0v9kh