Buzz Chronicles
Follow
  • Home
  • Threads
    • Daily Charts
    • Most Popular
    • Most Recent
  • Authors
  • Categories
    • Life
    • Tech
    • Culture
    • Politics
    • Society
    • Fun
    • See All Categories
  • About

Authors Balaraman Ravindran

7 days 30 days All time Recent Popular
Balaraman Ravindran
Balaraman Ravindran
@ravi_iitm
AI4Bharat/@iitmadras and EkStep Foundation announced the release of Samanantar, the largest publicly available collection of parallel corpora for Indic languages. This work was supported by Tarento Technologies and the @rbc_dsai_iitm Download:

A few Highlights: 1. A total of 46.9M parallel sentences between English and 11 Indic languages
2. Of these, 34.6M parallel sentences were newly mined as a part of this work (~33M from IndicCorp that we released last year)

3. A total of 82.7M parallel sentences between 11C2 Indic language pairs extracted from the above English-centric parallel corpus
4. A new single script joint model for En-Indic and Indic-En translation which outperforms all commercial and publicly available systems.

5. A cross lingual semantic similarity benchmark containing ~30K annotations for ~9.5K sentence pairs from Samanantar certifying the quality of the mined data.

This is a tremendous effort that will have a great impact on Indic NLP research! @OfficialIndiaAI

Work spearheaded by my wonderful colleagues Mitesh Khapra and Pratyush Kumar!
ALL
  • Page 1 of 1
How does it work?
  • 💬 Reply to a thread with "@buzz_chronicles save" or "@buzz_chronicles save as category"
    🤖 Our bot will send you a link to your own folder on Buzz Chronicles. The thread will be saved in a form of an easy-to-read article
    📁 All your saved threads will be available at buzzchronicles.com/your_twitter_handle
Buzz Chronicles
  • Explore
  • Threads
  • Daily Charts
  • Authors
  • Categories
  • About
  • Terms of Service

Copyright © 2021 Buzz Chronicles - All right reserved