Authors Alisa Liu

We introduce a new paradigm for dataset creation based on human 🧑‍💻 and machine 🤖 collaboration, which brings together the generative strength of LMs and the evaluative strength of humans. And we collect 🎉 WaNLI, a dataset of 108K NLI examples! 🧵

Paper: https://t.co/IUXcm9wIh2


Our pipeline starts with an existing dataset (MNLI), and uses data maps 📜 to automatically identify pockets of examples that demonstrate challenging 🧐 reasoning patterns relative to a trained model. Then we use GPT-3 to generate new examples likely to have the same pattern. 2/
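To make the data-map step concrete, here is a minimal sketch under stated assumptions. Data maps summarize each training example by the mean (confidence) and standard deviation (variability) of the probability the model assigns to the gold label across training epochs. The function names, the `dynamics` input format, and the choice to select by highest variability are illustrative assumptions; the thread does not say which region of the map is used.

```python
from statistics import mean, pstdev


def data_map_stats(gold_probs_per_epoch):
    """Return (confidence, variability) for one training example.

    gold_probs_per_epoch: probabilities the model assigned to the gold
    label, one value per training epoch (hypothetical input format).
    """
    return mean(gold_probs_per_epoch), pstdev(gold_probs_per_epoch)


def most_ambiguous(dynamics, fraction=0.25):
    """Return ids of the examples with the highest variability.

    dynamics: dict mapping example id -> list of gold-label probabilities.
    fraction: share of examples to keep (an assumed, tunable cutoff).
    """
    scored = {ex_id: data_map_stats(probs) for ex_id, probs in dynamics.items()}
    # Rank by variability (second element of the stats tuple), highest first.
    ranked = sorted(scored, key=lambda ex_id: scored[ex_id][1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

For example, given `{"a": [0.9, 0.9, 0.9], "b": [0.1, 0.9, 0.5], "c": [0.2, 0.2, 0.3]}`, example `"b"` ranks first because the model's belief in its gold label swings the most between epochs. Examples selected this way could then serve as in-context seeds when prompting a generator.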


Next we propose a new metric, also inspired by data maps, to automatically filter generations for those most likely to aid model learning. Finally, we validate ✅ the generated examples through crowdworkers, who assign a gold label 🟡 and (optionally) revise for quality ✍️. 3/
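The thread does not define the filtering metric, so the sketch below is only one plausible reading, not the authors' implementation. It assumes that, because a fresh generation has no gold label yet, we can score it by the largest variability over all candidate labels across model checkpoints and keep the top-scoring generations. All names and the input format are hypothetical.

```python
from statistics import pstdev


def max_variability(probs_per_checkpoint):
    """Largest per-label standard deviation across checkpoints.

    probs_per_checkpoint: list of per-label probability lists, one list
    per saved checkpoint (hypothetical input format).
    """
    num_labels = len(probs_per_checkpoint[0])
    return max(
        pstdev([ckpt[label] for ckpt in probs_per_checkpoint])
        for label in range(num_labels)
    )


def filter_generations(predictions, keep):
    """Keep the `keep` generated examples with the highest score.

    predictions: dict mapping example id -> probs_per_checkpoint.
    """
    ranked = sorted(
        predictions,
        key=lambda ex_id: max_variability(predictions[ex_id]),
        reverse=True,
    )
    return ranked[:keep]
```

Taking the maximum over labels avoids committing to a label before crowdworkers have assigned one; a generation on which the checkpoints disagree for any label gets a high score.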

Remarkably, replacing MNLI with WaNLI (which is 4x smaller) for training improves performance 📈 on seven OOD test sets 🧪, including by 11% on HANS and 9% on ANLI. Under a data augmentation setting, combining MNLI with WaNLI is more effective than using other augmentation sets. 4/

Our method addresses limitations of crowdsourcing, where workers may resort to repetitive writing strategies 🤷, and leverages the great progress in text generation 📃. We get the best of both worlds: 🤖's ability to produce diverse examples, and 🧑‍💻's ability to evaluate them. 5/