Manan Dey

I am currently working as a Software Engineer at SAP Labs, Bangalore. Previously, I worked as a data science intern at Impact Analytics, Bangalore, and as a summer intern at the Indian Institute of Technology, Guwahati (IITG) under the guidance of Dr. Pradip K. Das.

My research interests lie at the intersection of Machine Learning, Deep Learning, and Natural Language Processing. A list of my publications can be found here.

I also love contributing to open-source projects and have been an active contributor to projects such as the BigScience workshop and the BigCode project. A list of my open-source contributions can be found here.

updates

May 4, 2023 Our paper “🎅SantaCoder: don’t reach for the stars!” won the Best Paper Award at the Deep Learning for Code workshop (DL4C), ICLR 2023
Nov 30, 2022 Our paper “BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset” was accepted to the NeurIPS 2022 Datasets and Benchmarks Track, and the pre-print “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” was also released.
Oct 6, 2022 Our paper “How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts” was accepted to the Findings of EMNLP 2022.
Aug 24, 2022 Our task “Indic Cause and Effect”, which measures a model’s ability to perform causal reasoning in three Indic languages (Bengali, Hindi, and Malayalam), was accepted to Google BIG-bench 🪑
May 27, 2022 Our paper “You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings” was accepted to the Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models.
Jan 29, 2022 Our paper “Multitask Prompted Training Enables Zero-Shot Task Generalization” was accepted to ICLR 2022 as a Spotlight.
Jan 24, 2022 Our paper “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts” was accepted to the ACL 2022 Demo Track.
Dec 20, 2021 Our pre-print “Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP” was released as part of the BigScience workshop 🌸 Tokenization working group.

selected publications

  1. ICLR
    Multitask Prompted Training Enables Zero-Shot Task Generalization
    Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, and 25 more authors
    International Conference on Learning Representations (ICLR) (Spotlight), 2022