miscellaneous | Manan Dey

updates

May 4, 2023	Our paper “🎅SantaCoder: don’t reach for the stars!” won the Best Paper Award at the Deep Learning for Code workshop (DL4C), ICLR 2023
Nov 30, 2022	Our paper “BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset” got accepted at the NeurIPS Datasets and Benchmarks Track, 2022, and the pre-print “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” was also released.
Oct 6, 2022	Our paper “How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts” got accepted at the Findings of EMNLP, 2022
Aug 24, 2022	Our task “Indic Cause and Effect”, which measures a model’s ability to perform causal reasoning in three Indic languages (Bengali, Hindi and Malayalam), got accepted at Google BIG-bench 🪑
May 27, 2022	Our paper “You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings” got accepted at the Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models.
Jan 29, 2022	Our paper “Multitask Prompted Training Enables Zero-Shot Task Generalization” got accepted at ICLR 2022 as a Spotlight.
Jan 24, 2022	Our paper “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts” got accepted at the ACL 2022 Demo Track.
Dec 20, 2021	Our pre-print “Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP” was released as part of the BigScience workshop 🌸 Tokenization working group.
Jul 30, 2021	Our team (Sentence-Transformers) was one of the special nominees at the Flax Community Event organized by Hugging Face 🤗 and Google.
Mar 22, 2021	Participated in the HuggingFace 🤗 XLSR Fine-Tuning Sprint and submitted models fine-tuned on the Common Voice dataset, achieving the best WER score on the test sets of multiple languages (Tamil, Irish and Punjabi).
Dec 11, 2020	Our paper “Evaluating Gender Bias in Natural Language Inference” got accepted at the Workshop on Dataset Curation and Security - NeurIPS 2020
Nov 30, 2020	Participated in the HuggingFace 🤗 Dataset sprint as one of the core contributors. Added multiple datasets to the Datasets Library (XQuAD-R, msr_genomics_kbcomp, hippocorpus).
Dec 14, 2019	Received the XPRIZE AI for Good Travel Grant as well as a Travel Grant from NeurIPS for presenting our paper at the AI for Social Good workshop at the conference in 2019.
Dec 14, 2019	Our paper “Assessing Viewer’s Mental Health by Detecting Depression in YouTube Videos” got accepted at the AI for Social Good Workshop at NeurIPS 2019.
Oct 5, 2019	Our paper got selected at the 3rd International Workshop on Mining Actionable Insights from Social Networks, 2019, and our abstract got selected for poster presentation at the Montreal AI Symposium 2019.

activities

volunteer	TMLR, ICLR (2020, 2021, 2022), ICML (2020, 2021), NeurIPS (2020), EMNLP {Sponsor Booth set-up for ByteDance} (2020), ACL (2021), NAACL (2021)
reviewer	Deep Learning for Code workshop, ICLR 2023; Workshop on Dataset Curation and Security, NeurIPS 2020; ML4H: Machine Learning for Health, 2021, 2022 (Member of Program Committee); SyntheticData4ML Workshop, NeurIPS 2022; JupyterCon 2020; Live Music for the NeurIPS 2019 Banquet; hello:world Hackathon, 2020, 2021 [calhacks.io] (Judge); Montreal AI Symposium, 2022