publications

publications by category in reverse chronological order.

2023

  1. arXiv
    StarCoder: may the source be with you!
    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, and 52 more authors
    arXiv:2305.06161, 2023
  2. ICLR DL4C
    🎅SantaCoder: Don’t reach for the stars!🌟
    Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, and 24 more authors
    Deep Learning for Code workshop, ICLR 2023

2022

  1. arXiv
    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, and 375 more authors
    arXiv:2211.05100, 2022
  2. NeurIPS Datasets
    The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
    Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, and 39 more authors
    Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022
  3. EMNLP
    How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts
    Shanya Sharma, Manan Dey, and Koustuv Sinha
    Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
  4. ACL Workshop
    You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings
    Zeerak Talat, Aurélie Névéol, Stella Biderman, Miruna Clinciu, Manan Dey, Shayne Longpre, Sasha Luccioni, Maraim Masoud, Margaret Mitchell, Dragomir Radev, Shanya Sharma, Arjun Subramonian, Jaesung Tae, Samson Tan, Deepak Tunuguntla, and 1 more author
    Challenges & Perspectives in Creating Large Language Models workshop at ACL, 2022
  5. ICLR
    Multitask Prompted Training Enables Zero-Shot Task Generalization
    Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, and 25 more authors
    International Conference on Learning Representations (ICLR) (Spotlight), 2022
  6. ACL Demo Track
    PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
    Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, and 12 more authors
    60th Annual Meeting of the Association for Computational Linguistics (ACL), Demo Track, 2022

2021

  1. arXiv
    Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
    Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, and Samson Tan
    arXiv:2112.10508, 2021

2020

  1. NeurIPS DCS
    Evaluating Gender Bias in Natural Language Inference
    Shanya Sharma, Manan Dey, and Koustuv Sinha
    Workshop on Dataset Curation and Security at NeurIPS, 2020

2019

  1. NeurIPS AI4SG
    Assessing Viewer’s Mental Health by Detecting Depression in YouTube Videos
    Shanya Sharma and Manan Dey
    AI for Social Good workshop at NeurIPS, 2019