Mingda Chen
Hello! I am a research scientist at FAIR NYC. I'm interested in natural language processing and, more recently, multimodal understanding and generation.
Email: chenmda gmail com 
Curriculum Vitae (outdated)
Papers
For a complete list of papers, see my Google Scholar profile.
2025
- 😈ImpRAG: Retrieval-Augmented Generation with Implicit Queries 
 Wenzheng Zhang, Victoria Lin, Karl Stratos, Scott Wen-tau Yih, Mingda Chen
 arXiv Preprint, 2025
 arXiv
- Improving Factuality with Explicit Working Memory 
 Mingda Chen, Yang Li, Karthik Padthe, Rulin Shao, Alicia Sun, Jacob Kahn, Luke Zettlemoyer, Gargi Ghosh, Scott Wen-tau Yih
 Proceedings of ACL, 2025
 arXiv
- Characterizing and Efficiently Accelerating Multimodal Generation Model Inference 
 Lee et al.
 IEEE Micro, 2025
 arXiv
2024
- Chameleon: Mixed-Modal Early-Fusion Foundation Models 
 FAIR Chameleon Team
 arXiv Preprint, 2024
 arXiv
- Few-Shot Data Synthesis for Open-Domain Multi-Hop Question Answering 
 Mingda Chen, Xilun Chen, Scott Wen-tau Yih
 Proceedings of EACL, 2024 (oral)
 arXiv / BibTex
2023
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning 
 Victoria Lin*, Xilun Chen*, Mingda Chen*, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Wen-tau Yih
 Proceedings of ICLR, 2023
 arXiv / BibTex
- Findings of the IWSLT 2023 Evaluation Campaign 
 Agarwal et al.
 Proceedings of IWSLT, 2023
 PDF / BibTex
- BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric 
 Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà
 Proceedings of ACL, 2023
 arXiv / BibTex
- xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages 
 Mingda Chen*, Kevin Heffernan*, Onur Çelebi, Alexandre Mourachko, Holger Schwenk
 Proceedings of ACL, 2023 (oral)
 arXiv / BibTex
2022
- Leveraging Natural Supervision for Language Representation Learning and Generation 
 Mingda Chen
 PhD Thesis, 2022
 arXiv / BibTex
- Improving In-Context Few-Shot Learning via Self-Supervised Training 
 Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov, Zornitsa Kozareva
 Proceedings of NAACL, 2022
 arXiv / Poster / Slides / BibTex
- SummScreen: A Dataset for Abstractive Screenplay Summarization 
 Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel
 Proceedings of ACL, 2022 (oral)
 arXiv / Poster / Slides / Data / BibTex
2021
- TVRecap: A Dataset for Generating Stories with Character Descriptions 
 Mingda Chen, Kevin Gimpel
 arXiv Preprint, 2021
 arXiv / Data / BibTex
- WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections 
 Mingda Chen, Sam Wiseman, Kevin Gimpel
 Findings of ACL, 2021
 arXiv / Code / BibTex
2020
- Exemplar-Controllable Paraphrasing and Translation using Bitext 
 Mingda Chen, Sam Wiseman, Kevin Gimpel
 arXiv Preprint, 2020
 arXiv / Data / BibTex
- Mining Knowledge for Natural Language Inference from Wikipedia Categories 
 Mingda Chen*, Zewei Chu*, Karl Stratos, Kevin Gimpel
 Findings of EMNLP, 2020
 arXiv / Code / BibTex
- Learning Probabilistic Sentence Representations from Paraphrases 
 Mingda Chen, Kevin Gimpel
 Proceedings of RepL4NLP at ACL, 2020
 arXiv / Slides / BibTex
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations 
 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
 Proceedings of ICLR, 2020
 arXiv / Code / BibTex
- How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions 
 Zewei Chu, Mingda Chen*, Jing Chen*, Miaosen Wang*, Kevin Gimpel, Manaal Faruqui, Xiance Si
 Proceedings of AAAI, 2020 (oral)
 arXiv / Data / BibTex
2019
- EntEval: A Holistic Evaluation Benchmark for Entity Representations 
 Mingda Chen*, Zewei Chu*, Yang Chen, Karl Stratos, Kevin Gimpel
 Proceedings of EMNLP, 2019
 arXiv / Poster / Code / BibTex
- Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations 
 Mingda Chen*, Zewei Chu*, Kevin Gimpel
 Proceedings of EMNLP, 2019 (oral)
 arXiv / Slides / Code / BibTex
- Controllable Paraphrase Generation with a Syntactic Exemplar 
 Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel
 Proceedings of ACL, 2019
 arXiv / Poster / Code / Train and Eval Data / BibTex
- A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations 
 Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel
 Proceedings of NAACL-HLT, 2019
 arXiv / Poster / 1-Minute Slides / Code / Train Data / Eval Data / BibTex
2018
- Variational Sequential Labelers for Semi-Supervised Learning 
 Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel
 Proceedings of EMNLP, 2018 (oral)
 PDF / Appendix / Slides / Code / BibTex
- Smaller Text Classifiers with Discriminative Cluster Embeddings 
 Mingda Chen, Kevin Gimpel
 Proceedings of NAACL-HLT, 2018
 PDF / Poster / Code / BibTex