Mingda Chen
Hello! I am a research scientist at FAIR NYC. I'm interested in natural language processing and, more recently, multimodal understanding and generation.
Email: chenmda gmail com
Curriculum Vitae (outdated)
Papers
For a complete list of papers, see my Google Scholar profile.
2025
😈ImpRAG: Retrieval-Augmented Generation with Implicit Queries
Wenzheng Zhang, Victoria Lin, Karl Stratos, Scott Wen-tau Yih, Mingda Chen
arXiv Preprint, 2025
arXiv
Improving Factuality with Explicit Working Memory
Mingda Chen, Yang Li, Karthik Padthe, Rulin Shao, Alicia Sun, Jacob Kahn, Luke Zettlemoyer, Gargi Ghosh, Scott Wen-tau Yih
Proceedings of ACL, 2025
arXiv
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Lee et al.
IEEE Micro, 2025
arXiv
2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
FAIR Chameleon Team
arXiv Preprint, 2024
arXiv
Few-Shot Data Synthesis for Open-Domain Multi-Hop Question Answering
Mingda Chen, Xilun Chen, Scott Wen-tau Yih
Proceedings of EACL, 2024 (oral)
arXiv / BibTex
2023
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Victoria Lin*, Xilun Chen*, Mingda Chen*, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Wen-tau Yih
Proceedings of ICLR, 2024
arXiv / BibTex
Findings of the IWSLT 2023 Evaluation Campaign
Agarwal et al.
Proceedings of IWSLT, 2023
PDF / BibTex
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà
Proceedings of ACL, 2023
arXiv / BibTex
xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages
Mingda Chen*, Kevin Heffernan*, Onur Çelebi, Alexandre Mourachko, Holger Schwenk
Proceedings of ACL, 2023 (oral)
arXiv / BibTex
2022
Leveraging Natural Supervision for Language Representation Learning and Generation
Mingda Chen
PhD Thesis, 2022
arXiv / BibTex
Improving In-Context Few-Shot Learning via Self-Supervised Training
Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov, Zornitsa Kozareva
Proceedings of NAACL, 2022
arXiv / Poster / Slides / BibTex
SummScreen: A Dataset for Abstractive Screenplay Summarization
Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel
Proceedings of ACL, 2022 (oral)
arXiv / Poster / Slides / Data / BibTex
2021
TVRecap: A Dataset for Generating Stories with Character Descriptions
Mingda Chen, Kevin Gimpel
arXiv Preprint, 2021
arXiv / Data / BibTex
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
Mingda Chen, Sam Wiseman, Kevin Gimpel
Findings of ACL, 2021
arXiv / Code / BibTex
2020
Exemplar-Controllable Paraphrasing and Translation using Bitext
Mingda Chen, Sam Wiseman, Kevin Gimpel
arXiv Preprint, 2020
arXiv / Data / BibTex
Mining Knowledge for Natural Language Inference from Wikipedia Categories
Mingda Chen*, Zewei Chu*, Karl Stratos, Kevin Gimpel
Findings of EMNLP, 2020
arXiv / Code / BibTex
Learning Probabilistic Sentence Representations from Paraphrases
Mingda Chen, Kevin Gimpel
Proceedings of RepL4NLP at ACL, 2020
arXiv / Slides / BibTex
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Proceedings of ICLR, 2020
arXiv / Code / BibTex
How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions
Zewei Chu, Mingda Chen*, Jing Chen*, Miaosen Wang*, Kevin Gimpel, Manaal Faruqui, Xiance Si
Proceedings of AAAI, 2020 (oral)
arXiv / Data / BibTex
2019
EntEval: A Holistic Evaluation Benchmark for Entity Representations
Mingda Chen*, Zewei Chu*, Yang Chen, Karl Stratos, Kevin Gimpel
Proceedings of EMNLP, 2019
arXiv / Poster / Code / BibTex
Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations
Mingda Chen*, Zewei Chu*, Kevin Gimpel
Proceedings of EMNLP, 2019 (oral)
arXiv / Slides / Code / BibTex
Controllable Paraphrase Generation with a Syntactic Exemplar
Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel
Proceedings of ACL, 2019
arXiv / Poster / Code / Train and Eval Data / BibTex
A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations
Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel
Proceedings of NAACL-HLT, 2019
arXiv / Poster / 1-Minute Slides / Code / Train Data / Eval Data / BibTex
2018
Variational Sequential Labelers for Semi-Supervised Learning
Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel
Proceedings of EMNLP, 2018 (oral)
PDF / Appendix / Slides / Code / BibTex
Smaller Text Classifiers with Discriminative Cluster Embeddings
Mingda Chen, Kevin Gimpel
Proceedings of NAACL-HLT, 2018
PDF / Poster / Code / BibTex