Daniel Campos

Publications

2026

Shivani Upadhyay, Daniel Campos, Nandan Thakur, Ronak Pradeep, Nick Craswell, Jimmy Lin - Automating Generation of Long-Form Queries - SIGIR 2026

2025

Sahaj Upadhyay, Nandan Thakur, Ronak Pradeep, Nick Craswell, Daniel Campos, Jimmy Lin - Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track - TREC 2025

Ronak Pradeep, Nandan Thakur, Sahaj Upadhyay, Daniel Campos, Nick Craswell, Ian Soboroff, Jimmy Lin - The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models - SIGIR 2025

Sahaj Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Jimmy Lin - A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA - SIGIR 2025

Nandan Thakur, Ronak Pradeep, Sahaj Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin - Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Nandan Thakur, Ronak Pradeep, Sahaj Upadhyay, Daniel Campos, Nick Craswell, Ian Soboroff, Jimmy Lin - Assessing Support for the TREC 2024 RAG Track - SIGIR 2025

Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, Yuxiong He - STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning - ACL 2025

Yunjae Lee, seung-won hwang, Daniel F Campos, Filip Gralinski, Zhewei Yao, Yuxiong He - Inference Scaling for Bridging Retrieval and Augmented Generation - NAACL 2025

Yunjae Lee, seung-won hwang, Daniel F Campos, Filip Gralinski, Zhewei Yao, Yuxiong He - CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation - NAACL 2025

Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel Campos, Zhewei Yao, Yuxiong He - TALE: Token-Adaptive Low-Rank KVCache Approximation with Reconstruction Elimination - TACL 2025

Keshav Huang, Tara Venkatesh, Utkarsh Dingankar, Antonio Mallia, Daniel Campos, Jimmy Jiao, Christopher Potts, Omar Khattab - ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring - ECIR 2025

Michael J Ryan, Danmei Xu, Chris Nivera, Daniel Campos - EnronQA: Towards Personalized RAG over Private Documents

2024

Gabriele Oliaro, Zhihao Jia, Daniel Campos, Aurick Qiao - SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications - NeurIPS 2025

Sahaj Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Jimmy Lin - A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look

Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin - Ragnarok: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track - ECIR 2025

Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos - Synthetic Test Collections for Retrieval Evaluation - SIGIR 2024

Luke Merrick, Danmei Xu, Gaurav Nuti, Daniel Campos - Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

Puxuan Yu, Luke Merrick, Gaurav Nuti, Daniel Campos - Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

2023

Daniel Campos, Surya Kallumadi, Corby Rosset, ChengXiang Zhai, Alessandro Magnani - Overview of the TREC 2023 Product Search Track - TREC 2023

Efficient and Robust Web Scale Language Model Based Retrieval, Generation, and Understanding - University of Illinois Urbana-Champaign Computer Science Doctoral Thesis

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin - Overview of the TREC 2022 Deep Learning Track - TREC 2022

Daniel Campos, ChengXiang Zhai - To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency - SustaiNLP 2023 @ ACL 2023

Daniel Campos, Alessandro Magnani, ChengXiang Zhai - Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical Dual Encoders - SustaiNLP 2023 @ ACL 2023

Daniel Campos, ChengXiang Zhai - Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval

Daniel Campos, Alessandro Magnani, ChengXiang Zhai - Noise-Robust Dense Retrieval via Contrastive Alignment Post Training (CAPOT)

Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai - oBERTa: Improving Sparse Transfer Learning via Improved Initialization, Distillation, and Pruning Regimes - SustaiNLP 2023 @ ACL 2023

Daniel Campos, Daniel Perry, Samir Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak - Compressing Cross-Lingual Multi-task Models at Qualtrics - IAAI-23

2022

Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh - The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models - EMNLP 2022

Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai - Sparse*BERT: Sparse Models are Robust - Sparsity in Neural Networks Workshop @ ICML 2022

Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz - Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards - SIGIR 2022

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin - Overview of the TREC 2021 Deep Learning Track - TREC 2021

2021

Daniel Campos, Heng Ji - IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

Daniel Campos - Curriculum Learning for Language Modeling

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen Voorhees, Ian Soboroff - TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime - SIGIR 2021

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin - MS MARCO: Benchmarking Ranking Models in the Large-Data Regime - SIGIR 2021

Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz - Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard - SIGIR 2021

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos - Overview of the TREC 2020 Deep Learning Track - TREC 2020

2020

Explorations in Curriculum Learning Methods for Training Language Models - University of Washington Computational Linguistics Master's Thesis

Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck - ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search - CIKM 2020

Yaobo Liang, Nan Duan, et al., Daniel Campos, Rangan Majumder, Ming Zhou - XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation - EMNLP 2020

Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos - On the Reliability of Test Collections for Evaluating Systems of Different Types - SIGIR 2020

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees - Overview of the TREC 2019 Deep Learning Track - TREC 2019

Corbin Rosset, Chenyan Xiong, Xia Song, Daniel Campos, Nick Craswell, Saurabh Tiwary, Paul Bennett - Leading Conversational Search by Suggesting Useful Questions - WWW 2020

Manling Li, Ying Lin, et al., Daniel Campos, Heng Ji, et al. - GAIA at SM-KBP 2020 - TAC 2020

2019

Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk, Xiayu Huang - Open Domain Web Keyphrase Extraction Beyond Language Modeling - EMNLP 2019

2018

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang - MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

2015

Daniel Campos, Zoe Konrad - Experiments in Inferring Social Networks of Diffusion

Education

University of Illinois Urbana-Champaign (UIUC) - PhD Computer Science 2023. Thesis: Efficient and Robust Web Scale Language Model Based Retrieval, Generation, and Understanding. Advisor: ChengXiang Zhai.

University of Washington - MS Computational Linguistics 2020. Thesis: Explorations in Curriculum Learning Methods for Training Language Models.

Rensselaer Polytechnic Institute - BS Computer Science 2014

Experience

Founder & CEO - Zipf AI - Oct 2025–Present

Senior Research Scientist, Tech Lead - Snowflake - May 2023–Oct 2025

Senior Research Scientist - Neeva (acquired by Snowflake) - Dec 2022–May 2023

Applied Scientist Consultant - Walmart Labs - June 2022–Dec 2022

Applied Scientist Consultant - Qualtrics - March 2022–June 2022

Research Scientist Consultant - Mendel AI - Oct 2021–March 2022

Research Scientist Consultant - Neural Magic - Oct 2020–March 2023

Teaching Assistant - UIUC (CS 510, CS 410, CS 124) - Jan 2021–May 2023

Research Assistant - UIUC Blender Lab - June 2020–Dec 2021

Senior PM / Applied Scientist - Microsoft Research & AI, Bing - Aug 2015–Oct 2020

Awards & Fellowships

Ripple X Fellow (2022)

Z Fellow (2022)

Gene Golub Fellowship at UIUC (2020–2021)

UIUC Summer Predoctoral Institute Fellow (2020)

RPI Business Model Competition 1st Place (2014)

Harvard iLab Cultural Entrepreneurship Challenge Finalist (2014)

Patents

Enhanced Searching Using Fine-Tuned Machine Learning Models - U.S. Patent 12,314,318 - Granted 2025

Enhanced Search Result Generation Using Multi-Document Summarization - U.S. Patent 12,561,375 - Granted 2026

Using a Multi-Task-Trained Neural Network to Guide Interaction with a Query-Processing System via Useful Suggestions - U.S. Patent 11,853,362 - Granted 2023

Keyphrase Extraction Beyond Language Modeling - U.S. Patent 11,657,223 - Granted 2023

Executing Queries with Hallucination Safeguards - U.S. Patent App. 19/034,022 - Filed 2026

Community & Teaching

NIST TREC RAG Track Co-organizer (2024–2026)

NIST TREC Product Search Track Principal Coordinator (2023–2025)

NIST TREC Deep Learning Track Co-organizer (2018–2023)

ACM SIGIR/SIGKDD Africa Summer School Invited Lecturer (2019, 2020)

Invited Talk: Benchmarking End to End Product Retrieval - SIGIR eCommerce Workshop 2023

Invited Talk: Making LLM Inference Affordable - LLMs in Production Conference 2023

Invited Lecture on Unstructured Pruning - UT Austin VITA Lab

Teaching Assistant & Guest Lecturer - UIUC CS 510: Advanced Information Retrieval & CS 410: Text Information Systems (2021–2023)