Ilia Karmanov

Senior Staff Research Scientist at NVIDIA ADLR · Zurich

I studied economics at LSE (BSc and MSc), transitioned into machine learning in 2016, and have since worked at Microsoft, Qualcomm AI Research, and NVIDIA.

At NVIDIA, I work across the architecture, pre-training, post-training, and evaluation of vision-language models for multimodal reasoning. I first-authored Eclair, a document-understanding model used across NVIDIA's pre-training pipelines (including Nemotron-H), which shipped as Nemotron Parse with open weights. I have also contributed to Eagle 2, Nemotron Nano V2 VL, and Nemotron 3 Nano Omni, working on GRPO post-training and on-policy distillation.

I currently work on multi-teacher on-policy distillation, building on earlier on-policy distillation work in our group. The question is whether the transfer of answer correctness can be separated from the transfer of reasoning style, so that a higher-capacity domain teacher can train a smaller student without forcing on it a reasoning style the student cannot support.

At Qualcomm AI Research (2020–2022), I worked on 3D computer vision and efficient architectures, publishing at NeurIPS (with Max Welling), ICCV, and BMVC, and filing 12 patent applications. At Microsoft (2016–2020), I worked on applied ML and initiated an open-source DL benchmarking project (1,700+ stars).

My MSc thesis used optimal control theory to model corporate behaviour under reputational incentives. I then worked as a research economist at an Oxford research centre (directed by Prof. Paul Collier), and as a research assistant to Prof. Frank Cowell at LSE on causal inference work that led to a published paper.

Research interests: multimodal model design and architecture; pre-training and post-training (synthetic data, SFT, RL, and distillation) for vision-language models; long-context and document understanding; and model evaluation.

Selected Research

First Author

Eclair

Nemotron VLMs

Single-gated MoE

Recent News