Ilia Karmanov

Senior Staff Research Scientist at NVIDIA ADLR · Zurich

I studied economics at LSE (BSc and MSc), transitioned into machine learning in 2016, and have since worked at Microsoft, Qualcomm AI Research, and NVIDIA.

At NVIDIA, I work on post-training and evaluation of vision-language models for multimodal reasoning. I first-authored Eclair, a document-understanding model used across NVIDIA's pre-training pipelines (including Nemotron-H) and shipped with open weights as Nemotron Parse. I have also contributed to Eagle 2, Nemotron Nano V2 VL, and Nemotron 3 Nano Omni, working on GRPO post-training and on-policy distillation.

At Qualcomm AI Research (2020–2022), I worked on 3D computer vision and efficient architectures, publishing at NeurIPS (with Max Welling), ICCV, and BMVC, and filing 12 patent applications. At Microsoft (2016–2020), I worked on applied ML and initiated an open-source DL benchmarking project (1,700+ stars).

My MSc thesis used optimal control theory to model corporate behaviour under reputational incentives. I then worked as a research economist at an Oxford research centre (directed by Prof. Paul Collier), and as a research assistant to Prof. Frank Cowell at LSE on causal inference work that led to a published paper.

Research interests: multimodal reasoning, post-training (RL and distillation) for vision-language models, long-context and document understanding, model evaluation.

Recent News

  • 04/2026 Nemotron 3 Nano Omni is out: an open model, multimodal across text, image, video, and audio, with strong long-context document understanding.
  • 11/2025 Nemotron Parse 1.1 is on NVIDIA's build platform, a lightweight open-weights document parser.
  • 11/2025 Nemotron Nano V2 VL, a vision-language model for document understanding and long video comprehension.
  • 07/2025 NVIDIA Developer Blog on turning complex documents into usable data with Nemotron Parse.
  • 04/2025 Nemotron-H, hybrid Mamba-Transformer models with up to 3x faster inference, using Eclair-extracted data in pre-training.
  • 02/2025 Released Eclair (first author), a document-understanding model for content, layout, and reading order.
  • 01/2025 Eagle 2 at NeurIPS 2025; Eagle2-9B matches 70B+ models and backs the GR00T-N1 robot foundation model.
  • 09/2022 Joined NVIDIA Research in Zurich, working on document understanding and vision-language models.
  • 06/2022 Revisiting Single-gated Mixtures of Experts accepted to BMVC 2022.
  • 05/2022 WiCluster featured in Qualcomm's research on AI for wireless sensing.