Ilia Karmanov

Senior Staff Research Scientist at NVIDIA ADLR · Zurich

I studied economics at LSE (BSc, First Class; MSc), transitioned into machine learning in 2016, and have since worked at Microsoft, Qualcomm AI Research, and NVIDIA.

At NVIDIA, I work on vision-language models and reinforcement learning for multimodal reasoning. I first-authored Eclair, a document digitization model used in the Nemotron-H pre-training pipeline, and have contributed to Eagle 2, Nemotron Nano V2 VL, and Nemotron Parse (deployed with open weights).

At Qualcomm AI Research (2020–2022), I worked on 3D computer vision and efficient architectures, publishing at NeurIPS (with Max Welling), ICCV, and BMVC, and filing 12 patent applications. At Microsoft (2016–2020), I worked on applied ML and initiated an open-source deep learning benchmarking project (1,700+ stars).

My MSc thesis used optimal control theory to model corporate behaviour under reputational incentives. I then worked as a research economist at an Oxford research centre (directed by Prof. Paul Collier), and as a research assistant to Prof. Frank Cowell at LSE on causal inference work that led to a published paper.

Research interests: multimodal reasoning, reinforcement learning for vision-language models, model evaluation, document understanding.

Recent News

  • 11/2025 Nemotron Parse 1.1 released — lightweight document parsing model with open weights and optimized NIM container.
  • 11/2025 Contributed to Nemotron Nano V2 VL, a vision-language model for document understanding and long video comprehension.
  • 07/2025 Contributed to an NVIDIA Developer Blog post on turning complex documents into usable data with Nemotron Parse 1.1.
  • 04/2025 Contributed to Nemotron-H — hybrid Mamba-Transformer models with up to 3x faster inference. Eclair was used for pre-training data preparation.
  • 02/2025 Released Eclair (first-author) — document digitization model for extracting content, layout, and reading order.
  • 01/2025 Contributed to Eagle 2. Eagle2-9B matches 70B+ models and powers the GR00T-N1 robotic foundation model.
  • 09/2022 Joined NVIDIA Research in Zurich, working on document understanding and vision-language models.
  • 06/2022 Revisiting Single-gated Mixtures of Experts accepted to BMVC 2022.
  • 05/2022 WiCluster featured by Qualcomm as an example of AI research crossing over into wireless communication.
  • 03/2022 3D positioning work featured as a Qualcomm AI First and covered by Forbes.