Ilia Karmanov
Senior Staff Research Scientist at NVIDIA ADLR · Zurich
I studied economics at LSE (BSc and MSc), transitioned into machine learning in 2016, and have since worked at Microsoft, Qualcomm AI Research, and NVIDIA.
At NVIDIA, I work on post-training and evaluation of vision-language models for multimodal reasoning. I first-authored Eclair, a document-understanding model used across NVIDIA's pre-training pipelines (including Nemotron-H), and have contributed to Eagle 2, Nemotron Nano V2 VL and Nemotron 3 Nano Omni, working on GRPO post-training and on-policy distillation. Eclair shipped as Nemotron Parse with open weights.
At Qualcomm AI Research (2020–2022), I worked on 3D computer vision and efficient architectures, publishing at NeurIPS (with Max Welling), ICCV, and BMVC, and filing 12 patent applications. At Microsoft (2016–2020), I worked on applied ML and initiated an open-source DL benchmarking project (1,700+ stars).
My MSc thesis used optimal control theory to model corporate behaviour under reputational incentives. I then worked as a research economist at an Oxford research centre (directed by Prof. Paul Collier), and as a research assistant to Prof. Frank Cowell at LSE on causal inference work that led to a published paper.
Research interests: multimodal reasoning, post-training (RL and distillation) for vision-language models, long-context and document understanding, model evaluation.
Selected Research
Eclair
Document-understanding model extracting formatted text, tables, and reading order. Shipped as Nemotron Parse; underpinned NVIDIA’s Llama Nemotron Nano VL, which ranked #1 on OCRBench v2.
arXiv 2025

Nemotron VLMs
Vision-language and omni-modal models (Nano V2 VL and Nemotron 3 Nano Omni) for document understanding, long video, and reasoning. I work on their post-training (GRPO, on-policy distillation) and long-context evaluation.
arXiv 2025–2026

Single-gated MoE
Revisiting simple MoE architectures with a base-model branch for early exit and regularization.
BMVC 2022

Recent News
- 04/2026 Released Nemotron 3 Nano Omni, an open model that is multimodal across text, image, video, and audio, with strong long-context document understanding.
- 11/2025 Nemotron Parse 1.1, a lightweight open-weights document parser, is available on NVIDIA's build platform.
- 11/2025 Nemotron Nano V2 VL, a vision-language model for document understanding and long video comprehension.
- 07/2025 NVIDIA Developer Blog on turning complex documents into usable data with Nemotron Parse.
- 04/2025 Nemotron-H, hybrid Mamba-Transformer models with up to 3x faster inference, using Eclair-extracted data in pre-training.
- 02/2025 Released Eclair (first author), a document-understanding model for content, layout, and reading order.
- 01/2025 Eagle 2 at NeurIPS 2025; Eagle2-9B matches 70B+ models and backs the GR00T-N1 robot foundation model.
- 09/2022 Joined NVIDIA Research in Zurich, working on document understanding and vision-language models.
- 06/2022 Revisiting Single-gated Mixtures of Experts accepted to BMVC 2022.
- 05/2022 WiCluster featured in Qualcomm's research on AI for wireless sensing.