Ilia Karmanov
Senior Staff Research Scientist at NVIDIA ADLR · Zurich
I studied economics at LSE (BSc, First Class; MSc), transitioned into machine learning in 2016, and have since worked at Microsoft, Qualcomm AI Research, and NVIDIA.
At NVIDIA, I work on vision-language models and reinforcement learning for multimodal reasoning. I first-authored Eclair, a document digitization model used in the Nemotron-H pre-training pipeline, and have contributed to Eagle 2, Nemotron Nano V2 VL, and Nemotron Parse (released with open weights).
At Qualcomm AI Research (2020–2022), I worked on 3D computer vision and efficient architectures, publishing at NeurIPS (with Max Welling), ICCV, and BMVC, and filing 12 patent applications. At Microsoft (2016–2020), I worked on applied ML and initiated an open-source DL benchmarking project (1,700+ stars).
My MSc thesis used optimal control theory to model corporate behaviour under reputational incentives. I then worked as a research economist at an Oxford research centre (directed by Prof. Paul Collier), and as a research assistant to Prof. Frank Cowell at LSE on causal inference work that led to a published paper.
Research interests: multimodal reasoning, reinforcement learning for vision-language models, model evaluation, document understanding.
Selected Research
Eclair
Document digitization model for extracting formatted text, layout, and reading order. Shipped as Nemotron Parse; provided 50B+ tokens to NVIDIA's LLM pre-training pipeline.
arXiv 2025 · First Author
WiCluster
First weakly supervised method for passive indoor positioning from WiFi channel state information (CSI), requiring no precise location labels.
IEEE GLOBECOM 2021
Single-gated MoE
Revisits simple mixture-of-experts (MoE) architectures, adding a base-model branch that enables early exit and acts as a regularizer.
BMVC 2022
Recent News
- 11/2025 Nemotron Parse 1.1 released — lightweight document parsing model with open weights and optimized NIM container.
- 11/2025 Contributed to Nemotron Nano V2 VL, a vision-language model for document understanding and long video comprehension.
- 07/2025 Contributed to an NVIDIA Developer Blog post on turning complex documents into usable data with Nemotron Parse 1.1.
- 04/2025 Contributed to Nemotron-H — hybrid Mamba-Transformer models with up to 3x faster inference. Eclair was used for pre-training data preparation.
- 02/2025 Released Eclair (first-author) — document digitization model for extracting content, layout, and reading order.
- 01/2025 Contributed to Eagle 2. Eagle2-9B matches 70B+ models and powers the GR00T-N1 robotic foundation model.
- 09/2022 Joined NVIDIA Research in Zurich, working on document understanding and vision-language models.
- 06/2022 Revisiting Single-gated Mixtures of Experts accepted to BMVC 2022.
- 05/2022 WiCluster featured by Qualcomm as an example of AI research crossing over into wireless communication.
- 03/2022 3D positioning work featured as a Qualcomm AI First and covered by Forbes.