Full list of publications. See also my Google Scholar profile.
2025
Nemotron Parse 1.1
arXiv 2025
Follow-up to Eclair. Lightweight 885M-parameter model that adds a token-compressed variant (20% speed gain), improved reading order for floating elements, and longer output sequences. Released as open weights with an optimized NIM container.
NVIDIA Nemotron Nano V2 VL
arXiv 2025
Vision-language model built on a hybrid Mamba-Transformer architecture for document understanding, long video comprehension, and reasoning. 128K-token context with token reduction for higher throughput.
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
arXiv 2025
8B and 56B hybrid Mamba-Transformer models with up to 3x faster inference than comparable Transformers (Qwen-2.5, Llama-3.1) at equal or better accuracy. Eclair was used for PDF-to-text extraction in the pre-training data pipeline.
Eclair: Extracting Content and Layout with Integrated Reading Order for Documents
arXiv 2025
Multimodal encoder-decoder for document understanding. Extracts formatted text (markdown/LaTeX), bounding boxes with semantic classes, and reading order. Originated two architectural choices: a decoder without positional encoding and chained multi-token prediction. Used across NVIDIA's training pipelines for LLM pre-training data, VLM distillation, pseudo-labeling, and synthetic VQA grounding. Introduces the DROBS benchmark.
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
NeurIPS 2025
Data-centric approach to VLM post-training. Eagle2-9B matches models with up to 70B parameters. Later adopted as the VLM backbone of NVIDIA's GR00T-N1 robotic foundation model.
2022
Revisiting Single-gated Mixtures of Experts
BMVC 2022
Revisits simple single-gate MoE architectures, adding a base-model branch for early exit and regularization. Achieves efficiency-accuracy trade-offs comparable to more complex MoE approaches.
2021
Modality-Agnostic Topology Aware Localization
NeurIPS 2021
Unsupervised positioning using optimal transport on isometric embeddings, agnostic to input modality. Applied to WiFi and visual positioning.
Deep Learning Frameworks for Weakly-Supervised Indoor Localization
NeurIPS 2021 (Competition & Demos)
Deep learning frameworks for weakly-supervised indoor positioning using WiFi and visual data.
WiCluster: Passive Indoor 2D/3D Positioning using WiFi without Precise Labels
IEEE GLOBECOM 2021
First weakly-supervised approach to passive indoor positioning using WiFi CSI without precise location labels. Featured by Qualcomm as an AI First and covered by Forbes.
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
ICCV 2021
Self-training approach for video recognition that leverages motion information to improve performance with limited labeled data.
Hand Gesture Recognition using 802.11ad mmWave Sensor in the Mobile Device
IEEE WCNC 2021
Hand gesture recognition using mmWave radar sensing on mobile devices.
2015
European Identity and Redistributive Preferences
CESifo Working Paper / LSE
Empirical causal inference (diff-in-diff) examining how changes in European identity affect preferences for redistribution. Contributed data generation, simulations, and econometric analysis.