Selected research projects and open-source contributions.
Eclair
First author of Eclair, a document-understanding model that extracts formatted text, tables, formulas, and reading order. Key design choices: an asymmetric architecture (larger vision encoder paired with a lightweight decoder), no positional embeddings in the decoder, multi-token prediction, and a single prompt-controlled output format. Adopted across NVIDIA's pipelines for pre-training data (50B+ tokens) and VLM distillation; shipped as Nemotron Parse. Underpinned NVIDIA's Llama Nemotron Nano VL, which ranked first on OCRBench v2.
NVIDIA Nemotron Parse
Co-author of Nemotron Parse 1.1, the productized follow-up to Eclair (which I first-authored). Open weights, an optimized NIM container, and deployment via NVIDIA's build platform. Version 1.1 adds a token-compressed variant (20% speed gain), improved reading order, and longer output sequences.
NVIDIA Nemotron VLMs
NVIDIA's vision-language and omni-modal models, Nemotron Nano V2 VL and Nemotron 3 Nano Omni, for document understanding, long video, and reasoning. I work on their post-training (GRPO and on-policy distillation) and long-context evaluation (e.g. MMLongBench); the models build on pre-trained backbones.
Eagle 2 (NVlabs)
Contributed to the token-compression design for dense OCR. Eagle2-9B performs on par with 70B+ models and serves as the VLM backbone of NVIDIA's GR00T-N1 robotic foundation model.
Deep Learning Framework Comparisons
Benchmarking deep learning frameworks (TensorFlow, PyTorch, MXNet, CNTK, Keras, etc.) on common architectures. 1,700+ GitHub stars, contributions from framework creators.