Selected research projects and open-source contributions.

Eclair

First author of Eclair, a document-understanding model that extracts formatted text, tables, formulas, and reading order. Key design choices: an asymmetric architecture (larger vision encoder, lightweight decoder), no positional embeddings in the decoder, multi-token prediction, and a single prompt-controlled output format. Adopted across NVIDIA's pipelines for pre-training data (50B+ tokens) and VLM distillation; shipped as Nemotron Parse. Underpinned NVIDIA's Llama Nemotron Nano VL, which ranked first on OCRBench v2.

Document AI, Vision-Language Models, Multimodal Pre-training

NVIDIA Nemotron Parse

Co-author of Nemotron Parse 1.1, the productized follow-up to my first-authored Eclair. Open weights, optimized NIM container, deployed via NVIDIA's build platform. v1.1 adds a token-compressed variant (20% speed gain), improved reading order, and longer output sequences.

Document Parsing, Production ML, Model Deployment

NVIDIA Nemotron VLMs

NVIDIA's vision-language and omni-modal models, Nemotron Nano V2 VL and Nemotron 3 Nano Omni, for document understanding, long video, and reasoning. I work on their post-training (GRPO and on-policy distillation) and long-context evaluation (e.g. MMLongBench); the models build on pre-trained backbones.

Vision-Language Models, Post-Training, Reinforcement Learning

Eagle 2 (NVlabs)

Contributed to the token-compression design for dense OCR. Eagle2-9B matches 70B+ models and is the VLM backbone of NVIDIA's GR00T-N1 robotic foundation model.

Vision-Language Models, Foundation Models

Deep Learning Framework Comparisons

Benchmarking deep learning frameworks (TensorFlow, PyTorch, MXNet, CNTK, Keras, etc.) on common architectures. 1,700+ GitHub stars, contributions from framework creators.

Open Source, Deep Learning, Benchmarking