Projects

Eclair

First-authored multimodal encoder-decoder for document understanding. Originated the key architectural choices (no positional encoding in the decoder, chained multi-token prediction). Used across NVIDIA's training pipelines: supplying 50B+ tokens for LLM pre-training, serving as the teacher for VLM OCR distillation, pseudo-labeling VLM training data, and extracting text to ground synthetic VQA generation.

Document AI, Vision-Language Models, Multimodal Pre-training

NVIDIA Nemotron Parse

Senior author on Nemotron Parse 1.1, the follow-up to Eclair. Released with open weights and an optimized NIM container, and deployed via NVIDIA's build platform. Version 1.1 adds a token-compressed variant with a 20% speed gain, improved reading order, and longer output sequences.

Document Parsing, Production ML, Model Deployment

Eagle 2 (NVlabs)

Contributed to Eagle 2, NVIDIA's vision-language model. Eagle2-9B performs on par with models of 70B+ parameters. Eagle 2 serves as the VLM backbone of the GR00T-N1 robotic foundation model.

Vision-Language Models, Foundation Models

Lung Disease Prediction from Chest X-Rays

Co-authored work on lung disease prediction using DenseNet-121 on the NIH Chest X-ray dataset (112K images, 14 pathologies). Published on the Microsoft ML Blog.

Medical Imaging, Computer Vision

Deep Learning Framework Comparisons

Benchmarked deep learning frameworks (TensorFlow, PyTorch, MXNet, CNTK, Keras, etc.) on common architectures. The repository has 1,700+ GitHub stars and has received contributions from framework creators.

Open Source, Deep Learning, Benchmarking