Selected research projects and open-source contributions.
Eclair
First-authored a multimodal encoder-decoder for document understanding. Originated its key architectural choices (no positional encoding in the decoder, chained multi-token prediction). Used across NVIDIA's training pipelines: contributed 50B+ tokens to LLM pre-training, served as a teacher for VLM OCR distillation, a pseudo-labeler for VLM training data, and the text extractor grounding synthetic VQA generation.
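The chained multi-token prediction idea — each of several prediction heads conditions on the previous head's output instead of predicting all future tokens independently — can be illustrated with a toy sketch. The `toy_head` rule and chaining loop below are illustrative assumptions, not Eclair's implementation:

```python
def toy_head(context):
    """Stand-in for a prediction head: returns one next token.
    Here a trivial rule (increment the last token id) replaces
    a learned model, purely for illustration."""
    return context[-1] + 1

def chained_predict(context, k):
    """Predict k future tokens in one decoding step, feeding each
    head's prediction into the context seen by the next head, so
    later predictions stay consistent with earlier ones."""
    out = []
    ctx = list(context)
    for _ in range(k):
        nxt = toy_head(ctx)
        out.append(nxt)
        ctx.append(nxt)  # chain: the next head sees this prediction
    return out

print(chained_predict([3, 7], 4))  # -> [8, 9, 10, 11]
```

The contrast is with parallel multi-token heads, which predict positions t+1..t+k independently and can emit mutually inconsistent tokens.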
NVIDIA Nemotron Parse
Senior author on Nemotron Parse 1.1, the follow-up to Eclair. Released with open weights, an optimized NIM container, and deployment via NVIDIA's build platform. v1.1 adds a token-compressed variant with a 20% speed gain, improved reading order, and longer output sequences.
Eagle 2 (NVlabs)
Contributed to NVIDIA's vision-language model family. Eagle2-9B matches the performance of models with 70B+ parameters on standard vision-language benchmarks, and serves as the VLM backbone of the GR00T-N1 robotic foundation model.
Lung Disease Prediction from Chest X-Rays
Co-authored work on lung disease prediction using DenseNet-121 on the NIH Chest X-ray dataset (112K images, 14 pathologies). Published on the Microsoft ML Blog.
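Predicting 14 pathologies from one X-ray is a multi-label problem: the standard head for this dataset applies an independent sigmoid per class rather than a softmax, so several diseases can be flagged at once. A minimal sketch (the pathology subset, logits, and 0.5 threshold are illustrative assumptions, not the published configuration):

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

PATHOLOGIES = ["Atelectasis", "Cardiomegaly", "Effusion"]  # 3 of the 14

def multilabel_probs(logits):
    """Independent per-pathology probabilities: each class gets its
    own sigmoid, unlike softmax, which would force the classes to
    compete for a single probability mass."""
    return {name: sigmoid(z) for name, z in zip(PATHOLOGIES, logits)}

probs = multilabel_probs([2.0, -1.0, 0.0])
flagged = [name for name, p in probs.items() if p > 0.5]
print(flagged)  # -> ['Atelectasis']
```

In training, such a head pairs with per-class binary cross-entropy over the backbone's pooled features.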
Deep Learning Framework Comparisons
Benchmarked deep learning frameworks (TensorFlow, PyTorch, MXNet, CNTK, Keras, and others) on common architectures. The repository earned 1,700+ GitHub stars and drew contributions from the frameworks' creators.
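At a high level, a fair cross-framework comparison times each workload the same way: discard warmup iterations (JIT compilation, graph building, cache fill), then average wall-clock time over the measured runs. A minimal sketch of such a harness (the `benchmark` helper is hypothetical, not the repo's actual code):

```python
import time

def benchmark(fn, warmup=2, iters=10):
    """Average wall-clock seconds per call of fn, after warmup runs
    that absorb one-time costs so frameworks with lazy graph
    compilation are not unfairly penalized."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Usage: in the real setting fn would be e.g. lambda: model(batch)
avg = benchmark(lambda: sum(range(10_000)))
print(f"{avg * 1e6:.1f} us per iteration")
```

`time.perf_counter` is used rather than `time.time` because it is a monotonic, high-resolution clock suited to short intervals.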