M. Usman Rafique

About Usman

I am a Machine Learning Researcher with experience applying cutting-edge research to solve real-world problems. In my current role at Zoox, I am focused on Data Optimization and Machine Learning for the Behavior Autonomy of our cool robo-taxi.

Previously, I was a Senior Machine Learning Engineer at Bastian Solutions (a Toyota company) from 2023 to 2024, where I developed and deployed state-of-the-art computer vision and machine learning solutions for autonomous pick-and-place robots. Before that, I served as a Senior Research and Development Engineer at Kitware Inc. (2021-2023), tackling diverse computer vision challenges such as change detection from overhead imagery, person identification, novel view synthesis, and atmospheric turbulence correction.

I earned my Ph.D. in Electrical Engineering from the University of Kentucky, with research focused on weakly supervised deep learning methods for image synthesis, semantic segmentation, and change detection.

I’m passionate about staying at the forefront of AI advancements. You can find examples of my work with Large Language Models (LLMs) on my GitHub, including LLM-Forge, a playground for building practical LLMs with limited compute resources.

My areas of expertise include:

Data-Centric AI for Autonomous Behavior: Architecting data optimization strategies for learned behavior models that control the robo-taxi's trajectory. My work involves curating large-scale datasets to ensure full coverage of common and edge-case driving scenarios, maximizing model performance while managing data size for efficient, practical training.
Computer Vision for Autonomous Systems: Developing and deploying robust computer vision and ML systems for autonomous robots. This enables complex tasks in dynamic environments, such as object picking, depalletizing, and real-time scene understanding.
Continual Learning: Implementing AI systems that continuously learn and adapt to new data without forgetting previous knowledge.
Production ML: Designing, implementing, and deploying machine learning models for real-time applications in production environments.
Multi-modal Understanding: Combining data from different sources, such as aerial and ground-level imagery, to gain a more comprehensive understanding of a scene.
Vision-Language Models: Integrating computer vision and natural language processing to create AI systems that can understand both images and text.

Academic Background

I completed my PhD at the University of Kentucky, where I was a member of the Multimodal Vision Research Lab. My research focused on combining information from multiple images for scene understanding and image synthesis. My PhD advisors were Dr. Nathan Jacobs and Dr. Samson Cheung.

Professional Experience

Senior Machine Learning Engineer, Zoox: Feb 2024 - present
- Developing machine learning solutions for an autonomous robo-taxi.
Senior Machine Learning Engineer, Bastian Solutions (Toyota): Aug 2023 - Feb 2024
- Developed computer vision and machine learning solutions for autonomous robotic systems.
Senior Research and Development Engineer, Kitware Inc.: Aug 2021 - May 2023
- Conducted research on change detection, person identification, and novel view synthesis.