Hybrid: This role is categorized as hybrid. The successful candidate is expected to report to the GM Global Technical Center - Cole Engineering Center Podium, MI, or the Mountain View Technical Center, CA, at least three times per week, or at another frequency dictated by the business. This job is eligible for relocation assistance.
About the Team:
The ML Inference Platform is part of the AI Compute Platforms organization within Infrastructure Platforms. Our team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AI efforts. We’re proud to serve as the AI infrastructure platform for teams developing autonomous vehicles (L3/L4/L5), as well as other groups building AI-driven products for GM and its customers.
We enable rapid innovation and feature development by optimizing for high-priority, ML-centric use cases. Our platform supports serving state-of-the-art (SOTA) machine learning models for experimental and bulk inference, with a focus on performance, availability, concurrency, and scalability. We’re committed to maximizing GPU utilization across hardware platforms (NVIDIA B200, H100, A100, and more) while maintaining reliability and cost efficiency.
About the Role:
We are seeking a Staff ML Infrastructure Engineer to help build and scale robust compute platforms for ML workflows. In this role, you’ll work closely with ML engineers and researchers to ensure efficient model serving and inference in production for workflows such as data mining, labeling, model distillation, simulation, and more. This is a high-impact opportunity to influence the future of AI infrastructure at GM.
You will play a key role in shaping the architecture, roadmap, and user experience of a robust ML inference service supporting real-time, batch, and experimental inference needs. The ideal candidate brings experience designing distributed systems for ML, strong problem-solving skills, and a product mindset focused on platform usability and reliability.
What you’ll be doing:
- Design and implement core platform backend software components.
- Collaborate with ML engineers and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value.
- Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms.
- Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services.
- Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques.
- Lead large-scale technical initiatives across GM’s ML ecosystem.
- Raise the engineering bar through technical leadership and by establishing best practices.
- Contribute to open source projects; represent GM in relevant communities.