Synopsys software engineers are key enablers in the world of Electronic Design Automation (EDA), developing and maintaining software used in chip design, verification, and manufacturing. They design, develop, and troubleshoot software, leveraging state-of-the-art technologies such as AI/ML, GenAI, and Cloud. Their contributions enable EDA designers worldwide to extend the frontiers of semiconductor and chip development.
This role is for a Staff Engineer in R&D Engineering, responsible for performing complex development activities that may require extensive analysis in areas including cloud deployment and maintenance, as well as distributed system maintenance and scaling. The ideal candidate will have built and scaled platforms that actually run in production, not proof-of-concepts that look good in slides. They will know that Kubernetes is powerful, but also that it can become a mess fast if the design decisions are not made with scale and maintainability in mind.
When someone says "we need to deploy this new AI service," the successful candidate will not just spin up a pod and call it done. They will ask about traffic patterns, failure modes, and what happens when this thing needs to scale 10x in six months.
The Staff Engineer will be responsible for:
- Performing complex development activities that may require extensive analysis in areas including cloud deployment and maintenance, as well as distributed system maintenance and scaling
- Applying best practices and evangelizing them through RFCs and mentoring
- Helping scale our processes (release, development environments, CI/CD pipelines)
- Investigating root causes, automating release testing, and resolving production incidents
- Designing, deploying, and maintaining the platform infrastructure using Kubernetes, Pulumi, and Terraform
- Scaling GPU and CPU compute resources to support AI model training and inference workloads that grow unpredictably
- Building monitoring, alerting, and observability tooling that catches issues before customers do, using tools like Prometheus, Grafana, or equivalent
The impact of this role will be significant. The Staff Engineer will:
- Enable engineers to deploy models that reduce simulation time from hours to minutes, directly accelerating product innovation for Synopsys customers
- Scale the platform to handle growing customer demand without degrading performance or reliability
- Reduce mean time to recovery during incidents by building better observability and automated remediation into the platform
- Improve developer velocity by streamlining deployment workflows and eliminating friction in the release process
- Establish infrastructure patterns and best practices that the broader engineering team can adopt and scale with
- Prevent outages before they happen by designing resilient, self-healing systems that degrade gracefully under load
- Mentor engineers across the organization through RFCs, documentation, and pairing sessions that raise the bar on platform thinking
To be successful in this role, the candidate will need to have:
- Software development certification
- 3 years' experience, including managing complex platforms built on Kubernetes
- Advanced troubleshooting skills
- Distributed systems design and operation experience (Kubernetes, Pulumi, NATS, Redis)
- Proficiency in scripting with Bash and programming in Python or TypeScript
- Comfort working independently and owning problems end to end, from definition through deployment and monitoring
- Ability to explain a complex infrastructure tradeoff to a researcher in two sentences without losing the nuance or talking down to them
- Ability to stay calm when something breaks in production, gather data, and work the problem methodically instead of guessing and restarting things
- Care about the "why" behind a request: when someone asks for a new service, asking what problem they are trying to solve before provisioning resources
- Enough curiosity to test new tools and enough pragmatism to know when the old tool is still the right answer
- Ability to work across time zones with a distributed team, which means clear written communication and async collaboration are second nature