You will work on infrastructure and knowledge systems that directly enable semiconductor design teams to move faster and smarter. As a Sr. Staff Infrastructure Engineer, you will manage and optimize compute and disk resources across HPC environments to support large-scale simulation workloads, ensuring high availability and performance for critical design tasks.
You will monitor, troubleshoot, and scale compute infrastructure in collaboration with IT and infrastructure teams, diagnosing bottlenecks and resolving issues before they impact engineering timelines. You will develop and implement resource utilization strategies, including job scheduling with LSF, load balancing, and storage optimization, to maximize throughput and minimize idle time.
You will build custom automation scripts and tools in Python, Tcl, or similar languages that eliminate repetitive manual tasks and integrate seamlessly into existing simulation workflows. You will architect and maintain a structured, AI-ready knowledge database that serves as the single source of truth, connecting specifications, code, tests, and design artifacts so that AI tools can reason across domains rather than just search flat tables.
You will establish build, validation, and release processes for the knowledge platform with automated checks, versioned builds, staging environments, and rollback plans so teams and AI tools always work against verified, release-ready data. You will design scalable, multi-project infrastructure where core schema, tooling, and validation frameworks are shared while project-specific configurations remain isolated, so onboarding a new product line takes weeks, not months.
As a member of this team, you will be responsible for:
- Managing and optimizing compute and disk resources across HPC environments to support large-scale simulation workloads
- Monitoring, troubleshooting, and scaling compute infrastructure in collaboration with IT and infrastructure teams
- Developing and implementing resource utilization strategies, including job scheduling with LSF, load balancing, and storage optimization
- Building custom automation scripts and tools in Python, Tcl, or similar languages that eliminate repetitive manual tasks and integrate seamlessly into existing simulation workflows
- Architecting and maintaining a structured, AI-ready knowledge database that serves as the single source of truth, connecting specifications, code, tests, and design artifacts so AI tools can reason across domains, not just search flat tables
- Establishing build, validation, and release processes for the knowledge platform with automated checks, versioned builds, staging environments, and rollback plans
- Designing scalable, multi-project infrastructure where core schema, tooling, and validation frameworks are shared while project-specific configurations remain isolated
You will work with a collaborative and dynamic team focused on optimizing compute infrastructure and resource management to support large-scale simulation workloads. Our team values innovation, efficiency, and continuous improvement, and works closely with IT and infrastructure teams to scale resources and implement cutting-edge solutions.