The Role:
Join a team of builders shaping enterprise-grade data products and platforms that power analytics, customer experiences, and operational insights at scale.
You will design, build, and operate reliable batch and streaming data pipelines, partnering closely with product, platform, and governance teams to deliver high-quality, secure, and discoverable data.
What You'll Do (Responsibilities):
- Architect and implement scalable ETL/ELT pipelines and services using modern data platforms and best practices.
- Build streaming and micro-batch data flows, including schema evolution, handling of late and out-of-order events, and exactly-once delivery semantics where feasible (see the streaming sketch after this list).
- Model data for analytics and ML using layered “bronze/silver/gold” patterns, with clear data contracts, SLAs, and lineage.
- Embed observability (logging, metrics, tracing), data quality checks, and cost/performance optimization into everything you ship.
- Automate testing and deployments with CI/CD.
- Collaborate with domain SMEs and data product owners to define requirements, acceptance criteria, and success metrics.
- Operate what you build: participate in on‑call/incident response rotations and drive root-cause analysis (RCA) and preventative engineering.
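To make the streaming responsibilities above concrete, here is a minimal, purely illustrative PySpark Structured Streaming sketch: ingest from Kafka into a bronze Delta table, with a watermark to bound state for late and out-of-order events. The broker, topic, schema, and paths are hypothetical, and the code assumes a Spark environment with the Kafka and Delta connectors available.

    from pyspark.sql import SparkSession, functions as F, types as T

    spark = SparkSession.builder.appName("bronze_ingest").getOrCreate()

    # Hypothetical event schema, for illustration only.
    event_schema = T.StructType([
        T.StructField("device_id", T.StringType()),
        T.StructField("event_ts", T.TimestampType()),
        T.StructField("reading", T.DoubleType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "telemetry.events")            # hypothetical topic
        .load()
    )

    events = (
        raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
        # Accept events arriving up to 15 minutes late; the watermark bounds dedup state.
        .withWatermark("event_ts", "15 minutes")
        .dropDuplicates(["device_id", "event_ts"])
    )

    # Checkpointing plus Delta's transactional sink gives effectively exactly-once writes.
    (
        events.writeStream.format("delta")
        .option("checkpointLocation", "/checkpoints/bronze_telemetry")
        .outputMode("append")
        .start("/lake/bronze/telemetry")
    )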
How You'll Work:
- Product mindset: outcome-driven, iterative delivery, and clear metrics.
- Quality first: automated tests, reproducible pipelines, and continuous improvement (a test sketch follows this list).
- Security and compliance by design: least-privilege access, data masking, and auditability.
- Collaboration: partner across platforms, governance, and product teams; communicate clearly with technical and non-technical stakeholders.
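As one illustration of the "quality first" point, pipeline logic can be factored into pure functions covered by pytest-style unit tests that run in CI. The function and column names below are hypothetical; this is a sketch, not a prescribed implementation.

    from pyspark.sql import DataFrame, SparkSession, functions as F

    def latest_reading_per_device(df: DataFrame) -> DataFrame:
        """Keep only the most recent reading per device, a typical silver-layer cleanup step."""
        latest = df.groupBy("device_id").agg(F.max("event_ts").alias("event_ts"))
        return df.join(latest, ["device_id", "event_ts"], "inner")

    def test_latest_reading_per_device():
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        df = spark.createDataFrame(
            [("d1", "2024-01-01 00:00:00", 1.0), ("d1", "2024-01-01 00:05:00", 2.0)],
            ["device_id", "event_ts", "reading"],
        )
        result = latest_reading_per_device(df).collect()
        assert len(result) == 1 and result[0]["reading"] == 2.0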
Tools You May Use:
- Languages: Python, SQL
- Compute and pipelines: Apache Spark, orchestration/workflows (e.g., Databricks Workflows/Airflow), containerized jobs where needed
- Storage/metadata: Parquet; lakehouse tables (e.g., Delta/Iceberg); catalog/lineage tools
- DevOps: Git, CI/CD, secrets management, observability (logs/metrics/traces)
Your Skills & Abilities (Required Qualifications):
- Bachelor's degree in Computer Science, Engineering, or a related field; equivalent experience will be considered in lieu of a degree
- 5+ years building data pipelines at scale with a modern data stack
- Strong proficiency in Python and SQL, plus performance tuning of both
- Hands-on experience with distributed compute (e.g., Apache Spark) and lakehouse/warehouse paradigms
- Data modeling for analytics (dimensional/medallion), data contracts, and schema management (a schema-enforcement sketch follows this list)
- CI/CD (Git-based workflows) and infrastructure-as-code (e.g., Terraform) in a cloud environment
- Practical knowledge of data security, privacy, and access control concepts
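For the data-contract and schema-management bullet above, one minimal sketch (names and types are hypothetical) is to declare the contract as an explicit schema and fail fast when an upstream change breaks it:

    from pyspark.sql import DataFrame, types as T

    # Hypothetical contract for a telemetry dataset.
    CONTRACT = T.StructType([
        T.StructField("device_id", T.StringType(), nullable=False),
        T.StructField("event_ts", T.TimestampType(), nullable=False),
        T.StructField("reading", T.DoubleType(), nullable=True),
    ])

    def enforce_contract(df: DataFrame) -> DataFrame:
        missing = {f.name for f in CONTRACT.fields} - set(df.columns)
        if missing:
            raise ValueError(f"Contract violation: missing columns {sorted(missing)}")
        # Cast to the contracted types and drop anything the contract does not cover.
        return df.select([df[f.name].cast(f.dataType) for f in CONTRACT.fields])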
What Will Give You A Competitive Edge (Preferred Qualifications):
- Streaming pipelines with technologies such as Kafka (or similar), including stateful processing and backpressure management
- Lakehouse technologies (e.g., Delta Lake/Iceberg/Hudi), file formats (Parquet/ORC), and table optimization (Z‑ordering, clustering)
- Data governance and cataloging (e.g., Atlan/Unity Catalog/Collibra/Immuta) and automated lineage
- Data quality frameworks (e.g., Great Expectations) and SLAs/SLOs for data products (see the data-quality sketch after this list)
- Experience with Databricks or equivalent cloud data platforms and workload orchestration
- Domain experience with IoT/telematics, energy, or mobility data is a plus
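As a final illustration, data-quality gates and routine table optimization of the kind listed above might look like the sketch below. The checks are hand-rolled here for brevity; in practice a framework such as Great Expectations would typically own them. Table and column names are hypothetical, and the OPTIMIZE/ZORDER statement assumes a Delta-capable engine such as Databricks.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    silver = spark.read.table("silver.telemetry")  # hypothetical table

    # Simple expectations: no null keys, readings within a plausible physical range.
    null_keys = silver.filter(F.col("device_id").isNull()).count()
    out_of_range = silver.filter(~F.col("reading").between(-50.0, 150.0)).count()
    if null_keys or out_of_range:
        raise ValueError(
            f"Data quality failure: {null_keys} null keys, {out_of_range} out-of-range readings"
        )

    # Periodic file compaction and clustering to keep query latency and cost in check.
    spark.sql("OPTIMIZE silver.telemetry ZORDER BY (device_id)")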