The Role:
Join a team of builders shaping enterprise-grade data products and platforms that power analytics, customer experiences, and operational insights at scale.
You will design, build, and operate reliable batch and streaming data pipelines, partnering closely with product, platform, and governance teams to deliver high-quality, secure, and discoverable data.
What You'll Do (Responsibilities):
- Architect and implement scalable ETL/ELT pipelines and services using modern data platforms and best practices.
- Build streaming and micro-batch data flows, including schema evolution, handling of late and out-of-order events, and exactly-once delivery semantics where feasible (see the streaming sketch after this list).
- Model data for analytics and ML using layered “bronze/silver/gold” patterns, with clear data contracts, SLAs, and lineage.
- Embed observability (logging, metrics, tracing), data quality checks, and cost/performance optimization into everything you ship.
- Automate testing and deployments with CI/CD.
- Collaborate with domain SMEs and data product owners to define requirements, acceptance criteria, and success metrics.
- Operate what you build: participate in on‑call/incident response rotations and drive root-cause analysis (RCA) and preventative engineering.
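To make the streaming responsibilities above concrete, here is a minimal, purely illustrative PySpark Structured Streaming sketch: ingest from Kafka into a bronze Delta table, with a watermark to bound state for late and out-of-order events. The broker, topic, schema, and paths are hypothetical, and the code assumes a Spark environment with the Kafka and Delta connectors available.

    from pyspark.sql import SparkSession, functions as F, types as T

    spark = SparkSession.builder.appName("bronze_ingest").getOrCreate()

    # Hypothetical event schema, for illustration only.
    event_schema = T.StructType([
        T.StructField("device_id", T.StringType()),
        T.StructField("event_ts", T.TimestampType()),
        T.StructField("reading", T.DoubleType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "telemetry.events")            # hypothetical topic
        .load()
    )

    events = (
        raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
        # Accept events arriving up to 15 minutes late; the watermark bounds dedup state.
        .withWatermark("event_ts", "15 minutes")
        .dropDuplicates(["device_id", "event_ts"])
    )

    # Checkpointing plus Delta's transactional sink gives effectively exactly-once writes.
    (
        events.writeStream.format("delta")
        .option("checkpointLocation", "/checkpoints/bronze_telemetry")
        .outputMode("append")
        .start("/lake/bronze/telemetry")
    )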
How You'll Work:
- Product mindset: outcome-driven, iterative delivery, and clear metrics.
- Quality first: automated tests, reproducible pipelines, and continuous improvement (a test sketch follows this list).
- Security and compliance by design: least-privilege access, data masking, and auditability.
- Collaboration: partner across platforms, governance, and product teams; communicate clearly with technical and non-technical stakeholders.
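As one illustration of the "quality first" point, pipeline logic can be factored into pure functions covered by pytest-style unit tests that run in CI. The function and column names below are hypothetical; this is a sketch, not a prescribed implementation.

    from pyspark.sql import DataFrame, SparkSession, functions as F

    def latest_reading_per_device(df: DataFrame) -> DataFrame:
        """Keep only the most recent reading per device, a typical silver-layer cleanup step."""
        latest = df.groupBy("device_id").agg(F.max("event_ts").alias("event_ts"))
        return df.join(latest, ["device_id", "event_ts"], "inner")

    def test_latest_reading_per_device():
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        df = spark.createDataFrame(
            [("d1", "2024-01-01 00:00:00", 1.0), ("d1", "2024-01-01 00:05:00", 2.0)],
            ["device_id", "event_ts", "reading"],
        )
        result = latest_reading_per_device(df).collect()
        assert len(result) == 1 and result[0]["reading"] == 2.0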
Tools You May Use:
- Languages: Python, SQL
- Compute and pipelines: Apache Spark, orchestration/workflows (e.g., Databricks Workflows/Airflow), containerized jobs where needed
- Storage/metadata: Parquet; lakehouse tables (e.g., Delta/Iceberg); catalog/lineage tools
- DevOps: Git, CI/CD, secrets management, observability (logs/metrics/traces)
Your Skills & Abilities (Required Qualifications):
- Bachelor's degree in Computer Science, Engineering, or a related field; equivalent experience will be considered in lieu of a degree
- 5+ years building data pipelines at scale with a modern data stack
- Strong proficiency in Python and SQL, plus performance tuning of both
- Hands-on experience with distributed compute (e.g., Apache Spark) and lakehouse/warehouse paradigms
- Data modeling for analytics (dimensional/medallion), data contracts, and schema management (a schema-enforcement sketch follows this list)
- CI/CD (Git-based workflows) and infrastructure-as-code (e.g., Terraform) in a cloud environment
- Practical knowledge of data security, privacy, and access control concepts
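For the data-contract and schema-management bullet above, one minimal sketch (names and types are hypothetical) is to declare the contract as an explicit schema and fail fast when an upstream change breaks it:

    from pyspark.sql import DataFrame, types as T

    # Hypothetical contract for a telemetry dataset.
    CONTRACT = T.StructType([
        T.StructField("device_id", T.StringType(), nullable=False),
        T.StructField("event_ts", T.TimestampType(), nullable=False),
        T.StructField("reading", T.DoubleType(), nullable=True),
    ])

    def enforce_contract(df: DataFrame) -> DataFrame:
        missing = {f.name for f in CONTRACT.fields} - set(df.columns)
        if missing:
            raise ValueError(f"Contract violation: missing columns {sorted(missing)}")
        # Cast to the contracted types and drop anything the contract does not cover.
        return df.select([df[f.name].cast(f.dataType) for f in CONTRACT.fields])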
What Will Give You A Competitive Edge (Preferred Qualifications):
- Streaming pipelines with technologies such as Kafka (or similar), including stateful processing and backpressure management
- Lakehouse technologies (e.g., Delta Lake/Iceberg/Hudi), file formats (Parquet/ORC), and table optimization (Z‑ordering, clustering)
- Data governance and cataloging (e.g., Atlan/Unity Catalog/Collibra/Immuta) and automated lineage
- Data quality frameworks (e.g., Great Expectations) and SLAs/SLOs for data products (see the data-quality sketch after this list)
- Experience with Databricks or equivalent cloud data platforms and workload orchestration
- Domain experience with IoT/telematics, energy, or mobility data is a plus
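As a final illustration, data-quality gates and routine table optimization of the kind listed above might look like the sketch below. The checks are hand-rolled here for brevity; in practice a framework such as Great Expectations would typically own them. Table and column names are hypothetical, and the OPTIMIZE/ZORDER statement assumes a Delta-capable engine such as Databricks.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    silver = spark.read.table("silver.telemetry")  # hypothetical table

    # Simple expectations: no null keys, readings within a plausible physical range.
    null_keys = silver.filter(F.col("device_id").isNull()).count()
    out_of_range = silver.filter(~F.col("reading").between(-50.0, 150.0)).count()
    if null_keys or out_of_range:
        raise ValueError(
            f"Data quality failure: {null_keys} null keys, {out_of_range} out-of-range readings"
        )

    # Periodic file compaction and clustering to keep query latency and cost in check.
    spark.sql("OPTIMIZE silver.telemetry ZORDER BY (device_id)")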