Qualification: Bachelor’s or Master’s degree in a relevant quantitative discipline
Key Responsibilities
Pipeline Engineering: Build, optimize, and support enterprise-level batch and streaming data pipelines on the modern Lakehouse platform. Refactor and modernize legacy data flows to enhance operational reliability and maintainability.
Data Modeling & Curation: Construct raw, refined, and highly curated datasets that directly support firm-wide analytics, regulatory reporting, and AI components. Apply temporal data modeling principles to track historical state changes accurately.
Data Quality & Reconciliation: Code and integrate operational controls to validate completeness, accuracy, and absolute consistency across data domains. Build programmatic reconciliation workflows to detect and resolve processing breaks rapidly.
Shared Framework Support: Build reusable developer tooling and utility components to optimize platform deployment workflows and lower operational latency.
Stakeholder Collaboration: Collaborate closely with internal downstream consumers and product managers to map dependencies, establish implementation standards, and document core data products clearly.
Skills & Eligibility
Experience: 1+ years of hands-on experience setting up, maintaining, or supporting production data pipelines in a collaborative engineering team.
Education: Bachelor’s or Master’s degree in Computer Science, Software Engineering, Mathematics, or a highly quantitative discipline with strong analytical foundations.
Programming Mastery: Robust hands-on coding proficiency in either Python or Java.
Database & SQL Expertise: Excellent working knowledge of SQL, including deep query performance tuning, optimization, and complex diagnostic analysis.
Distributed Computing: Experience processing massive datasets using distributed frameworks such as Apache Spark or event-streaming platforms such as Apache Kafka.
Data Serialization Standards: Comprehensive understanding of common file storage layouts and serialization formats, including JSON, Avro, and Parquet.
Data Engineering Foundations: Clear baseline grasp of structural schema design, partitioning, clustering, schema evolution rules, and historical delta models.
Software Core Discipline: Familiarity with version control systems (Git), automated testing, release governance, and continuous integration/continuous deployment (CI/CD) pipelines.