Responsibilities
- Advises on the Google Cloud toolset for data engineering
- Develops data engineering solutions on the Google Cloud ecosystem
- Supports and maintains data engineering solutions on the Google Cloud ecosystem
- Designs, builds and operationalises batch and real-time data pipelines using Google Cloud services – Dataproc, Dataflow and Pub/Sub (see the streaming sketch after this list)
- Designs, builds and operationalises the data layer on BigQuery, Bigtable, Cloud Spanner, Cloud SQL and AlloyDB
- Designs and builds data migration scripts and migrates data using Google Cloud Database Migration Service
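Illustrative only – a minimal Apache Beam / Python sketch of the kind of streaming pipeline described above, reading from Pub/Sub on Dataflow and landing records in BigQuery. The project, topic, table and schema names are hypothetical placeholders, not part of the role.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical identifiers – replace with real project/topic/table values.
PROJECT = "my-project"
TOPIC = f"projects/{PROJECT}/topics/events"
TABLE = f"{PROJECT}:analytics.events"


def run():
    # streaming=True because the source is an unbounded Pub/Sub topic
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                TABLE,
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```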
Technical Skills
- Proficient with Google Cloud data platform components – Cloud Storage, Bigtable, BigQuery, Dataproc with Spark and Hadoop, Dataflow with Apache Beam / Python
- Proficient with Google Cloud Pub/Sub and Google Cloud Managed Service for Apache Kafka
- Comfortable using open-source technologies such as Apache Airflow, dbt, and Spark with Python or Scala
- Experience developing batch and real-time data pipelines for data warehouse and data lake workloads
- Experience scheduling and managing the data platform using Google Cloud Scheduler and Cloud Composer (Airflow) – see the DAG sketch after this list
- Solid experience in object-oriented programming with Python
- Good knowledge of data structures and algorithms
- Solid background in data engineering – PySpark, Big Data, Hive, SQL, Kafka
- Must have experience in handling real-time and batch ingestion
- Should be able to optimise Spark job performance and debug production job failures
- Hands-on experience on a cloud platform, preferably GCP
- Good to have: experience building REST APIs
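Illustrative only – a minimal Cloud Composer / Airflow DAG (Airflow 2.4+) that schedules a nightly PySpark batch job on an existing Dataproc cluster. The project, region, cluster and GCS paths are assumed placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

# Hypothetical identifiers – replace with real project/cluster/bucket values.
PROJECT_ID = "my-project"
REGION = "europe-west1"
CLUSTER_NAME = "batch-cluster"

with DAG(
    dag_id="daily_batch_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    # Submit a PySpark script stored in GCS to the existing Dataproc cluster
    run_ingest = DataprocSubmitJobOperator(
        task_id="run_pyspark_ingest",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "reference": {"project_id": PROJECT_ID},
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/ingest.py"},
        },
    )
```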
Additional Skills
- Proficiency in GCP services including BigQuery, Cloud Storage, Dataflow and Dataproc, and in data mesh architecture
- Strong in Hive, Hadoop, PySpark, Scala and other Big Data technologies
- Google Cloud, data warehousing, BigQuery, BigQuery ML, SQL (see the query sketch after this list)
- Good to have: Azure Data Warehouse
- PySpark, Big Data fundamentals
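Illustrative only – a short sketch of running a parameterised BigQuery query from Python via the google-cloud-bigquery client. The project, dataset, table and column names are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Hypothetical table and columns – adjust to the real dataset.
query = """
    SELECT event_date, COUNT(*) AS events
    FROM `my-project.analytics.events`
    WHERE event_date >= @start_date
    GROUP BY event_date
    ORDER BY event_date
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_date", "DATE", "2024-01-01"),
    ]
)

for row in client.query(query, job_config=job_config).result():
    print(row.event_date, row.events)
```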
