In this role -
1. Lead the data solution, technical design, and development for the company's data platform.
2. Work closely with the product and infrastructure R&D teams to deliver the best data solutions and insights over billions of daily events.
3. Design, develop, and operate the data pipelines required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources, using distributed processing engines such as Apache Spark and Apache Beam.
4. Work closely with Data Science and ML engineers to develop and implement the ML framework and data features.
5. Explore and deploy new data technologies to support the company's data platform.
- At least 2 years of hands-on experience writing complex ETL pipelines using a parallel processing engine (Spark/MapReduce/Beam etc.) - A Must.
- At least 4 years of hands-on experience in Scala / Java / Python - A Must.
- Expertise in writing efficient SQL queries and optimizing query performance (Impala/SparkSQL/BigQuery etc.) - A Must.
- Excellent understanding of development methodologies and paradigms (OOP, FP), and experience working in a test-driven environment.
- Experience writing technical designs and turning business requirements into technical solutions.
- Experience working on cloud solutions (GCP a big advantage).
- Experience with building Machine Learning solutions.
Big Data Infrastructure Developer
In this role -
1. You'll be responsible for the design and development of core modules in the company's big data platform infrastructure (hosted on Google Cloud; based on the Hadoop ecosystem, Spark Core/Streaming/SQL, Scala, Python, AngularJS, Node.js, Kafka, Impala, Elasticsearch, Google Cloud Machine Learning Engine, and TensorFlow).
2. Responsible for both production environments and new development.
3. Responsible for research, analysis, and proofs of concept for new technologies, tools, and design concepts.
- At least 3-4 years of back-end development / infrastructure experience working with Java / Scala / Python.
- Strong understanding of software architecture paradigms (OOP, FP) and data structures.
- Experience working on large scale distributed systems and distributed programming.
- Experience working with Linux systems and Bash scripting.
- Experience building scalable stream-processing and/or batch ETL pipelines using solutions such as Spark / Spark Streaming.
- Experience working with development testing methodologies.
- Strong SQL skills and experience with NoSQL databases (such as HBase, Cassandra, and MongoDB), SQL-on-Hadoop engines (such as Impala), and relational databases (such as MySQL and SQL Server).
- Experience working with CI/CD and automation tools (Jenkins, Chef, etc.).
- Experience working with cloud services (AWS, GCP, Azure) - an advantage.
- Experience working with Node.js – an advantage.
- Experience working with Docker/Kubernetes Engine – an advantage.