Individuals in the Data Engineer role ensure that data pipelines are scalable, repeatable, and able to serve multiple users. They ingest data from a variety of sources, convert it into the right formats, ensure it adheres to metadata quality standards, and make sure downstream users can access that data quickly. This role usually functions as a core member of an agile team.
These professionals are responsible for the frameworks and services that ensure the data on the data lake can easily be (a minimal sketch follows this list):
Queried via the data processing framework, our metadata repository, and the metadata reader service
Filtered for GDPR compliance by our GDPR framework
Transformed into the Parquet format through our consumer feed framework
Mapped in a data flow through the lineage service
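As a purely illustrative sketch of this kind of work, assuming a PySpark job (the internal consumer feed, GDPR, and lineage frameworks are not shown, and all paths and column names below are hypothetical), a feed might be filtered for GDPR-restricted records and published as Parquet like this:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("consumer_feed_sketch").getOrCreate()

# Read raw source data from the data lake (path is illustrative).
raw = spark.read.json("s3://datalake/raw/orders/")

# Hypothetical GDPR-style step: drop records flagged as restricted and
# remove direct identifiers before exposing the feed downstream.
cleaned = (
    raw.filter(F.col("gdpr_restricted") == False)
       .drop("customer_email", "customer_name")
)

# Publish the consumer feed as Parquet, partitioned so downstream users
# can query it quickly (assumes an ingest_date column exists).
cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3://datalake/consumer_feeds/orders/"
)
```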
The Data Engineer is a technical role that requires substantial expertise across a broad range of software development and programming fields. These professionals have knowledge of data analysis, end-user requirements analysis, and business requirements analysis, which they use to develop a clear understanding of the business needs and to incorporate those needs into technical solutions. They have a solid understanding of physical database design principles and the system development life cycle. These individuals must work well in a team environment.
Responsibilities
Designing, developing, constructing, testing, and maintaining complete data management & processing systems (see the orchestration sketch after this list) – Data Pipelines
Aggregating & transforming raw data from a variety of data sources to fulfill the functional & non-functional business needs – Data Transformation
Discovering various opportunities for data acquisition and exploring new ways of using existing data – Data Ingestion
Creating data models to reduce system complexity and hence increase efficiency and reduce cost – Data Architecture & Models
Automating processes, optimizing data delivery & re-designing the complete architecture to improve performance – Performance Optimization & Monitoring
Proposing ways to improve data quality, reliability & efficiency of the whole system – Data Quality
Creating solutions by integrating a variety of programming languages and tools – Data Value
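Since Airflow appears in the profile below, a minimal, purely hypothetical sketch of such a pipeline as an orchestrated DAG might look as follows (the DAG id, schedule, and task callables are assumptions, not an existing pipeline):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_data():
    # Placeholder: pull data from a source system into the data lake.
    pass


def build_consumer_feed():
    # Placeholder: transform the raw data into a curated Parquet feed.
    pass


# Assumes Airflow 2.4+ (for the `schedule` argument); all names are illustrative.
with DAG(
    dag_id="orders_pipeline_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw_data", python_callable=ingest_raw_data)
    transform = PythonOperator(task_id="build_consumer_feed", python_callable=build_consumer_feed)

    ingest >> transform  # run ingestion before the feed build
```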
As a senior, you are expected to communicate very well and to be outspoken about technical and architectural solutions, actively helping and supporting the other team members.
Ideal Profile
Store:
Data Modelling
Data Architecture
Airflow
AWS
Big Data Framework / Hadoop: HDFS, Squid, Spark, Conda, YARN and MapReduce
NoSQL Databases: Cassandra, HBase, MongoDB
Access & Transport – Connectivity:
ETL (Extract, Transform, Load): Informatica
Big Data Framework / Hadoop: Flume & Sqoop, YARN, ZooKeeper
Enrich:
Real-time processing framework – Apache Spark
Big Data Framework / Hadoop: Pig, Hive
SQL and NoSQL
Machine learning (nice to have): Python & algorithms
Provision:
Workflow
Programming: Java, Python and Scala
Development methodologies:
Agile: SAFe or the Spotify model
DataOps
Ancillary capabilities:
Very strong communication skills
Problem Solving
Teamwork
Innovation