Vacancy

Data Engineer

Vlaams-Brabant

Apply

Individuals within the Data Engineer role ensure that data pipelines are scalable, repeatable, and can serve multiple users. They help obtain data from a variety of sources, convert it into the right formats, ensure it adheres to metadata quality standards, and make sure downstream users can access it quickly. This role usually functions as a core member of an agile team.

These professionals are responsible for the frameworks and services that ensure the data on the data lake can easily be:

  • Queried via the data processing framework, our metadata repository, and our metadata reader service

  • Filtered for GDPR compliance by our GDPR framework

  • Transformed into the Parquet format through our consumer feed framework

  • Mapped in a data flow through the lineage service
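To give candidates a feel for this kind of work: the sketch below mimics, in plain Python, the idea behind the GDPR filtering and consumer feed steps above. It is illustrative only; the actual in-house frameworks, field names (`gdpr_erased`), and helper functions here are hypothetical, not part of our stack.

```python
# Illustrative sketch only: the field names and helpers below are
# hypothetical, not the actual in-house GDPR or consumer feed frameworks.
raw_records = [
    {"customer_id": 1, "country": "BE", "gdpr_erased": False},
    {"customer_id": 2, "country": "NL", "gdpr_erased": True},
    {"customer_id": 3, "country": "BE", "gdpr_erased": False},
]

def gdpr_filter(records):
    """Drop records whose subject requested erasure (hypothetical flag)."""
    return [r for r in records if not r["gdpr_erased"]]

def to_feed(records):
    """Project each record down to the fields a consumer feed exposes."""
    return [{"customer_id": r["customer_id"], "country": r["country"]}
            for r in records]

# Filter first, then shape the result for downstream consumers.
feed = to_feed(gdpr_filter(raw_records))
```

In production these steps would run on Spark and land as Parquet files rather than Python lists, but the pipeline shape (ingest, filter, project, deliver) is the same.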

Data Engineer is a technical role that requires substantial expertise in a broad range of software development and programming fields. These professionals have knowledge of data analysis, end user requirements analysis, and business requirements analysis to develop a clear understanding of the business needs and to incorporate these needs into technical solutions. They have a solid understanding of physical database design principles and the system development life cycle. These individuals must work well in a team environment.

Responsibilities

  • Designing, developing, testing and maintaining complete data management & processing systems – Data Pipelines

  • Aggregating & transforming raw data from a variety of data sources to fulfill functional & non-functional business needs – Data Transformation

  • Discovering opportunities for data acquisition and exploring new ways of using existing data – Data Ingestion

  • Creating data models to reduce system complexity and thereby increase efficiency and reduce cost – Data Architecture & Models

  • Optimizing & monitoring performance: automating processes, optimizing data delivery, and re-designing the architecture where needed to improve performance

  • Proposing ways to improve data quality, reliability & efficiency of the whole system – Data Quality

  • Creating solutions by integrating a variety of programming languages and tools – Data Value

  • As a senior, you are expected to communicate clearly and proactively about solutions (technical and architectural), actively helping and supporting the other team members.

Ideal Profile

Store:

  • Data Modelling

  • Data Architecture

  • Airflow

  • AWS

  • Big Data Framework / Hadoop: HDFS, Squid, Spark, Conda, YARN and MapReduce

  • NoSQL Databases: Cassandra, HBase, MongoDB

Access & Transport – Connectivity:

  • ETL (Extract, Transform, Load): Informatica

  • Big Data Framework / Hadoop: Flume & Sqoop, YARN, ZooKeeper

Enrich:

  • Real-time processing framework – Apache Spark

  • Big Data Framework / Hadoop: Pig, Hive

  • SQL and NoSQL

  • Machine learning (nice to have): Python & algorithms

Provision:

  • Workflow

  • Programming: Java, Python and Scala

Development methodologies:

  • Agile: SAFe or the Spotify model

  • DataOps

Ancillary capabilities:

  • Very strong communication skills

  • Problem Solving

  • Teamwork

  • Innovation