Rachid EL MAAZOUZ

Software Engineer - Senior Data Engineer

relmaazouz@proton.me
+33 6 24 01 00 36
Summary

Data Engineer & Software Engineer with more than a decade of experience designing, building, and modernizing scalable data platforms. Strong expertise in Apache Spark, Azure Data services, and the Databricks Lakehouse ecosystem (Delta Lake, Unity Catalog, Workflows). Experienced in cloud-native architecture, large-scale data processing, and ETL/ELT pipelines. Engineering-driven mindset focused on automation, CI/CD, governance, and delivering reliable, high-performance data products that support business and analytics at scale.

Professional Experience

Senior Data Engineer

Candriam

Aug 2024 - Present

  • Designed and implemented scalable, metadata-driven data ingestion frameworks in PySpark for large-scale pipelines (see sketch below)
  • Developed a robust data quality framework ensuring completeness, consistency, and reliability
  • Built and optimized ETL/ELT pipelines (ingestion, transformation, loading) following medallion/lakehouse architecture
  • Engineered and deployed data APIs in C#, enabling low-latency access for quants and analysts
  • Designed and delivered a data testing framework for automated validation and performance benchmarking
  • Worked in an Azure cloud environment (Data Lake, Synapse, Delta, Pipelines, DevOps)
  • Collaborated with risk teams to deliver fit-for-purpose data products supporting analytics and reporting
PySpark · Python · Azure · Synapse · Azure DevOps · .NET · C# · Git · Delta · Parquet
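
A minimal sketch of the metadata-driven ingestion pattern above, assuming a hypothetical control table (ops.ingestion_metadata) whose rows describe each source's format, path, write mode, and target Delta table; the real framework's schema and naming are not reproduced here.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("metadata-driven-ingestion").getOrCreate()

    # Hypothetical control table: one row per source to ingest.
    sources = spark.read.table("ops.ingestion_metadata").collect()

    for src in sources:
        df = (spark.read
                .format(src["source_format"])   # e.g. "csv", "parquet", "json"
                .option("header", "true")
                .load(src["source_path"]))
        (df.write
            .format("delta")
            .mode(src["write_mode"])            # "append" or "overwrite"
            .saveAsTable(src["target_table"]))

Driving ingestion from metadata reduces onboarding a new source to inserting a row, rather than writing a new pipeline.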

Senior Data Engineer

Informatique CDC

Jul 2023 - Sep 2024

  • Implemented business use cases: data structures, CI/CD pipelines, PySpark jobs
  • Established development best practices: GitFlow, CI/CD workflows, automated testing
  • Modeled and implemented Data Vault architecture across ingestion and datamart layers
  • Developed large-scale ingestion, transformation, and exposure jobs in PySpark and Python
  • Integrated external market and risk data sources (Bloomberg, ratings, etc.) via Kafka
  • Worked with the Cloudera ecosystem (Hive, Spark, Kafka), using Jenkins, Control-M, and Bitbucket
  • Delivered reliable data products supporting business and risk management
  • Migrated the data lake to Azure Databricks: modeled and delivered the new Data Vault architecture across ingestion and datamart layers, later enhanced with Unity Catalog to enforce centralized governance, lineage, and fine-grained access control (see sketch below)
  • Redesigned data pipelines: refactored on-prem scripted ETL into modular Databricks notebooks, orchestrated through Databricks Jobs and Workflows with Delta Lake optimizations
PySpark · Python · Kafka · Cloudera · Hive · Spark · Jenkins · Control-M · Bitbucket · Databricks · Azure
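
A sketch of an insert-only Data Vault hub load on Databricks using a Delta MERGE; the staging table, hub, and business key (counterparty_id) are illustrative assumptions, not the project's actual model.

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # Stage new business keys with their Data Vault hash key.
    staged = (spark.read.table("staging.counterparties")
                .select("counterparty_id", "load_date", "record_source")
                .withColumn("hash_key",
                            F.sha2(F.col("counterparty_id").cast("string"), 256))
                .dropDuplicates(["hash_key"]))

    hub = DeltaTable.forName(spark, "raw_vault.hub_counterparty")

    # Hubs are insert-only: add unseen business keys, never update existing rows.
    (hub.alias("h")
        .merge(staged.alias("s"), "h.hash_key = s.hash_key")
        .whenNotMatchedInsertAll()
        .execute())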

Data Engineer

Crédit Agricole

Nov 2021 - Jun 2023

  • Operated and maintained a large-scale data lake on MapR (Hive, Hadoop, Sqoop, Tez, Oozie, PySpark)
  • Developed ingestion, transformation, and exposure jobs in PySpark for high-volume datasets
  • Led the migration and redesign of the data lake, introducing new zoning and pipelines
  • Translated legacy Hive SQL scripts into optimized Oracle SQL
  • Automated ODI object generation via Python scripting
  • Implemented automated data validation frameworks (see sketch below)
  • Containerized and deployed applications with Kubernetes and ArgoCD
PySpark · Hive · Hadoop · Sqoop · Tez · Oozie · Oracle ODI · Kubernetes · ArgoCD · Control-M · Git
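
A sketch of the style of automated validation run before promoting data between zones; the dataset, key column, and thresholds are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.table("exposure.loans")  # hypothetical dataset

    checks = {
        "no_null_ids": df.filter(F.col("loan_id").isNull()).count() == 0,
        "no_duplicate_ids": df.count() == df.select("loan_id").distinct().count(),
        "positive_amounts": df.filter(F.col("amount") <= 0).count() == 0,
    }

    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        # Fail fast so bad partitions never reach downstream zones.
        raise ValueError(f"Data validation failed: {failed}")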

Consultant Data

Informatique CDC

Nov 2020 - Oct 2021

  • Gathered requirements for eFront FIA migration and defined new data models and workflows
  • Authored detailed technical specifications for transaction mapping and codification framework
  • Extracted and transformed data from FIC using Front Script (funds, investors, companies, instruments, transactions)
  • Implemented a PostgreSQL-based data hub for transaction and fund exchanges with AWS S3 integration (see sketch below)
  • Developed custom Spring Boot web application to support valuation workflows
  • Worked in hybrid environment with SQL Server, PostgreSQL, Kafka, AWS S3, Spring Boot, eFront FIA
PostgreSQL · SQL Server · Kafka · AWS S3 · Spring Boot · eFront FIA
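
A sketch of one exchange flow through the data hub, assuming a hypothetical bucket, key, target table, and connection string: a transactions file is pulled from S3 and bulk-loaded into PostgreSQL with COPY.

    import io

    import boto3
    import psycopg2

    # Hypothetical bucket/key dropped by an upstream exchange.
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="datahub-exchange", Key="transactions/2021-06-01.csv")
    payload = io.BytesIO(obj["Body"].read())

    conn = psycopg2.connect("dbname=datahub user=etl")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        # COPY streams the whole file server-side, far faster than row-by-row inserts.
        cur.copy_expert("COPY hub.transactions FROM STDIN WITH CSV HEADER", payload)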

Data Engineer

Consolis Group

Jul 2018 - Sep 2020

  • Designed and developed a financial performance data lake on Google BigQuery (see sketch below)
  • Built ingestion pipelines from subsidiaries using Pub/Sub and enriched data in transformation layers
  • Developed dashboards and reports in Data Studio for business stakeholders
  • Worked within the Google Cloud Platform ecosystem (BigQuery, DataProc, Composer, Cloud Storage)
BigQuery · Pub/Sub · DataProc · Composer · Data Studio · Cloud Storage
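
A sketch of a batch load from Cloud Storage into the BigQuery data lake; the bucket, dataset, and table names are illustrative assumptions.

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # Hypothetical landing bucket and target table.
    load_job = client.load_table_from_uri(
        "gs://finance-landing/subsidiaries/2020-08/*.parquet",
        "finance_lake.subsidiary_pnl",
        job_config=job_config,
    )
    load_job.result()  # block until the load completes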

Data Engineer / Python Developer

Inwi

Aug 2016 - May 2018

  • Installed and configured Hadoop ecosystem (Ambari, RHEL, Kafka, HBase, Hive)
  • Built real-time ingestion pipelines with Apache Flink to process CRM, CDR, HLR, VLR, and PPS data (see sketch below)
  • Designed datamarts for network, equipment, voice/SMS/MMS, mobile, and customer datasets
  • Prepared enriched customer datasets (social networks, geolocation, call/SMS history) for churn prediction
  • Developed interactive visualizations (time-series, bubble charts, word clouds) for business metrics
Hadoop · Kafka · HBase · Hive · Apache Flink · PySpark · Ambari
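
A minimal PyFlink sketch of the real-time ingestion pattern above, assuming a hypothetical CDR topic and pipe-delimited records; the original jobs' language, schema, and sinks are not reproduced here, and the Kafka connector jar must be on the Flink classpath.

    from pyflink.common.serialization import SimpleStringSchema
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors import FlinkKafkaConsumer

    env = StreamExecutionEnvironment.get_execution_environment()

    consumer = FlinkKafkaConsumer(
        topics="cdr-events",  # hypothetical topic
        deserialization_schema=SimpleStringSchema(),
        properties={"bootstrap.servers": "broker:9092", "group.id": "cdr-ingest"},
    )

    # Split pipe-delimited CDR lines and keep the first three fields
    # (e.g. caller, callee, duration) before handing off to a sink.
    env.add_source(consumer) \
       .map(lambda line: line.split("|")[:3]) \
       .print()

    env.execute("cdr-ingestion")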

ERP Developer

OCP

May 2014 - Aug 2016

  • Developed analytical and operational reports with SQL, PL/SQL, and XML Publisher
  • Customized and built reporting interfaces with Oracle OAF and Oracle Forms
  • Developed APIs and web services to integrate Oracle ERP with legacy systems
Oracle E-Business Suite · SQL · PL/SQL · Oracle OAF · Oracle Forms · XML Publisher

Software Engineer

S2M

Aug 2013 - Apr 2014

  • Designed data schemas and exchange interfaces to integrate core banking data
  • Defined application and data architecture for e-banking system
  • Developed and deployed APIs and services (authentication, transfers, statements, payroll) for secure real-time banking
Java · Spring Web · Hibernate · Oracle DB · PostgreSQL · Apache CXF · Jenkins · HTML5
Education

Master's in Financial Markets and Capital Management

CNAM

September 2025

Software Engineer

Mohammadia School of Engineers

July 2013

Certifications

GCP Professional Data Engineer

Google

August 2023

MongoDB Certified DBA

MongoDB

September 2022

Passionate about building scalable data platforms and solving complex problems through code.