Dorian Teffo / Freelance Data Engineer
I will enhance your analytics capabilities by designing and implementing fully automated data infrastructure and analytics environments tailored to your specific requirements.
Services
Data Infrastructure and Cloud Solutions
I’ll help you build scalable and reliable data lakes or lakehouses using AWS Lambda, Glue, S3, Athena, and Delta Lake, giving you a solid foundation for data storage and analysis.
Data Integration and Warehousing
I’ll design and implement ETL processes that centralize your data, and build structured data models that support advanced analytics and make dashboards easy to create.
Automated Reporting and Analytics
I’ll make your reporting process effortless by automating reports and reducing manual work.
Projects
Building a Lakehouse on AWS
This project involved processing video game player activity data uploaded every 2 hours to an S3 bucket as JSON files.
The goal was to extract key session metrics, such as average session duration and achievements unlocked, as well as game genre metrics like quests completed by genre.
We chose a lakehouse architecture with Delta Lake on S3 for cost efficiency and used the medallion architecture to maintain data quality. Data was processed with Spark on AWS Glue, and Airflow orchestrated the daily updates. Terraform and GitHub Actions were used for infrastructure deployment and CI/CD.
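To give a feel for the kind of aggregation involved, here is a minimal sketch in plain Python rather than the Spark job itself. It assumes newline-delimited JSON events, and the field names `player_id`, `session_seconds`, and `achievements_unlocked` are hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict


def session_metrics(json_lines):
    """Aggregate per-player session metrics from newline-delimited JSON events.

    Field names are hypothetical; the real event schema may differ.
    """
    totals = defaultdict(lambda: {"sessions": 0, "seconds": 0, "achievements": 0})
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        player = totals[event["player_id"]]
        player["sessions"] += 1
        player["seconds"] += event["session_seconds"]
        # Treat a missing achievements field as zero unlocked
        player["achievements"] += event.get("achievements_unlocked", 0)
    return {
        player_id: {
            "avg_session_seconds": t["seconds"] / t["sessions"],
            "achievements_unlocked": t["achievements"],
        }
        for player_id, t in totals.items()
    }
```

In a medallion layout, a step like this would typically sit between the cleaned layer and the aggregated layer that dashboards read from.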
Modern Data Platform
This project involved processing transaction data written every 2 hours to a PostgreSQL database (RDS) to determine which subscription plan customers preferred most.
Because the source schema was not suited to analytics, we chose Snowflake as our data warehouse.
We used Airbyte for data ingestion, dbt for data transformation following a star schema, and Airflow for scheduling daily updates.
Finally, we connected our BI tool to Snowflake for reporting. Terraform, Docker, and GitHub Actions were used for infrastructure deployment and CI/CD.
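As a rough illustration of the star schema idea, the sketch below splits flat transaction rows into a plan dimension and a fact table, then finds the plan with the most transactions. It is plain Python, not the dbt models used in the project. The column names (`transaction_id`, `plan_name`, `amount`) are hypothetical, and counting transactions is only one possible way to define the preferred plan.

```python
from collections import Counter


def to_star_schema(transactions):
    """Split flat transaction rows into a plan dimension and a fact table."""
    plan_keys = {}  # plan name -> surrogate key, assigned in order of first appearance
    facts = []
    for row in transactions:
        plan_key = plan_keys.setdefault(row["plan_name"], len(plan_keys) + 1)
        facts.append(
            {
                "transaction_id": row["transaction_id"],
                "plan_key": plan_key,
                "amount": row["amount"],
            }
        )
    dim_plan = [{"plan_key": key, "plan_name": name} for name, key in plan_keys.items()]
    return dim_plan, facts


def most_popular_plan(dim_plan, facts):
    """Return the name of the plan with the most transactions (an assumed metric)."""
    counts = Counter(fact["plan_key"] for fact in facts)
    top_key, _ = counts.most_common(1)[0]
    names = {row["plan_key"]: row["plan_name"] for row in dim_plan}
    return names[top_key]
```

Keeping plan attributes in their own dimension table means the fact table stays narrow, and reporting queries reduce to a join plus an aggregate.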
Automated API Data Extraction
This project involved extracting data daily from an API. We opted to deploy Python code on AWS Lambda triggered by an EventBridge rule for seamless automation.
The infrastructure was built using Terraform, including IAM roles, an ECR repository, and an EventBridge rule. Docker was used to containerize the Python code, and GitHub Actions served as the CI/CD tool, updating the Lambda function each time a new Docker image was pushed to ECR.
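For readers curious what such a function can look like, here is a minimal sketch of a Lambda handler triggered by an EventBridge schedule. It is not the project's actual code. The `API_URL` value, the `raw/YYYY/MM/DD/extract.json` key layout, and the injectable `fetch` parameter are all assumptions made for illustration, and the step that writes the payload to storage is left out.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

API_URL = "https://example.com/api/records"  # hypothetical endpoint


def fetch_json(url):
    """Download and decode a JSON payload from the API."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def build_object_key(run_time):
    """Partition extracts by date so each daily run gets its own key (assumed layout)."""
    return f"raw/{run_time:%Y/%m/%d}/extract.json"


def handler(event, context, fetch=fetch_json):
    """Entry point invoked by Lambda when the EventBridge rule fires."""
    # Scheduled EventBridge events carry an ISO 8601 "time" field; fall back to now.
    raw_time = event.get("time")
    if raw_time:
        run_time = datetime.fromisoformat(raw_time.replace("Z", "+00:00"))
    else:
        run_time = datetime.now(timezone.utc)
    records = fetch(API_URL)
    # Writing the payload to storage is omitted; only a run summary is returned.
    return {"key": build_object_key(run_time), "record_count": len(records)}
```

Passing `fetch` as a parameter keeps the handler easy to test locally with a stub, while Lambda itself calls it with only `event` and `context`.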