About
Data Engineer with 1.6 years of experience building and operating production-grade ETL pipelines on AWS and Azure. Strong hands-on experience with PySpark, SQL, AWS Glue, S3, Athena, Airflow, Azure Databricks, ADLS Gen2, and Azure Synapse. Built ingestion pipelines from JDBC (MySQL), SFTP files, and REST APIs, handling schema drift and data inconsistencies. Implemented incremental loading, data quality validation, and production monitoring to improve reliability and accuracy. Proven ability to optimize pipeline performance and reporting freshness in production environments.
Work Experience
Data Engineer
Udaan India Pvt. Ltd.
Apr 2025 - Sep 2025
Built ingestion workflows in Azure Databricks to pull data from a MySQL CRM (JDBC), SFTP embassy files, and REST API courier tracking services, processing 40,000+ records daily.
Developed PySpark transformation scripts to clean and standardize visa application data, handling 15+ document types with varying formats from multiple embassy sources.
Designed and implemented a star-schema data model in Azure Synapse with 3 fact tables (applications, document verification, courier tracking) and 5 dimension tables for reporting.
Implemented incremental loading with watermarking based on the last_modified timestamp column, reducing daily refresh time from 2+ hours to 15 minutes.
Built data quality validation frameworks covering schema validation, null checks, and duplicate detection, catching data issues before warehouse load.
Automated pipeline orchestration using Databricks Jobs with dependency management and email alerting on job failures.
Resolved production issues, including inconsistent SFTP file formats, by implementing schema validation and quarantine processes for bad data.
Collaborated with a Power BI developer to optimize aggregate tables for faster dashboard refresh and business KPI tracking.
Improved data accuracy from 75% to 92% through validation rules and deduplication logic.
Built monitoring and alerting solutions with AWS CloudWatch to track ETL job performance, detect failures, and identify data.
Data Engineer Intern
Youlogix Infotech Pvt. Ltd.
Apr 2024 - Mar 2025
Assisted in building data ingestion workflows from MySQL databases (JDBC), SFTP vendor files, and REST APIs, learning to handle authentication, pagination, and incremental extraction patterns under senior engineer guidance.
Supported development of PySpark transformation scripts for data cleaning tasks, including null handling, duplicate removal, type casting, and date standardization, on retail sales and customer datasets.
Contributed to implementing incremental loading logic using timestamp-based watermarking to extract only new or updated records from MySQL tables, reducing processing time for daily batch jobs.
Worked with AWS Glue Catalog to register cleaned datasets and helped prepare partitioned Parquet tables in S3 for downstream Athena querying by analytics teams.
Assisted in configuring Apache Airflow DAGs for scheduling ETL workflows, learning dependency management, retry mechanisms, and basic monitoring practices.
Supported data quality validation by implementing schema validation checks, record count verification, and duplicate detection logic before data moved to curated storage layers.
Helped troubleshoot production issues, including API timeout errors, SFTP file format inconsistencies, and PySpark job failures, by analyzing CloudWatch logs and working with senior engineers on fixes.
Applied Delta Lake concepts under supervision, including basic MERGE operations for handling upserts and OPTIMIZE for small file compaction.
Assisted in setting up AWS Lambda triggers for event-driven processing when new files arrived in S3 buckets, learning serverless automation patterns.
Worked on monitoring and alerting by helping configure CloudWatch dashboards and SNS email notifications for ETL job failures and data anomalies.
Supported data preparation tasks for BI teams by creating aggregate views and summary tables, learning how curated data flows to reporting tools.
Participated in code reviews and learned PySpark optimization techniques such as broadcast joins for small dimension tables and repartitioning for handling data skew.
Education
Bachelor of Engineering - The Oxford College of Engineering
2019 - 2023 · Afghanistan
Pre-University Course (PUC-Science) - KLE Independent PU College
2017 - 2019 · Afghanistan
Availability Details
Visa Status
Citizen
Relocation
Open to Relocation