Shreedhar

Data Engineer

Bangalore, India
Profile Score: 80/100

About

Data Engineer with 1.6 years of experience building and operating production-grade ETL pipelines on AWS and Azure. Strong hands-on experience with PySpark, SQL, AWS Glue, S3, Athena, Airflow, Azure Databricks, ADLS Gen2, and Azure Synapse. Built ingestion pipelines from JDBC (MySQL), SFTP files, and REST APIs, handling schema drift and data inconsistencies. Implemented incremental loading, data quality validation, and production monitoring to improve reliability and accuracy. Proven ability to optimize pipeline performance and reporting freshness in production environments.
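Handling schema drift, as mentioned above, usually means aligning each incoming record to a canonical schema so upstream column additions or removals don't break the load. A minimal sketch in plain Python (the column names and defaults here are hypothetical, not from the actual pipelines):

```python
# Hypothetical canonical schema: known columns mapped to default values.
CANONICAL_SCHEMA = {"app_id": None, "applicant": None, "status": None}

def align_to_schema(record: dict, schema: dict = CANONICAL_SCHEMA) -> dict:
    """Keep known columns, fill missing ones with defaults, drop extras."""
    return {col: record.get(col, default) for col, default in schema.items()}

# A record missing 'status' and carrying an unexpected 'notes' column:
row = {"app_id": 42, "applicant": "A. Kumar", "notes": "walk-in"}
print(align_to_schema(row))
# {'app_id': 42, 'applicant': 'A. Kumar', 'status': None}
```

In PySpark the same idea is typically expressed with an explicit `StructType` plus a column-selection step, but the alignment logic is the same.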

Skills & Expertise (22)

Python · Intermediate · 7.0/10 · 1.6 yrs exp
PySpark · Intermediate · 6.8/10 · 1.6 yrs exp
SQL · Intermediate · 6.8/10 · 1.6 yrs exp
Jupyter Notebook · Intermediate · 6.5/10 · 1.6 yrs exp
Data Modelling · Intermediate · 6.5/10 · 1.6 yrs exp
ETL/ELT pipelines · Intermediate · 6.5/10 · 1.6 yrs exp
Delta Lake · Intermediate · 6.5/10 · 1.6 yrs exp
Parquet · Intermediate · 6.5/10 · 1.6 yrs exp
Jira · Intermediate · 6.5/10 · 1.6 yrs exp
Git · Intermediate · 6.5/10 · 1.6 yrs exp
CloudWatch · Intermediate · 6.5/10 · 1.6 yrs exp
QuickSight · Intermediate · 6.5/10 · 1.6 yrs exp
Apache Spark · Intermediate · 6.5/10 · 1.6 yrs exp
AWS Redshift · Intermediate · 6.5/10 · 1.6 yrs exp
MySQL · Intermediate · 6.5/10 · 1.6 yrs exp
Azure Synapse · Intermediate · 6.5/10 · 1.6 yrs exp
ADLS Gen2 · Intermediate · 6.5/10 · 1.6 yrs exp
Azure Databricks · Intermediate · 6.5/10 · 1.6 yrs exp
Airflow · Intermediate · 6.5/10 · 1.6 yrs exp
Athena · Intermediate · 6.5/10 · 1.6 yrs exp
S3 · Intermediate · 6.5/10 · 1.6 yrs exp
AWS Glue · Intermediate · 6.5/10 · 1.6 yrs exp

Work Experience

Data Engineer

Udaan India Pvt. Ltd.

Apr 2025 - Sep 2025

- Built ingestion workflows in Azure Databricks to pull data from a MySQL CRM (JDBC), SFTP embassy files, and REST API courier-tracking services, processing 40,000+ records daily.
- Developed PySpark transformation scripts to clean and standardize visa application data, handling 15+ document types with varying formats from multiple embassy sources.
- Designed and implemented a star-schema data model in Azure Synapse with 3 fact tables (applications, document verification, courier tracking) and 5 dimension tables for reporting.
- Implemented incremental loading with watermarking on the last_modified timestamp column, reducing daily refresh time from 2+ hours to 15 minutes.
- Built a data quality validation framework including schema validation, null checks, and duplicate detection, catching data issues before warehouse load.
- Automated pipeline orchestration using Databricks Jobs with dependency management and email alerting on job failures.
- Resolved production issues, including inconsistent SFTP file formats, by implementing schema validation and quarantine processes for bad data.
- Collaborated with a Power BI developer to optimize aggregate tables for faster dashboard refresh and business KPI tracking.
- Improved data accuracy from 75% to 92% through validation rules and deduplication logic.
- Built monitoring and alerting with AWS CloudWatch to track ETL job performance, detect failures, and identify data anomalies.
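The watermarking approach described here can be sketched as building a pushdown query that selects only rows modified since the last successful run; this is what turns a full-table refresh into a minutes-long incremental one. Table and column names below are illustrative, not taken from the actual system:

```python
# Hypothetical sketch of timestamp-based incremental extraction.
# The watermark (max last_modified from the previous run) is persisted
# somewhere durable and advanced only after a successful load.
def build_incremental_query(table: str, watermark: str) -> str:
    """Build a pushdown query selecting only rows newer than the watermark."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE last_modified > '{watermark}' "
        f"ORDER BY last_modified"
    )

last_run = "2025-06-01 00:00:00"  # loaded from the watermark store
print(build_incremental_query("crm.applications", last_run))
```

In Databricks this query would typically be handed to the Spark JDBC reader as its pushdown query, and the new watermark set to the maximum last_modified value seen in the extracted batch.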

Data Engineer Intern

Youlogix Infotech Pvt. Ltd.

Apr 2024 - Mar 2025

- Assisted in building data ingestion workflows from MySQL databases (JDBC), SFTP vendor files, and REST APIs, learning to handle authentication, pagination, and incremental extraction patterns under senior engineer guidance.
- Supported development of PySpark transformation scripts for data cleaning tasks including null handling, duplicate removal, type casting, and date standardization on retail sales and customer datasets.
- Contributed to incremental loading logic using timestamp-based watermarking to extract only new or updated records from MySQL tables, reducing processing time for daily batch jobs.
- Worked with the AWS Glue Catalog to register cleaned datasets and helped prepare partitioned Parquet tables in S3 for downstream Athena querying by analytics teams.
- Assisted in configuring Apache Airflow DAGs for scheduling ETL workflows, learning dependency management, retry mechanisms, and basic monitoring practices.
- Supported data quality validation by implementing schema validation checks, record count verification, and duplicate detection logic before data moved to curated storage layers.
- Helped troubleshoot production issues including API timeout errors, SFTP file format inconsistencies, and PySpark job failures by analyzing CloudWatch logs and working with senior engineers on fixes.
- Contributed to Delta Lake work including basic MERGE operations for handling upserts and OPTIMIZE for small-file compaction (under supervision).
- Assisted in setting up AWS Lambda triggers for event-driven processing when new files arrived in S3 buckets, learning serverless automation patterns.
- Worked on monitoring and alerting by helping configure CloudWatch dashboards and SNS email notifications for ETL job failures and data anomalies.
- Supported data preparation for BI teams by creating aggregate views and summary tables, learning how curated data flows to reporting tools.
- Participated in code reviews and learned PySpark optimization techniques such as broadcast joins for small dimension tables and repartitioning to handle data skew.
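The pre-load quality checks mentioned above (null checks on required columns, duplicate detection on a business key, quarantining bad rows instead of failing the batch) can be sketched in plain Python; the field names here are made up for illustration:

```python
# Hypothetical sketch of pre-load data quality validation:
# rows with nulls in required columns or duplicate business keys are
# quarantined rather than loaded into the curated layer.
def validate(rows, key="order_id", required=("order_id", "amount")):
    seen, good, quarantined = set(), [], []
    for row in rows:
        has_nulls = any(row.get(col) is None for col in required)
        is_dup = row.get(key) in seen
        if has_nulls or is_dup:
            quarantined.append(row)
        else:
            seen.add(row[key])
            good.append(row)
    return good, quarantined

rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 10.0},   # duplicate business key
    {"order_id": 2, "amount": None},   # null in a required column
]
good, bad = validate(rows)
print(len(good), len(bad))  # 1 2
```

On Spark the same checks are usually expressed as DataFrame filters (e.g. `isNull` conditions and `dropDuplicates` on the key), with the rejected rows written to a quarantine path for inspection.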

Education

Bachelor of Engineering - The Oxford College of Engineering

2019 - 2023 · Afghanistan

Pre-University Course (PUC-Science) - KLE Independent PU College

2017 - 2019 · Afghanistan

Profile Score Breakdown

📷 Photo 10/10
📄 Resume 10/10
💼 Job Title 10/10
✍️ Bio 10/10
🛠️ Skills 20/20
🎓 Education 10/10
⏱️ Experience 5/15
💰 Rate 0/5
🏆 Certs 0/5
Verified 5/5
Total Score 80/100

Profile Overview

Member since Feb 2026

Availability Details

Visa Status

Citizen

Relocation

Open to Relocation
