Shreedhar

Data Engineer

Bangalore, India · 1+ yrs exp · Profile score: 83 (Excellent)

About

Data Engineer with 1.6 years of experience building and operating production-grade ETL pipelines on AWS and Azure. Strong hands-on experience with PySpark, SQL, AWS Glue, S3, Athena, Airflow, Azure Databricks, ADLS Gen2, and Azure Synapse. Built ingestion pipelines from JDBC (MySQL), SFTP files, and REST APIs, handling schema drift and data inconsistencies. Implemented incremental loading, data quality validation, and production monitoring to improve reliability and accuracy. Proven ability to optimize pipeline performance and reporting freshness in production environments.

Skills & Expertise (22)

Skill · Level · Rating · Years Exp
Python · Intermediate · 7.0/10 · 1.6
PySpark · Intermediate · 6.8/10 · 1.6
SQL · Intermediate · 6.8/10 · 1.6
Jupyter Notebook · Intermediate · 6.5/10 · 1.6
Data Modelling · Intermediate · 6.5/10 · 1.6
ETL/ELT pipelines · Intermediate · 6.5/10 · 1.6
Delta Lake · Intermediate · 6.5/10 · 1.6
Parquet · Intermediate · 6.5/10 · 1.6
Jira · Intermediate · 6.5/10 · 1.6
Git · Intermediate · 6.5/10 · 1.6
CloudWatch · Intermediate · 6.5/10 · 1.6
QuickSight · Intermediate · 6.5/10 · 1.6
Apache Spark · Intermediate · 6.5/10 · 1.6
AWS Redshift · Intermediate · 6.5/10 · 1.6
MySQL · Intermediate · 6.5/10 · 1.6
Azure Synapse · Intermediate · 6.5/10 · 1.6
ADLS Gen2 · Intermediate · 6.5/10 · 1.6
Azure Databricks · Intermediate · 6.5/10 · 1.6
Airflow · Intermediate · 6.5/10 · 1.6
Athena · Intermediate · 6.5/10 · 1.6
S3 · Intermediate · 6.5/10 · 1.6
AWS Glue · Intermediate · 6.5/10 · 1.6

Work Experience

Data Engineer

Udaan India Pvt. Ltd.

Apr 2025 - Sep 2025

Built ingestion workflows in Azure Databricks to pull data from MySQL CRM (JDBC), SFTP embassy files, and REST API courier tracking services, processing 40,000+ records daily.
Developed PySpark transformation scripts to clean and standardize visa application data, handling 15+ document types with varying formats from multiple embassy sources.
Designed and implemented a star-schema data model in Azure Synapse with 3 fact tables (applications, document verification, courier tracking) and 5 dimension tables for reporting.
Implemented incremental loading with watermarking based on the last_modified timestamp column, reducing daily refresh time from 2+ hours to 15 minutes.
Built data quality validation frameworks including schema validation, null checks, and duplicate detection, catching data issues before warehouse load.
Automated pipeline orchestration using Databricks Jobs with dependency management and email alerting on job failures.
Resolved production issues, including inconsistent SFTP file formats, by implementing schema validation and quarantine processes for bad data.
Collaborated with a Power BI developer to optimize aggregate tables for faster dashboard refresh and business KPI tracking.
Improved data accuracy from 75% to 92% through validation rules and deduplication logic.
Built monitoring and alerting solutions with AWS CloudWatch to track ETL job performance, detect failures, and identify data.

Data Engineer Intern

Youlogix Infotech Pvt. Ltd.

Apr 2024 - Mar 2025

Assisted in building data ingestion workflows from MySQL databases (JDBC), SFTP vendor files, and REST APIs, learning to handle authentication, pagination, and incremental extraction patterns under senior engineer guidance.
Supported development of PySpark transformation scripts for data cleaning tasks including null handling, duplicate removal, type casting, and date standardization on retail sales and customer datasets.
Contributed to implementing incremental loading logic using timestamp-based watermarking to extract only new or updated records from MySQL tables, reducing processing time for daily batch jobs.
Worked with AWS Glue Catalog to register cleaned datasets and helped prepare partitioned Parquet tables in S3 for downstream Athena querying by analytics teams.
Assisted in configuring Apache Airflow DAGs for scheduling ETL workflows, learning dependency management, retry mechanisms, and basic monitoring practices.
Supported data quality validation by implementing schema validation checks, record count verification, and duplicate detection logic before data moved to curated storage layers.
Helped troubleshoot production issues including API timeout errors, SFTP file format inconsistencies, and PySpark job failures by analyzing CloudWatch logs and working with senior engineers on fixes.
Contributed to implementing Delta Lake concepts including basic MERGE operations for handling upserts and understanding OPTIMIZE for small file compaction (under supervision).
Assisted in setting up AWS Lambda triggers for event-driven processing when new files arrived in S3 buckets, learning serverless automation patterns.
Worked on monitoring and alerting by helping configure CloudWatch dashboards and SNS email notifications for ETL job failures and data anomalies.
Supported data preparation tasks for BI teams by creating aggregate views and summary tables, learning how curated data flows to reporting tools.
Participated in code reviews and learned PySpark optimization techniques like broadcast joins for small dimension tables and repartitioning for handling data skew.

Education

Bachelor of Engineering - The Oxford College of Engineering

2019 - 2023 · India

Pre-University Course (PUC-Science) - KLE Independent PU College

2017 - 2019 · India

Certifications

No certifications added yet

Profile Score Breakdown

📷 Photo 10/10

📄 Resume 10/10

💼 Job Title 10/10

✍️ Bio 10/10

🛠️ Skills 20/20

🎓 Education 10/10

⏱️ Experience 8/15

💰 Rate 0/5

🏆 Certs 0/5

✅ Verified 5/5

Total Score 83/100

Profile Overview

Member since Feb 2026

Availability Details

Visa Status: Citizen

Relocation: Open to Relocation
