About
Senior DevOps and Site Reliability Engineer with 5 years of experience designing, deploying, and operating large-scale infrastructure across AWS, Azure, and GCP. Proficient in end-to-end CI/CD pipeline engineering with Jenkins, GitHub Actions, and ArgoCD; infrastructure automation with Terraform and Ansible; and container orchestration with Kubernetes and Docker in cloud and on-premises environments. Strong background in production reliability, including SLO/SLI definition, incident response, blameless post-mortems, and observability platform development with Prometheus, Grafana, ELK Stack, and Datadog. Experienced in Agile Scrum teams, with a consistent focus on eliminating manual toil through scripted automation and scalable infrastructure.
Skills & Expertise (65)
Work Experience
Senior Cloud Infrastructure Engineer
Tata Consultancy Services - Client: Waste Management
Nov 2022 - Jul 2024
Designed and managed end-to-end CI/CD pipelines using Jenkins and GitHub Actions, integrating webhook-triggered builds, automated testing, artifact publishing to JFrog Artifactory, and GitOps-based continuous deployment via ArgoCD for 15+ microservices across Kubernetes (EKS) environments. Increased release frequency by 30% through full pipeline automation.
Developed and maintained Ansible playbooks for end-to-end workflow automation, covering infrastructure provisioning, application deployment, configuration management, and security hardening across a 100+ node Linux fleet. Integrated playbooks with GitHub Actions webhooks to trigger automated deployments on code merge, reducing manual operational effort by 40%.
Implemented infrastructure cost optimization strategies across AWS environments by deploying Karpenter for intelligent node provisioning, configuring HPA for demand-based pod scaling, right-sizing EC2 instance families, and replacing long-running processes with event-driven Lambda functions using EventBridge triggers. Reduced cloud compute spend through continuous capacity planning and idle resource elimination.
Administered distributed data infrastructure including Kafka event streaming clusters (topic management, consumer group monitoring, lag alerting) and MongoDB replica sets (cluster operations, indexing optimization, replication monitoring). Integrated observability using Prometheus exporters and Grafana dashboards to track throughput, latency, and system health across distributed services.
Built Python and Bash automation scripts to eliminate repetitive manual operational tasks, including infrastructure provisioning, patching schedules, log parsing, health checks, user provisioning, and AWS resource management via boto3. All automation followed code review standards and was version-controlled, replacing manual workflows that previously required multiple hours of engineer time weekly.
Deployed and operated cloud-native services across AWS and Azure including Lambda functions with SQS/SNS event triggers, CloudFormation stack deployments, Azure DevOps Pipelines, and ELK Stack for centralized log aggregation and application monitoring. Built CloudWatch and Datadog alerting to provide real-time visibility into system health across multi-cloud deployments.
Managed production Kubernetes cluster deployments using Helm and Kustomize for multi-environment configuration management, Istio for service mesh traffic control and mTLS enforcement, and HPA with Karpenter for auto-scaling under variable load. Maintained cluster control plane health, RBAC policies, persistent storage, and network policies across EKS and on-premises environments.
Built modular infrastructure as code using Terraform with remote state management and reusable module patterns for provisioning compute, networking, storage, and IAM resources across AWS accounts. Used AWS CloudFormation for native stack deployments and Packer for standardized AMI and machine image builds, ensuring all environments were version-controlled, auditable, and reproducible.
Participated in Agile Scrum ceremonies including sprint planning, daily standups, retrospectives, and weekly progress reviews with service owners and stakeholders. Managed infrastructure tasks and project delivery through Jira, maintained change management workflows using ServiceNow, and led blameless post-mortems after production incidents, tracking corrective actions to completion.
Defined SLOs and SLIs with engineering teams, built Prometheus and Grafana observability platforms for system-wide monitoring, and embedded DevSecOps practices into every pipeline by integrating Trivy for container scanning and SonarQube for static code analysis. Enforced IAM least-privilege access controls, RBAC, and CIS Benchmark hardening across Linux infrastructure and cloud environments.
Cloud and DevOps Engineer
Cognizant - Client: Kaiser Permanente
Jan 2019 - Jun 2022
Maintained 24/7 availability for critical healthcare platform infrastructure on RHEL and CentOS Linux, managing OS configuration, kernel tuning, security patching, and systemd service management. Handled incident response and change management workflows through ServiceNow, leading root cause analysis and blameless post-mortems to reduce recurring outages by 30% over two years.
Built Azure DevOps CI/CD pipelines and GitHub Actions workflows for Kubernetes (AKS) and App Service deployments, implementing automated quality gates, security scanning, and GitOps-driven configuration management via ArgoCD. Webhook integrations triggered builds automatically on pull request merge, eliminating manual deployment steps.
Wrote Python and Bash automation frameworks to replace manual operational workflows: automated patching cycles, user provisioning, host health checks, log rotation, and AWS resource cleanup via boto3. Every repeated manual task was assessed for automation viability and scripted, reviewed, and deployed within one sprint.
Deployed ELK Stack and Datadog for centralized log aggregation, APM dashboarding, and performance monitoring across distributed Linux infrastructure. Configured Prometheus exporters and Grafana dashboards for service-level metrics, SLO tracking, and alerting that enabled proactive identification of degradation before it reached end users.
Administered production MySQL and PostgreSQL databases in a HIPAA-regulated environment, performing performance tuning, schema migrations, replication monitoring, and backup management. Supported distributed system reliability across MongoDB clusters and Kafka consumer pipelines, monitoring lag, throughput, and partition health.
Managed infrastructure provisioning using Terraform and Ansible across AWS and Azure, maintaining strict environment parity between staging and production through version-controlled IaC. Provisioned and configured cloud resources including VPCs, subnets, IAM roles, RDS instances, and Kubernetes clusters from reusable Terraform module patterns.
Education
Master of Science in Information Technology - Northern Arizona University
2024 - 2025 · United States
Bachelor of Technology in Mechanical Engineering - Pragati Engineering College
2014 - 2018 · India
Availability Details
Visa Status
OPT
Relocation
Open to Relocation