Vinay kumar

Senior DevOps and Site Reliability Engineer

Phoenix, AZ
Profile Score: 80/100

About

Senior DevOps and Site Reliability Engineer with 5 years of experience designing, deploying, and operating large-scale infrastructure across AWS, Azure, and GCP environments. Proficient in end-to-end CI/CD pipeline engineering using Jenkins, GitHub Actions, and ArgoCD, infrastructure automation with Terraform and Ansible, and container orchestration with Kubernetes and Docker across cloud and on-premises environments. Strong background in production reliability including SLO/SLI definition, incident response, blameless post-mortems, and observability platform development using Prometheus, Grafana, ELK Stack, and Datadog. Experienced working in Agile Scrum teams with a consistent focus on eliminating manual toil through scripting automation and scalable infrastructure solutions.

Skills & Expertise (65)

Kubernetes: Expert, 9.4/10, 5 years of experience
Terraform: Advanced, 8.8/10, 5 years of experience
Docker: Advanced, 8.7/10, 5 years of experience
MySQL, HTTP, DNS, IP, TCP, Redis, PostgreSQL, SSH, MongoDB, Kafka, Splunk, Datadog, ELK Stack, Grafana, Prometheus, Groovy, HTTPS, SQL, VPC, IAM, RBAC, Trivy, OpenSSL, Secrets Manager, ServiceNow, Jira, Confluence, Linux, RHEL, CentOS, Ubuntu, Jenkins, AWS (S3), AWS (RDS), AWS (IAM), AWS (CloudWatch), AWS (Lambda), AWS (EC2), Azure DevOps, Azure AD, Azure Monitor, Azure Pipelines, Helm, Kustomize, Containerd, Ansible, Packer, Puppet, YAML, GitHub Actions, ArgoCD, CircleCI, JFrog Artifactory, Maven, SonarQube, Git, Python, Bash, Shell Scripting, Go, PowerShell, JSON

Work Experience

Senior Cloud Infrastructure Engineer

Tata Consultancy Services - Client: Waste Management

Nov 2022 - Jul 2024

Designed and managed end-to-end CI/CD pipelines using Jenkins and GitHub Actions, integrating webhook-triggered builds, automated testing, artifact publishing to JFrog Artifactory, and GitOps-based continuous deployment via ArgoCD for 15+ microservices across Kubernetes (EKS) environments. Increased release frequency by 30% through full pipeline automation.

Developed and maintained Ansible playbooks for end-to-end workflow automation, covering infrastructure provisioning, application deployment, configuration management, and security hardening across a 100+ node Linux fleet. Integrated playbooks with GitHub Actions webhooks to trigger automated deployments on code merge, reducing manual operational effort by 40%.

Implemented infrastructure cost optimization strategies across AWS environments by deploying Karpenter for intelligent node provisioning, configuring HPA for demand-based pod scaling, right-sizing EC2 instance families, and replacing long-running processes with event-driven Lambda functions using EventBridge triggers. Reduced cloud compute spend through continuous capacity planning and idle resource elimination.

Administered distributed data infrastructure including Kafka event streaming clusters (topic management, consumer group monitoring, lag alerting) and MongoDB replica sets (cluster operations, indexing optimization, replication monitoring). Integrated observability using Prometheus exporters and Grafana dashboards to track throughput, latency, and system health across distributed services.

Built Python and Bash automation scripts to eliminate repetitive manual operational tasks, including infrastructure provisioning, patching schedules, log parsing, health checks, user provisioning, and AWS resource management via boto3. All automation followed code review standards and was version-controlled, replacing manual workflows that previously required multiple hours of engineer time weekly.

Deployed and operated cloud-native services across AWS and Azure including Lambda functions with SQS/SNS event triggers, CloudFormation stack deployments, Azure DevOps Pipelines, and ELK Stack for centralized log aggregation and application monitoring. Built CloudWatch and Datadog alerting to provide real-time visibility into system health across multi-cloud deployments.

Managed production Kubernetes cluster deployments using Helm and Kustomize for multi-environment configuration management, Istio for service mesh traffic control and mTLS enforcement, and HPA with Karpenter for auto-scaling under variable load. Maintained cluster control plane health, RBAC policies, persistent storage, and network policies across EKS and on-premises environments.

Built modular infrastructure as code using Terraform with remote state management and reusable module patterns for provisioning compute, networking, storage, and IAM resources across AWS accounts. Used AWS CloudFormation for native stack deployments and Packer for standardized AMI and machine image builds, ensuring all environments were version-controlled, auditable, and reproducible.

Participated in Agile Scrum ceremonies including sprint planning, daily standups, retrospectives, and weekly progress reviews with service owners and stakeholders. Managed infrastructure tasks and project delivery through Jira, maintained change management workflows using ServiceNow, and led blameless post-mortems after production incidents to track corrective actions to completion.

Defined SLOs and SLIs with engineering teams, built Prometheus and Grafana observability platforms for system-wide monitoring, and embedded DevSecOps practices into every pipeline by integrating Trivy for container scanning and SonarQube for static code analysis. Enforced IAM least-privilege access controls, RBAC, and CIS Benchmark hardening across Linux infrastructure and cloud environments.

Cloud and DevOps Engineer

Cognizant - Client: Kaiser Permanente

Jan 2019 - Jun 2022

Maintained 24/7 availability for critical healthcare platform infrastructure on RHEL and CentOS Linux, managing OS configuration, kernel tuning, security patching, and systemd service management. Handled incident response and change management workflows through ServiceNow, leading root cause analysis and blameless post-mortems to reduce recurring outages by 30% over two years.

Built Azure DevOps CI/CD pipelines and GitHub Actions workflows for Kubernetes (AKS) and App Service deployments, implementing automated quality gates, security scanning, and GitOps-driven configuration management via ArgoCD. Webhook integrations triggered builds automatically on pull request merge, eliminating manual deployment steps.

Wrote Python and Bash automation frameworks to replace manual operational workflows: automated patching cycles, user provisioning, host health checks, log rotation, and AWS resource cleanup via boto3. Every repeated manual task was assessed for automation viability and scripted, reviewed, and deployed within one sprint.

Deployed ELK Stack and Datadog for centralized log aggregation, APM dashboarding, and performance monitoring across distributed Linux infrastructure. Configured Prometheus exporters and Grafana dashboards for service-level metrics, SLO tracking, and alerting that enabled proactive identification of degradation before it reached end users.

Administered production MySQL and PostgreSQL databases in a HIPAA-regulated environment, performing performance tuning, schema migrations, replication monitoring, and backup management. Supported distributed system reliability across MongoDB clusters and Kafka consumer pipelines, monitoring lag, throughput, and partition health.

Managed infrastructure provisioning using Terraform and Ansible across AWS and Azure, maintaining strict environment parity between staging and production through version-controlled IaC. Provisioned and configured cloud resources including VPCs, subnets, IAM roles, RDS instances, and Kubernetes clusters from reusable Terraform module patterns.

Education

Master of Science in Information Technology - Northern Arizona University

2024 - 2025 · United States

Bachelor of Technology in Mechanical Engineering - Pragati Engineering College

2014 - 2018 · India

Certifications

No certifications added yet

Profile Score Breakdown

📷 Photo 10/10
📄 Resume 10/10
💼 Job Title 10/10
✍️ Bio 10/10
🛠️ Skills 20/20
🎓 Education 10/10
⏱️ Experience 5/15
💰 Rate 0/5
🏆 Certs 0/5
Verified 5/5
Total Score 80/100

Profile Overview

Member since Apr 2026

Availability Details

Visa Status

OPT

Relocation

Open to Relocation
