Looking for an Observability Engineer – AWS Cloud & DevOps in Bangalore and Pune
Bizoforce: Accelerating Digital Innovation
Job Description
Overview
We are seeking an experienced and driven Observability Engineer with deep expertise in AWS, EKS, and observability tooling to join our growing cloud engineering team. The ideal candidate will play a key role in building and managing scalable, resilient, and high-performing monitoring and telemetry infrastructure for cloud-native and containerized applications.
This is a hands-on technical role that requires extensive experience with observability stacks, Kubernetes (EKS), and AWS cloud architecture. You will also be expected to lead troubleshooting initiatives, build monitoring dashboards, and drive automation efforts using modern DevOps practices.
Key Responsibilities
Architect, implement, and manage robust observability platforms using open-source and enterprise-grade tools.
Work closely with development, infrastructure, and security teams to define SLOs/SLAs, alerts, and dashboards across distributed systems.
Maintain and enhance centralized logging, metrics, and tracing platforms including Grafana, Prometheus, Elasticsearch, FluentD, Thanos, Kafka, and Flink.
Perform deep-dive troubleshooting and root cause analysis on production incidents using observability data.
Design monitoring solutions for mission-critical services deployed in AWS EKS (Elastic Kubernetes Service) environments.
Contribute to Terraform- and Helm-based Infrastructure as Code (IaC) provisioning and CI/CD integrations.
Implement best practices for site reliability engineering (SRE), including resiliency testing, incident response, and performance tuning.
Build reusable frameworks and libraries to enable self-service monitoring for engineering teams.
Document observability frameworks, standards, and training materials for stakeholders.
Must-Have Skills
7+ years of experience in AWS Cloud architecture, focused on design and implementation rather than only support or maintenance.
7+ years of hands-on Kubernetes/EKS experience in production environments, including Helm and cluster optimization.
Strong hands-on knowledge of observability tools: Grafana, Prometheus, Thanos, Elasticsearch, FluentD, Kafka, Flink
Deep troubleshooting and operational experience in managing and tuning: Kafka, RabbitMQ, Redis, Apache Web Server, HAProxy
Solid experience in configuring and managing monitoring/alerting pipelines, log aggregation, and telemetry data flow.
Preferred Tools and Frameworks
Infrastructure as Code (IaC): Terraform, Helm
DevOps Tools: Azure DevOps, GitHub Actions
Scripting/Automation: Bash, Python, or similar
Observability Best Practices: SLOs, SLIs, distributed tracing, service mesh awareness
Areas of Expertise
Cloud & Platform Engineering: AWS, EKS
DevOps & Automation: Terraform, Helm, Azure DevOps
Observability & Telemetry: Grafana, Prometheus, FluentD, Elasticsearch
Years of Experience Required
7 to 10+ years
About the Company
Bizoforce: Accelerating Digital Innovation
Chicago