SRE
• As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective.
• Your role covers the entire life cycle of a product or application. Your primary focus will be automation, observability, reliability, and release management with CI/CD, with an emphasis on solving operational issues.
• Must have at least 5 years of SRE experience in large programs, with a focus on release engineering, observability, and reliability
• Must have a good understanding of Site Reliability Engineering (SRE) and release management processes
• Define suitable system metrics using SLOs/SLIs and set up observability mechanisms to track them
• Define the error budget based on the SLO
• Define the strategy for, and set up, a high-availability, load-balancer-based architecture
• Drive a metrics-driven culture and software delivery process using data to measure overall system quality and reliability.
• Experience with scripting in PowerShell (M) and Bash/Shell/Perl (any one)
• Strong experience with one or more observability tools such as New Relic, AppDynamics, Prometheus, Dynatrace, Datadog, and Splunk
• Experience in observability dashboard creation, custom metrics, synthetic monitoring, and Real User Monitoring (RUM)
• Strong knowledge of microservices architecture and APIs, including REST APIs
• Experience in CI/CD tooling and best practices
• Experience with cloud platforms such as AWS, Azure, and Google Cloud
• Experience in container orchestration and practices, including Kubernetes and Docker Swarm
• Experience in infrastructure automation tools such as Terraform, CloudFormation, Ansible, and Puppet (any one)
• Systems administration and operating system experience on Linux and Windows, including an understanding of networking