Senior Site Reliability Engineer (Bangalore Division)
Bangalore Division, India
Posted: less than a week ago
Description
We are seeking a Senior Observability / Monitoring Engineer to drive end-to-end observability and monitoring for enterprise platforms. This role will focus on enabling proactive issue detection, faster incident resolution, and improved system reliability through effective use of observability tools and practices. The ideal candidate will bring strong experience in logs, metrics, traces, alerting strategies, and monitoring tools, along with hands-on exposure to production environments and SRE practices.
Complete Night Shift Role
Key Responsibilities
Observability Engineering
• Design and implement end-to-end observability solutions across applications and infrastructure
• Establish unified visibility across logs, metrics, and distributed tracing
• Define and standardize monitoring frameworks, dashboards, and alerting strategies
• Enable proactive detection of issues through intelligent alerting and anomaly detection
Monitoring & Tooling
• Implement and manage tools such as Splunk, Datadog, Prometheus, Grafana, New Relic, or similar
• Build actionable dashboards for SRE, operations, and business stakeholders
• Optimize alert configurations to reduce noise and improve signal quality
• Continuously enhance monitoring coverage across systems and services
Incident Support & Reliability
• Support late night / US overlap shift for production monitoring and incident response
• Analyze logs, metrics, and traces to support incident triage and root cause analysis (RCA)
• Collaborate with SRE and engineering teams to improve system reliability and performance
• Participate in post-incident reviews and continuous improvement initiatives
Automation & Integration
• Automate monitoring setup and configuration using Infrastructure as Code (IaC)
• Integrate observability tools with CI/CD pipelines and DevOps workflows
• Develop scripts/tools to improve data collection, alerting, and reporting
Platform & Integration Support
• Monitor enterprise applications, APIs, and integration layers (e.g., middleware, cloud services)
• Ensure end-to-end visibility across distributed systems and microservices architectures
• Work closely with platform teams (cloud, Salesforce, etc.) to enhance observability
Governance & Compliance
• Ensure monitoring practices align with security and compliance requirements (e.g., SOX)
• Maintain runbooks, documentation, and monitoring standards
• Support audit and governance requirements as needed
Required Skills & Qualifications
Technical Skills
• Strong experience in observability, monitoring, or SRE roles
• Hands-on experience with tools like Splunk, Datadog, Prometheus, Grafana, New Relic
• Strong understanding of logs, metrics, traces, and distributed systems
• Experience with APM tools and performance monitoring
• Scripting skills (Python, Bash, PowerShell, or similar)
• Familiarity with CI/CD tools (Jenkins, GitHub Actions, Azure DevOps)
• Knowledge of Infrastructure as Code (Terraform or similar)
Operational Excellence
• Experience supporting production environments in 24x7 models
• Solid incident management and RCA capabilities
• Ability to analyze performance issues and recommend improvements
Soft Skills
• Ability to work effectively in a late night / US overlap shift
• Strong communication and collaboration skills
• Proactive mindset with a focus on continuous improvement
Apply on Kit Job: kitjob.in/job/4m7hrs
Highlights
Company name: Brillio
Job position: Senior Site Reliability Engineer (Bangalore Division)
More info about this ad
Senior Site Reliability Engineer (Bangalore Division) has been posted in the Whitefield Engineering category on Locanto.