Senior Site Reliability Engineer
Quantum World Technologies Inc. • Vancouver, Canada
Role Description
6-12 months
- Design and implement observability-as-code solutions using Terraform to deploy monitoring pipelines, dashboards, and alerting strategies across distributed systems.
- Drive observability improvements leveraging industry-leading tools (Dynatrace, ELK, Splunk, PagerDuty) to achieve real-time performance insights and comprehensive system visibility.
- Instrument applications for end-to-end observability by implementing distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures.
- Troubleshoot complex incidents in production environments, diagnosing root causes across multiple service layers, databases, caches, and APIs under load using SLI/SLO frameworks.
- Investigate and resolve Azure Kubernetes Service (AKS) infrastructure issues, ensuring reliability and scalability of containerized workloads with deep proficiency in Terraform and Azure manag...