About OmniOps

OmniOps helps organizations build, operate, and optimize modern cloud-native infrastructure, observability platforms, and AI-powered operations. We work with clients across multiple industries to deliver scalable, reliable, and secure solutions that enable operational excellence.

Position Overview

We are seeking an experienced Observability Engineer to design, deploy, and operate observability platforms for our clients. The ideal candidate will have hands-on experience with the Grafana ecosystem, telemetry collection, monitoring, logging, tracing, and alerting across cloud, on-premises, and hybrid environments.

This role involves working directly with client teams to ensure their infrastructure and applications are fully observable, while maintaining clear ownership boundaries and providing expert guidance on monitoring best practices.

Key Responsibilities

Deploy, configure, upgrade, and manage Grafana Alloy agents across Kubernetes, Linux, and Windows environments.
Design and maintain observability backends using Grafana, Loki, Mimir, Tempo, and Alertmanager, either on-premises or through Grafana Cloud.
Configure telemetry pipelines for logs, metrics, and traces, ensuring reliable data collection and delivery.
Manage retention policies, storage capacity, and platform health across client environments.
Implement platform-level dashboards, alerting rules, notification channels, and monitoring standards.
Ensure observability platforms are self-monitoring and capable of detecting ingestion, agent, and backend health issues.
Support client teams with application instrumentation and observability best practices.
Define telemetry collection standards, SDK recommendations, and integration approaches.
Validate data ingestion and ensure logs, metrics, and traces are properly collected and visualized.
Assist clients in implementing alerting strategies and translating SLO requirements into actionable monitoring solutions.
Provide technical guidance while maintaining clear ownership boundaries between OmniOps and client teams.

Required Qualifications

Proven experience operating Grafana-based observability platforms in production environments.
Hands-on experience with Grafana Alloy, OpenTelemetry Collector, Promtail, Grafana Agent, or similar telemetry collection tools.
Strong experience deploying and managing monitoring agents across Kubernetes, Linux, and Windows environments.
Solid understanding of Kubernetes, including DaemonSets, RBAC, manifests, and Helm.
Experience with Grafana Cloud, including tenant management, stack configuration, and agent connectivity.
Strong understanding of observability concepts, including metrics, logs, traces, alerting, retention strategies, and capacity planning.
Experience with PromQL, LogQL, and telemetry data modeling best practices.
Strong communication and stakeholder management skills with a customer-focused mindset.

Nice to Have

Experience delivering observability services in managed services or multi-tenant environments.
Knowledge of Infrastructure as Code and GitOps tools such as Terraform, Helm, and Ansible.
Experience implementing OpenTelemetry instrumentation for applications and services.
Familiarity with AWS, Azure, and GCP logging integrations.
Experience collecting and monitoring Windows telemetry, including Windows Exporter and Event Log collection.
Experience building scalable monitoring and observability frameworks for enterprise environments.

What Success Looks Like

Reliable deployment and operation of observability platforms across multiple client environments.
High availability and health of telemetry pipelines and monitoring infrastructure.
Effective dashboards, alerting, and self-monitoring capabilities that improve operational visibility.
Successful onboarding of client applications and infrastructure into observability platforms.
Strong client satisfaction through technical excellence and proactive support.