Observability Engineer

Konecta · Cairo, Egypt · Posted 2026-04-19

Why We Need This Role:
- The platform requires comprehensive observability dashboards built from multiple sources (platform health, use case performance, cost, security)
- The platform needs advanced monitoring beyond GCP native tools for production operations
- Prometheus and Grafana expertise is required for custom metrics, alerting, and dashboards
- OpenTelemetry instrumentation across all use cases requires dedicated focus
- No current team member has deep Prometheus/Grafana expertise

Job Description: Observability Engineer
Reports To: (Chief Engineer)

About the Role:
Our GenAI platform requires comprehensive observability to ensure production reliability, performance optimisation, and cost management. As our Observability Engineer, you will design and implement the monitoring, alerting, and dashboarding infrastructure that gives teams visibility into platform health, use case performance, and operational costs.

Key Responsibilities:
- Design and implement observability architecture using Prometheus and Grafana
- Deploy and manage the Prometheus stack on GKE with appropriate retention and HA configuration
- Create comprehensive Grafana dashboards for platform health, API performance, and use case metrics
- Implement custom metrics collection for CrewAI agents, Kong API Gateway, and LLM usage
- Configure OpenTelemetry instrumentation across all platform services
- Design alerting rules and notification channels for P0-P3 incident severity levels
- Build cost and usage dashboards for LLM token consumption and infrastructure spend
- Integrate with Cloud Monitoring and Cloud Logging for unified observability
- Establish SLI/SLO frameworks for platform and use case services
- Create runbooks for common alerting scenarios and incident response

Required Skills:
- 4+ years of experience in observability and monitoring engineering
- Strong expertise in Prometheus (PromQL, recording rules, alerting rules)
- Proficiency in Grafana (dashboard design, variables, annotations, alerting)
- Experience with OpenTelemetry for distributed tracing and metrics
- Knowledge of Kubernetes monitoring patterns and kube-state-metrics
- Understanding of SRE principles (SLIs, SLOs, error budgets)
- Experience with log aggregation and analysis (Loki, ELK, or similar)
- Familiarity with alerting best practices and on-call workflows

Desirable Skills:
- Experience with GCP Cloud Monitoring and Cloud Trace integration
- Knowledge of AI/ML observability patterns (model latency, token usage, drift detection)
- Background in API gateway monitoring (Kong, Envoy, or similar)
- Experience with long-term Prometheus storage
- Familiarity with FinOps and cost observability dashboards

About Konecta

Technology, Information and Internet

Konecta is a company that provides digital strategy, web design, web development, and digital marketing solutions for companies, businesses, public figures, and entrepreneurs. We put all our skills and experience at your service to make the most of your budget.
