Site Reliability Engineer (SRE)

Confidential · Cairo, Egypt · Posted 2026-03-04

Main Duties & Responsibilities:
• Design, implement, and operate scalable and reliable Kubernetes infrastructure.
• Automate infrastructure provisioning and configuration using Terraform, Ansible, and Helm.
• Build, maintain, and improve CI/CD pipelines using GitHub Actions and ArgoCD.
• Manage and optimize AWS cloud environments for performance, reliability, and cost.
• Develop automation tools and scripts using Python and Bash.
• Monitor system health, proactively identify issues, and reduce downtime.
• Lead and participate in incident response, troubleshooting, and root cause analysis.
• Continuously improve system reliability, performance, and operational efficiency.

Basic Qualifications:
• Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
• Relevant cloud or DevOps certifications (AWS, Kubernetes, etc.) are a plus.

Experience:
• 10+ years of overall IT experience.
• 3+ years of hands-on experience in SRE/DevOps roles.
• Proven experience supporting and operating high-availability production systems.

Knowledge and Experience:
• Strong expertise in Site Reliability Engineering (SRE) principles, including availability, scalability, performance, and resilience.
• Experience managing production-grade systems at scale.
• Deep hands-on knowledge of Kubernetes and cloud-native architectures.
• Solid understanding of AWS cloud services and best practices for security, cost optimization, and performance.
• Practical experience with infrastructure as code and configuration management.
• Strong knowledge of CI/CD pipelines, GitOps, and automated deployment strategies.
• Experience with monitoring, logging, alerting, and observability tools.
• Proven experience handling incident response, root cause analysis, and post-mortems.

Skills:
• Kubernetes (cluster design, scaling, upgrades, troubleshooting)
• Infrastructure automation using Terraform, Ansible, and Helm
• CI/CD tools such as GitHub Actions and ArgoCD
• AWS services (EC2, EKS, VPC, IAM, S3, etc.)
• Scripting and automation using Python and Bash
• Monitoring and observability (Prometheus, Grafana, CloudWatch, or similar tools)
• Strong troubleshooting and debugging skills
• Excellent collaboration and communication skills

Applicants outside Cairo are welcome to apply.