Site Reliability Engineer (SRE) - English Speakers

LIPS Healthcare · Cairo, Egypt · Posted 2026-05-06

Role Summary:
At LIPS Healthcare, the Site Reliability Engineer (SRE) plays a pivotal role in ensuring that our patient- and public-facing website, as well as our internal tools, run reliably, securely, and efficiently. The SRE bridges the gap between development and operations, combining software engineering practices with operational excellence. This role is responsible for building, maintaining, and automating systems that enable high availability, scalability, and resilience across our digital healthcare platforms.

The SRE also drives application security, operability, and observability practices across platforms, ensuring that our services are delivered with minimal disruption and maximum safety.

Principal Activities/Main Duties and Responsibilities
- Design, build, and maintain CI/CD pipelines that enable safe, frequent, zero-downtime deployments.
- Collaborate with software engineers and IT leadership to define availability targets, service level indicators (SLIs), and service level objectives (SLOs) for LIPS Healthcare services.
- Implement and maintain observability practices (monitoring, logging, tracing, and alerting).
- Use the Four Golden Signals (latency, traffic, errors, and saturation) to monitor system health and reliability.
- Manage cloud infrastructure (Azure), containers (Docker), and container orchestration platforms (Kubernetes).
- Automate infrastructure provisioning using tools such as Terraform or Bash.
- Uphold operability standards: high availability, disaster recovery, and incident response readiness.
- Participate actively in on-call rotations to support production services, following a "You Build It, You Run It" model.
- Lead incident reviews and post-mortems, embedding lessons learned into system improvements.
- Mentor IT and engineering staff in automation, reliability, and operational best practices.
- Administer and troubleshoot networking systems, ensuring secure and reliable connectivity for internal tools and patient-facing platforms.

Management
- Implement and monitor system availability targets aligned with business needs.
- Anticipate risks in system design and proactively propose mitigation strategies.
- Demonstrate expertise in project planning, automation, and operational monitoring.

Leadership
- Promote a reliability-first, operability-focused culture.
- Foster collaboration across development, operations, and business teams.
- Lead incident reviews with transparency, accountability, and a focus on learning.

Relationship Building
- Establish strong partnerships across IT, clinical teams, and external vendors.
- Share knowledge to upskill teams in modern DevOps and SRE practices.

Business Acumen & Enterprise Knowledge
- Align reliability initiatives with LIPS Healthcare’s mission of safe, continuous patient care.
- Balance system availability with delivery speed and cost efficiency.

Change Advocacy
- Champion automation and observability as key drivers of healthcare transformation.
- Promote continuous improvement through the adoption of SRE principles and practices.

Influencing
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders.
- Secure buy-in for reliability initiatives across IT and business leadership.

Results Orientation
- Deliver measurable improvements in system uptime, performance, and incident response.
- Ensure deployments are frequent, safe, and reliable.

Operational Excellence Areas

Health, Safety & Security
- Ensure system reliability meets healthcare safety standards.
- Safeguard sensitive patient data in compliance with data protection regulations.
- Promote secure coding, networking, monitoring, and operational practices.

Service Improvement
- Continuously enhance deployment pipelines, infrastructure, and observability tools.
- Evaluate and adopt new technologies that improve resilience and performance.
- Contribute to broader IT transformation initiatives.

Quality & Compliance
- Uphold quality standards across clinical and non-clinical systems.
- Support audits and compliance activities for IT systems.
- Monitor and review reliability metrics against agreed SLIs and SLOs.

Education and Training
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
- Proven experience as an SRE, DevOps Engineer, or Systems Engineer in complex environments.
- Certification in a cloud platform (AWS, Azure, or GCP), Kubernetes, or networking.

Working Conditions:
- Working Days: Monday to Friday (Saturday and Sunday off).
- Working Hours: 11:00 AM to 7:00 PM.
- Work Location: Heliopolis, Cairo.
