SRE Engineer

Natobotics · Cairo, Egypt · Posted 2026-02-06

Job Description

Key Responsibilities:
- Monitor, maintain, and improve reliability, availability, and performance of enterprise applications and infrastructure.
- Implement ITSM processes such as incident, problem, and change management to ensure operational excellence.
- Identify and eliminate bottlenecks by developing automation and proactive monitoring solutions.
- Collaborate with development and infrastructure teams to ensure smooth deployment and reliable operation of applications.
- Participate in on-call rotations and shift operations, ensuring critical incident response and timely resolution.
- Conduct root cause analysis (RCA) for high-impact incidents and drive permanent fixes.
- Develop and maintain runbooks, standard operating procedures (SOPs), and service documentation.
- Gather metrics, generate performance reports, and support continuous improvement initiatives.

Required Skills and Competencies:
- Strong understanding of ITSM frameworks (preferably ITIL) and service operations for enterprise-scale environments.
- Experience in application monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, Splunk, AppDynamics, or Dynatrace).
- Familiarity with cloud infrastructure (AWS, Azure, or GCP) and key DevOps/SRE practices.
- Proficiency in incident response, system troubleshooting, and performance optimization.
- Basic scripting or automation skills (Python, Shell, or PowerShell) for operational efficiency.
- Excellent collaboration and communication skills with a proactive problem-solving mindset.
- Willingness to work in rotational shifts and support 24×7 production environments.
