Dewan Architects and Engineers · Cairo, Egypt · Posted 2026-03-09
Job Purpose:
The AIOps Engineer will help automate and optimize IT support processes through AI-driven solutions. This role is pivotal in advancing our IT service management capabilities by integrating machine learning, automation, and advanced analytics into our operations.

Responsibilities:

AIOps Implementation & Management:
- Lead the implementation and ongoing optimization of AIOps platforms and tools to automate incident detection, resolution, and root cause analysis.
- Use machine learning models to improve anomaly detection and predict issues before they impact business operations.
- Build and maintain workflows for automating repetitive IT support tasks, including monitoring, alerting, incident response, and ticketing.

Automation & Incident Management:
- Develop automation scripts and integration workflows to minimize manual intervention in IT operations.
- Leverage AIOps to auto-remediate common issues, reducing time-to-resolution and enhancing system uptime.
- Collaborate with cross-functional teams to integrate AIOps solutions into incident and change management processes.

Data Analysis & Reporting:
- Analyze historical IT operations data to identify areas for automation and performance optimization.
- Build and maintain dashboards and reports to track key metrics such as Mean Time to Detect (MTTD), Mean Time to Resolve (MTTR), and incident volume.
- Implement predictive analytics to forecast trends and proactively address IT issues before they escalate.

Collaboration & Process Improvement:
- Work closely with DevOps, Network, and Systems teams to ensure seamless integration between AIOps tools and existing IT management platforms.
- Identify and implement continuous improvements in IT service management processes using AIOps technologies.
- Stay current on industry trends, tools, and best practices for AIOps, automation, and machine learning.

Knowledge, Skills & Abilities:

Technical Skills:
- Strong understanding of AIOps tools (e.g., Moogsoft, BigPanda, Splunk, Dynatrace) and ITOM platforms (e.g., ServiceNow, BMC Helix).
- Experience with automation tools and frameworks (e.g., Ansible, Puppet, Chef, or custom scripts).
- Proficiency in programming languages such as Python, Go, or Bash for automation tasks.
- Familiarity with machine learning and AI concepts, especially those related to predictive analytics, anomaly detection, and data classification.
- Hands-on experience with cloud platforms (AWS, Azure, Google Cloud) and IT infrastructure management.

Operational Expertise:
- Strong background in IT operations, ITIL processes, and incident management best practices.
- Experience in building and optimizing IT service management workflows with an emphasis on reducing manual effort and improving system reliability.
- Excellent understanding of monitoring and alerting systems (e.g., Nagios, Prometheus, New Relic) and incident management tools (e.g., ServiceNow, JIRA).

Analytical & Problem-Solving:
- Data-driven mindset with strong analytical skills to mine, interpret, and present operational data.
- Ability to diagnose complex problems, identify patterns, and implement corrective actions based on AI insights.

Collaboration & Communication:
- Strong communication skills, with the ability to work effectively with technical and non-technical stakeholders.
- Proven track record of cross-functional collaboration in a fast-paced, dynamic environment.

Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- Relevant certifications (e.g., ITIL, AWS Certified Solutions Architect, DevOps).
- Experience with cloud-native application monitoring and automation.
- Familiarity with Agile/Scrum methodologies and DevOps practices.