2P Perfect Presentation · Al Jizah, Egypt · Posted 2026-05-10
About the Role

We are hiring a Mid-Level ML DevOps Engineer to bridge our DevOps practice with our growing AI/ML initiatives. You will own the operational backbone of our ML and LLM systems, from training pipelines and experiment tracking through to production deployment, serving, and monitoring. You will work closely with Data Scientists, AI Engineers, and Platform Engineering to ensure models move from experimentation to production reliably, securely, and at scale.

Key Responsibilities

- Design, build, and maintain CI/CD pipelines for ML model training, validation, and deployment.
- Operate and optimize LLM inference engines (e.g., vLLM, TGI, TensorRT-LLM, Ollama) for performance, cost, and reliability.
- Own the end-to-end AI deployment lifecycle: data prep → training → registration → serving → monitoring → retraining.
- Implement data and model versioning workflows (e.g., DVC, lakeFS, Git LFS).
- Operate experiment tracking tools (MLflow, Weights & Biases, or Comet) and drive consistent usage across teams.
- Maintain the model registry: versions, metadata, lineage, and lifecycle stages (staging, production, archived).
- Deploy models as scalable APIs using FastAPI, Flask, Seldon Core, or KServe.
- Set up ML observability for data drift, concept drift, and prediction latency using Evidently AI, Prometheus, and Grafana.
- Build and orchestrate data pipelines using Apache Airflow (or equivalents).
- Operate a feature store (e.g., Feast) to keep features consistent across training and inference.
- Collaborate with security and compliance teams to ensure ML systems meet organizational and regulatory requirements (e.g., PDPL where applicable).

Requirements

Required Qualifications

Experience & Education
- 3–5 years of professional DevOps / SRE / Platform Engineering experience.
- 1–2 years of hands-on exposure to ML/AI infrastructure or MLOps work.
- Bachelor's degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).

Core DevOps (Mid-Level)
- Containerization and orchestration: Docker, Kubernetes.
- CI/CD: GitHub Actions, GitLab CI, Jenkins, or Argo CD.
- Infrastructure as Code: Terraform, Ansible, or Pulumi.
- Linux administration, networking fundamentals, and Git workflows.

ML/AI DevOps
- Working knowledge of LLM inference engines and how to run them in production.
- Solid grasp of the AI deployment lifecycle, from training through monitoring.
- Hands-on experience with at least one experiment tracking tool: MLflow, Weights & Biases, or Comet.
- Model deployment experience using FastAPI, Flask, Seldon Core, or KServe.
- Understanding of data and model versioning practices.
- Experience with model registry concepts and at least one implementation.
- ML observability: monitoring data drift, concept drift, and prediction latency with Evidently AI, Prometheus, and Grafana.
- Data pipelines and orchestration with Apache Airflow (or similar).
- Feature stores: Feast or equivalent.

Cloud
- Working knowledge of at least one: AWS (SageMaker), GCP (Vertex AI), or Azure ML.

Programming
- Strong Python, including ML libraries (PyTorch and/or TensorFlow).
- Strong Bash scripting for automation.
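As a flavor of the day-to-day work, here is a minimal sketch of the registry lifecycle described above (versions with metadata moving through staging, production, and archived stages). It is a toy in-memory illustration in plain Python; the class and method names are hypothetical and do not reflect any specific registry's API (tools like MLflow implement this with their own interfaces).

```python
from dataclasses import dataclass, field

# Lifecycle stages mirroring the staging → production → archived flow.
STAGES = ("staging", "production", "archived")

@dataclass
class ModelVersion:
    name: str
    version: int
    metadata: dict = field(default_factory=dict)
    stage: str = "staging"  # new versions start in staging

class ModelRegistry:
    """Toy in-memory registry tracking versions, metadata, and stages."""

    def __init__(self):
        self._versions: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, **metadata) -> ModelVersion:
        # Each registration creates the next sequential version in staging.
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, version=len(versions) + 1, metadata=metadata)
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> ModelVersion:
        # Promote one version to production; archive the previous production version.
        target = None
        for mv in self._versions[name]:
            if mv.version == version:
                target = mv
            elif mv.stage == "production":
                mv.stage = "archived"
        target.stage = "production"
        return target

registry = ModelRegistry()
v1 = registry.register("churn-clf", framework="pytorch", auc=0.91)
v2 = registry.register("churn-clf", framework="pytorch", auc=0.93)
registry.promote("churn-clf", 1)
registry.promote("churn-clf", 2)  # v1 is archived, v2 becomes production
```

The point of the sketch is the invariant a real registry enforces: at most one production version per model, with superseded versions archived rather than deleted, so lineage stays auditable.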