Senior DevOps Engineer

Vertex Technologies · Cairo, Egypt · Posted 2026-06-09

Responsibilities:

  • Designing and deploying scalable, multi-tenant cloud (AWS or Azure) and hybrid/on-premises architectures tailored to diverse client needs, including specialized infrastructure for AI and machine learning workloads.
  • Understanding business requirements, evaluating architectural trade-offs, and translating them into cost-effective, production-ready technical solutions.
  • Developing declarative scripts and modules for automating infrastructure provisioning, configuration management, and environment replication.
  • Designing, implementing, and optimizing GitOps-driven CI/CD pipelines to achieve automated, self-healing software delivery cycles for both application code and AI assets (models, prompts, and evaluation datasets).
  • Building and maintaining comprehensive observability (monitoring, logging, and tracing) systems to ensure proactive anomaly detection, including tracking LLM performance metrics (latency, token usage, and drift).
  • Ensuring system security and protection by integrating security guardrails (SAST/DAST, container scanning, prompt injection defense, and data anonymization) directly into the delivery pipeline (DevSecOps).
  • Designing and implementing robust disaster recovery (DR), failover procedures, and high-availability strategies across multi-region setups.
  • Automating the deployment of system updates, patches, and zero-downtime microservices and AI model endpoint releases.
  • Adhering to corporate security, data privacy (GDPR/HIPAA/SOC2), and industry-standard regulatory rules and compliance practices, with specific guardrails for AI data handling and model usage.
  • Providing technical leadership, architectural guidance, and mentorship to developers, data scientists, system engineers, and cross-functional client teams.
  • Supporting team infrastructure, unblocking development workflows, and rapidly resolving complex configuration, network, and automation issues across multi-cloud environments.
  • Ensuring high availability, scalability, elasticity, and maximum resilience against infrastructure and service component failures.
  • Staying up-to-date on the latest cloud-native technologies, CNCF ecosystem projects, FinOps (specifically managing unpredictable cloud AI/GPU spend), and industry best practices to drive continuous innovation.

What We Offer:

  • Long-term career stability with a competitive salary paid in USD.
  • Conditions for steady career development.
  • Development supported by dedicated mentors and a variety of programs focused on expertise and innovation.
  • Private medical insurance provided after successful completion of the probationary period.
  • A well-equipped and cozy office supports comfort and productivity across all project stages.
  • Welcoming atmosphere and a friendly corporate culture.

What we expect from you:

  • Practical administration experience with Linux/UNIX and Windows systems (mandatory, at least 3+ years in a senior or lead capacity).
  • Strong understanding of modern web architectures, microservices, distributed systems, and networking protocols.
  • Practical experience in DevOps/Platform Engineering roles involving end-to-end infrastructure development and client-facing delivery (mandatory, at least 3+ years).
  • Practical database administration and optimization experience with relational, non-relational (NoSQL), and vector databases (e.g., Pinecone, Milvus, Qdrant, or pgvector) used in AI applications.
  • Deep understanding and production experience with Infrastructure as Code (IaC) principles, focusing on modularity, reusability, and state management.
  • Automation experience with enterprise configuration management tools like Ansible, or modern alternatives/code-driven IaC (e.g., Pulumi).
  • Experience in designing, deploying, and managing environments in AWS or Azure using advanced, automated GitOps/IaC workflows (Terraform, OpenTofu, CloudFormation, or Bicep/ARM).
  • Practical skills in automating code compilation, artifact management, and continuous deployment using GitHub Actions, GitLab CI, Jenkins, or cloud-native tooling (ArgoCD, Flux).
  • Experience implementing automated code testing and compliance shifts in the CI process, extending to continuous evaluation pipelines for LLM-backed applications (using frameworks like Ragas or Langfuse).
  • Proficiency in containerization and cloud-native orchestration using Docker, Kubernetes (EKS/AKS), Helm, and ingress management.
  • Experience deploying, scaling, and managing service meshes, microservices releases, and containerized AI model deployment frameworks (e.g., vLLM, Triton Inference Server, Hugging Face TGI).
  • Experience with enterprise artifact repository managers (JFrog Artifactory, Nexus, or cloud-native container registries).
  • Advanced scripting and programming skills in Python (essential for AI ecosystems), Bash, Go, or PowerShell for building custom automation tools.
  • Expert knowledge of Git, including advanced branching strategies (GitFlow, Trunk-Based Development), repository management, and managing version control for application code, configuration, and prompt templates.
  • Proven experience implementing DevSecOps, secrets management (HashiCorp Vault, AWS Secrets Manager), and identity access management (IAM).
  • Experience implementing cloud financial management (FinOps), with a strong focus on tracking and optimizing high-cost AI infrastructure and API token spending.
  • Great communication and consultancy skills, with the ability to articulate technical concepts clearly to both technical teams and non-technical client stakeholders.

Will be a plus:

  • Deep knowledge of Linux/Windows OS internals, low-level troubleshooting, kernel tuning, and advanced performance diagnostics.
  • Deep knowledge of Enterprise Networking (VPC peering, SD-WAN, VPNs) and Cloud Security Architecture (Zero Trust models, WAF, DDoS mitigation).
  • Experience in the end-to-end design, business justification, documentation, and implementation of complex, large-scale enterprise architectural solutions.
  • Hands-on experience building or operating Retrieval-Augmented Generation (RAG) pipelines and managing LLM-backed agent orchestration frameworks (e.g., LangChain, AutoGen).
  • Active professional-level certifications (e.g., AWS Certified Solutions Architect Professional, Azure Solutions Architect Expert, CKA/CKAD, or cloud AI/Machine Learning specializations).
