Integrant, Inc. · Cairo, Egypt · Posted 2026-04-08
Integrant is looking for game changers to join our team as a Lead AI Platform Engineer.

The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on the NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems. This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.

Roles and Responsibilities:
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments, including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure, optimization, and GPU systems
- Use experiment tracking tools (MLflow, W&B, Neptune) to log parameters, metrics, and artifacts for comparison
- Find model degradation that happens post-deployment: concept drift, data pipeline changes, traffic pattern shifts
- Apply root cause analysis (RCA) to ML systems: isolating variables, reproducing issues

Requirements:
- 8+ years of experience in AI systems
- 8+ years of experience in ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching)
- Hands-on experience with:
  - PyTorch
  - ONNX / ONNX Runtime
  - TensorRT / TensorRT-LLM
  - Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
  - NCCL communication
  - NVLink / InfiniBand
  - Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
- Experience with experiment tracking tools (MLflow, W&B, Neptune) for logging parameters, metrics, and artifacts for comparison
- Understanding of how model degradation happens post-deployment: concept drift, data pipeline changes, traffic pattern shifts
- Ability to apply root cause analysis (RCA) to ML systems: isolating variables, reproducing issues

Nice to Have:
- Experience with NVIDIA NIM and the NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments

Benefits:
- Salary paid in USD
- Six-month career-advancing opportunities
- Supportive and friendly work environment
- Premium medical insurance (employee + family)
- English language development courses
- Interest-free loans paid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employment referral program
- Premium location in Maadi
- Social insurance