Job Summary

The Senior AI Data Engineer is responsible for designing, building, and optimizing enterprise-scale data and AI infrastructure to support machine learning models, generative AI applications, and real-time analytics. The role drives the development of end-to-end data pipelines, from ingestion to production-ready AI data products, ensuring scalability, performance, and compliance across multi-cloud environments.

Accountability & Responsibilities

- Design, build, and maintain scalable ETL/ELT data pipelines using modern data engineering tools (e.g., Apache Spark, dbt).
- Architect and implement Lakehouse data platforms (Delta Lake, Apache Iceberg, Apache Hudi) following Medallion architecture (Bronze/Silver/Gold).
- Develop real-time streaming pipelines using Apache Kafka, Apache Flink, and Spark Structured Streaming.
- Build and optimize AI/GenAI data pipelines for LLM training, fine-tuning, and inference (tokenization, dataset curation, prompt engineering datasets).
- Design and implement Retrieval-Augmented Generation (RAG) pipelines, including embedding workflows and vector database integration.
- Manage feature stores for real-time and batch machine learning use cases.
- Integrate data pipelines with AI/ML platforms (Databricks MLflow, Azure ML, AWS SageMaker, Vertex AI, OpenAI/Azure OpenAI).
- Implement data orchestration workflows using Apache Airflow or similar tools with CI/CD pipelines.
- Ensure data quality, governance, and security using frameworks such as Great Expectations and data catalog tools.
- Deploy and manage infrastructure using Infrastructure-as-Code tools (Terraform, Bicep, CDK).
- Collaborate with Data Scientists, ML Engineers, and Solution Architects to deliver production-ready AI solutions.
- Lead technical design decisions, mentor junior engineers, and contribute to data platform strategy.
- Maintain documentation, data contracts, and operational runbooks for all pipelines.

Requirements

1 – Required Experience

- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
- 4–5 years of experience in data engineering, with strong exposure to AI/ML data infrastructure.
- Proven experience building scalable data pipelines and working with large-scale datasets.
- Hands-on experience with AI/ML platforms and modern data architectures.
- Experience in regulated industries (e.g., Banking, Telecom, Healthcare) is a plus.
- Strong problem-solving, analytical thinking, and communication skills.
- Experience working in cross-functional teams and agile environments.

2 – Technical Skills

- Strong SQL and advanced data modeling techniques
- Apache Spark (PySpark, Spark SQL, Streaming)
- Python (pandas, PySpark, data processing libraries)
- Data pipeline orchestration (Apache Airflow)
- CI/CD for data pipelines (GitHub Actions / Azure DevOps)
- Lakehouse architectures (Delta Lake / Iceberg / Hudi)
- Streaming technologies (Kafka, Flink)
- Cloud platforms (AWS / Azure / GCP)
- Vector databases (Pinecone, Weaviate, pgvector, OpenSearch)
- RAG pipeline design and LLM data processing
- Infrastructure-as-Code (Terraform / Bicep / CDK)
- Containers (Docker, Kubernetes)
- Data quality & governance tools

Established in 1996, Interact Technology Solutions is a leading system integrator in Egypt, delivering IT solutions, consulting, outsourcing, and professional services for nearly 30 years. Headquartered in Zahraa El-Maadi, Cairo, we serve over 5,200 clients across 35 industries, with revenues exceeding $35M USD from 2020. Interact employs 350+ certified professionals, including engineers, solution…