About Blnk

blnk is a fintech company with a mission of enabling inclusion through point-of-sale financing. Fewer than 4% of Egyptians have access to credit cards; the rest can only afford to purchase products and services with cash they have saved, or are forced to borrow from hard money lenders at high interest rates. We're changing this by enabling all consumers to receive credit within minutes at their favorite merchants.

Role Summary

We are seeking a DevOps Lead to own and evolve our on-premise infrastructure, strengthen our security posture, and drive reliability at scale. This role combines hands-on technical leadership with team management, focusing on resilient, observable, automated, and secure platforms that support critical fintech workloads.

Key Responsibilities

Infrastructure Management
- Design, deploy, and operate on-premise infrastructure (OpenStack, bare metal, storage, networking) and hybrid integrations.
- Manage container platforms including Kubernetes, OpenShift, and Rancher for production workloads.
- Own capacity planning, cluster lifecycle (provisioning, upgrades, decommissioning), and day-to-day platform health.
- Ensure infrastructure resilience and high availability for critical fintech services.

CI/CD Pipeline Development
- Build and maintain secure CI/CD pipelines for application and infrastructure delivery, enforcing safe rollout patterns (canary, blue/green, feature flags).
- Integrate automated testing, security scans, and policy gates into pipelines to prevent regressions from reaching production.
- Work with engineering teams to standardize deployment templates, images, and registries for reproducible releases.

Monitoring & Incident Response
- Define and implement the observability strategy: metrics, distributed tracing, logs, and synthetic checks using tools like New Relic, Prometheus, and Grafana.
- Establish SLIs/SLOs and dashboards that align with business and product priorities.
- Lead incident response and on-call practices: runbooks, alerting thresholds, escalations, postmortems, and continuous remediation to lower MTTR.

Automation & Scripting
- Own infrastructure-as-code (Terraform) and configuration management (Ansible) to enable repeatable provisioning and environment parity.
- Develop and maintain automation tooling and scripts (Python, Bash) for operational tasks, health checks, and self-healing workflows.
- Implement automated lifecycle operations (patching, certificate rotations, backups, cluster scaling) to reduce manual toil.

Collaboration & Communication
- Partner with product, engineering, security, and compliance teams to align platform priorities with business goals.
- Drive platform adoption through developer enablement, shared libraries, and clear onboarding flows.
- Communicate incidents, platform changes, and roadmap updates to stakeholders; lead technical reviews and architecture discussions.

Performance Optimization
- Monitor and tune infrastructure and application performance (CPU, memory, I/O, network) to meet SLOs and cost targets.
- Lead capacity forecasting, performance testing, and resilience exercises (chaos testing, failure injection).
- Identify and implement optimizations for resource efficiency and lower operational cost.

Documentation
- Maintain runbooks, standard operating procedures, architecture diagrams, and platform playbooks for common tasks and recovery scenarios.
- Ensure runbooks are actionable, versioned, and accessible to on-call and engineering teams.
- Document CI/CD patterns, security controls, and platform APIs for developer consumption.

Security & Compliance
- Implement and enforce infrastructure security controls: network segmentation, host/container hardening, firewall rules, and secure baseline configurations.
- Manage identity and access controls, RBAC for clusters, and least-privilege access on infrastructure and management interfaces.
- Own secrets management, certificate lifecycle, and secure artifact registries integrated with CI/CD.
- Integrate vulnerability scanning, dependency checks, and patch management into standard operations and release workflows.
- Support audits and compliance programs (SOC 2, ISO 27001, FRA, NIST) by providing evidence, remediation plans, and operational controls.

Required Qualifications
- 7+ years of experience in DevOps, infrastructure, platform engineering, or SRE roles, with at least 2 years in a leadership capacity.
- Strong experience managing on-premise or hybrid environments, particularly OpenStack-based infrastructure.
- Deep expertise in container orchestration technologies such as Kubernetes, OpenShift, and Rancher.
- Proven experience with infrastructure-as-code and automation tools such as Terraform and Ansible.
- Strong understanding of distributed systems, networking, Linux administration, and reliability engineering.
- Hands-on experience with observability tools such as New Relic, Prometheus, Grafana, ELK, or similar.
- Practical experience implementing infrastructure security controls, including hardening, access controls, certificate management, and secret handling.
- Experience managing incident response, production operations, and post-incident remediation.
- Strong scripting or programming skills, such as Python or Bash.

Preferred Qualifications
- Fintech Experience: Experience in fintech, payments, lending, or regulated financial environments.
- DevSecOps: Familiarity with DevSecOps practices, secure CI/CD pipelines, and integrating security tools (SAST/DAST/SCA) into workflows.
- Security Frameworks: Knowledge of security frameworks and controls such as SOC 2, ISO 27001, NIST, FedRAMP, or local regulatory frameworks (e.g., FRA).
- Event-Driven Architectures: Exposure to Kafka, RabbitMQ, or other event-driven messaging systems.
- Disaster Recovery: Experience with backup strategies, disaster recovery planning, and business continuity testing.
- Cloud Hybrid: Experience managing hybrid environments (on-premise + public cloud) with consistent security and observability.
- Vulnerability Management: Hands-on experience with vulnerability management platforms, security monitoring, and threat detection.
- Leadership & Coaching: Experience mentoring junior engineers, conducting technical interviews, and building high-performing teams.
- Certifications: Relevant certifications such as CKA/CKAD (Kubernetes), Terraform Associate, AWS/Azure Security, or CISSP are a plus.

What Success Looks Like (First 6–12 Months)
- Reliability: Established SLIs/SLOs with measurable improvements in system availability and reduced MTTR.
- Automation: Automated 80% of manual provisioning and routine operational tasks via IaC.
- Security: Platform passes compliance audits with no critical findings; security controls embedded in pipelines.
- Observability: Full observability coverage across infrastructure and applications with actionable alerts.
- Team Growth: Built a mature DevOps/SRE culture where reliability and security are embedded in day-to-day engineering.