AI Evaluation Engineer - Mathematics & Algorithms

Gramian Consulting · Posted 2026-04-27

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Overview

We are looking for a highly analytical and computationally strong professional with a solid research background in mathematics or quantitative fields.

In this role, you will design advanced benchmark tasks for multi-agent AI systems, focusing on complex mathematical reasoning, algorithmic problem-solving, and verifiable computational outputs. You will contribute by crafting challenging problems, building validation systems, and structuring tasks that require decomposition into coordinated sub-solutions.

Commitments Required: 8 hours per day with an overlap of 4 hours with PST.
Employment type: Contractor assignment (no medical/paid leave)
Duration of contract: 4+ weeks
Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
Interview: take-home assessment (60 min) + short interview

Responsibilities

- Design and build multi-agent benchmark tasks requiring multi-step mathematical reasoning and algorithmic problem-solving
- Create complex, decomposable problems across domains such as:
  - Competition mathematics
  - Numerical analysis
  - Combinatorial optimization
  - Statistical inference
- Develop verification scripts to validate:
  - Numerical outputs (with tolerance thresholds)
  - Proof correctness and logical steps
  - Algorithmic outputs and constraints
- Write clear, structured problem statements with precise notation and defined outputs
- Design task decomposition strategies for parallel or multi-agent execution
- Implement computational solutions and validation pipelines using Python
- Work with containerized environments (Docker) for reproducibility and evaluation

Requirements

- 5+ years in mathematics, quantitative research, or computational science: competition math, university-level mathematics, or quantitative research background
- Python programming: NumPy, SciPy, or symbolic computation (SymPy)
- Experience writing mathematical proofs or formal derivations
- Ability to create problems with precise, verifiable answers, not subjective or open-ended ones
- Experience with AI coding benchmarks (SWE-bench, Terminal-bench)
- Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues
- Understanding of numerical methods: floating point tolerance, convergence criteria, error bounds

Nice to Have

- Experience creating competition math problems (AMC, AIME, Putnam, IMO)
- Background in theoretical computer science or advanced mathematics research
- Exposure to automated theorem proving or formal verification
- Familiarity with AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI)
- Experience in large-scale numerical or scientific computing




About Gramian Consulting

IT Services and IT Consulting

We get talents. You get results.
