AI Evaluation Engineer (Knowledge & Research)

Gramian Consulting · Posted 2026-05-04

About Us

Gramian Consulting is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Overview

We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems.

In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data.

This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.

Commitment required: 8 hours per day, with 4 hours of overlap with PST
Employment type: Contractor assignment (no medical/paid leave)
Duration of contract: 5+ weeks
Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
Interview: Take-home assessment (60 min)

Responsibilities

Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
Curate real-world research corpora (academic papers, case studies, technical reports) and design questions that require comprehensive analysis
Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
Design LLM judge prompts that evaluate agent output field-by-field against the oracle
Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)

Requirements

5+ years of experience in research (academic or industry) in a scientific, technical, or analytical domain
Strong ability to read, analyze, and extract structured information from unstructured documents
Experience designing or working with structured data formats (JSON, schemas, validation)
Proficiency in Python scripting (data processing, validation, or evaluation scripts)
Experience with AI evaluation, coding benchmarks, or structured reasoning tasks (e.g., SWE-bench, Terminal-bench, or similar)
Experience working with Docker (building images, debugging containers)
Strong attention to detail, especially when defining exact, verifiable outputs
Ability to design complex, multi-step problem-solving workflows

About Gramian Consulting

IT Services and IT Consulting

We get talents. You get results.