Position: SwarmBench Task Engineer (Knowledge / Research)
Type: Short-Term Contract (4 weeks)
Compensation: $15 per hour
Location: Remote
Commitment: 8 hours per day, with 4 hours of overlap with PST

Role Responsibilities
- Build multi-agent benchmark tasks requiring deep reading, analysis, and synthesis of large document collections
- Curate real-world research datasets (academic papers, case studies, technical reports) for AI evaluation
- Design complex, research-driven questions requiring cross-document reasoning and synthesis
- Create structured ground-truth outputs (JSON) with precise, verifiable answers
- Develop LLM judge prompts to evaluate outputs against defined schemas and oracles
- Design decomposition strategies to split research tasks across multiple parallel agents
- Analyze model outputs to ensure correctness, completeness, and factual grounding
- Work with agentic frameworks and evaluation pipelines for AI benchmarking

Requirements
- Strong research experience (academic or industry) in any scientific domain
- Strong reading comprehension, with the ability to extract structured insights from unstructured data
- Experience with JSON/data structures (schema design, validation, structured outputs)
- Proficiency in Python for scripting, data processing, or evaluation workflows
- Familiarity with AI coding benchmarks (SWE-bench, Terminal-Bench, etc.)
- Comfort with Linux/terminal workflows, Git, and development tools
- Experience with Docker (Dockerfiles, building images, debugging containers)
- High attention to detail, especially when creating precise, verifiable outputs

Application Process
1. Apply / Easy Apply via LinkedIn
2. Fill out the application form shared via email
3. Complete the assessment (post-shortlisting; to be completed within 24 hours)