Position: SwarmBench Task Engineer (Data Analysis)
Type: Short-Term Contract (4 weeks)
Compensation: $15 per hour
Location: Remote
Commitment: 30-40 hours per week, with 4 hours of overlap with PST

Role Responsibilities
- Design and author multi-agent benchmark tasks centered on complex data analysis workflows
- Create realistic synthetic datasets or curate real-world-style datasets across domains such as finance, operations, security, or market analysis
- Build tasks that require agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
- Develop decomposition guides that split analytical work across specialist sub-agents such as financial, technical, security, or operations analysts
- Write precise oracle logic or verification scripts that validate specific analytical conclusions rather than generic summaries
- Create reproducible evaluation environments using Python and Docker
- Review task performance signals to ensure strong separation between weaker and stronger agentic systems
- Refine tasks to improve determinism, clarity, difficulty, and scoring quality

Requirements
- Strong, multi-year experience in data analysis
- Strong proficiency in SQL and Python for data analysis and scripting (pandas, NumPy, or similar)
- Experience working with real-world, messy data such as CSV files, JSON, logs, and reports
- Ability to design non-trivial analytical questions with clear, specific, and verifiable answers
- Solid understanding of statistical concepts, including averages, distributions, outliers, and correlations
- Familiarity with AI coding benchmark environments (e.g., SWE-bench, Terminal-Bench)
- Comfort with Docker, including writing Dockerfiles, building images, and debugging container issues
- Ability to work independently in a remote environment

Application Process
1. Apply (or Easy Apply), then check your email for the application form
2. Fill out the Google form
3. Assessment link: provided after shortlisting, to be completed within 24 hours