
AI Benchmark Engineer (Planning/Operations) (Dombivli)

Dombivli, India
Posted: less than a week ago

Description

About Turing: Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, plus top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.

Role Overview: We are looking for AI Benchmark Engineers specializing in planning and operations to design and build complex, multi-agent benchmark tasks that simulate real-world planning, scheduling, and operational decision-making scenarios. This role focuses on creating constraint-rich problems that evaluate multi-agent reasoning, decomposition, and optimization capabilities in realistic environments.

What does day-to-day life look like?
- Design and develop multi-agent benchmark tasks involving:
  - Planning, scheduling, and resource allocation
  - Operational decision-making (project management, logistics, incident response, capacity planning)
- Create constraint-rich problem statements with multiple interacting variables
- Develop verification scripts to evaluate:
  - Feasibility (all constraints satisfied)
  - Completeness (all requirements addressed)
  - Optimality (efficient solutions)
- Build decomposition strategies:
  - Split tasks across specialized sub-agents (resource-based, constraint-based, conflict resolution, optimization)
- Model real-world operational scenarios with dependencies, timelines, and resource constraints
- Collaborate on improving task quality, coverage, and evaluation rigor

Requirements:
- 5+ years of experience in operations, project management, logistics, supply chain, or AI research, or a strong computer science research background
- Strong ability to formalize constraints, dependencies, and scheduling logic
- Proficiency in Python for building verification and validation scripts
- Strong structured problem-solving and decomposition skills
- Explicit and precise technical writing skills
- Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-bench)
- Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Nice to have:
- Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
- Background in operations research
- Experience with simulation or modeling tools
- Knowledge of AI planning systems or automated reasoning
- Project management experience or certifications (PMP, Agile, etc.)

Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.

Offer Details:
- Commitment Required: 40 hours per week, with 4 hours of overlap with PST.
- Engagement Type: Contractor assignment (no medical/paid leave)
- Duration of Contract: 4 weeks (adjustable based on engagement)

Apply on Kit Job: kitjob.in/job/4mvmk1

Highlights

Company name: Turing

Job position: AI Benchmark Engineer (Planning/Operations) (Dombivli)

Ad ID: 8800045132