AI Benchmark Engineer (Data Analysis) (Gurugram)
Gurugram, India
Posted: yesterday

Responsibilities:
- Design and author multi-agent benchmark tasks centered on complex data analysis workflows
- Create realistic synthetic datasets or curate real-world style datasets across domains such as finance, operations, security, or market analysis
- Build tasks that require agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
- Develop decomposition guides that split analytical work across specialist sub-agents such as financial, technical, security, or operations analysts
- Write precise oracle logic or verification scripts that validate specific analytical conclusions rather than generic summaries
- Create reproducible evaluation environments using Python and Docker
- Review task performance signals to ensure robust separation between weaker and stronger agentic systems
- Refine tasks to improve determinism, clarity, difficulty, and scoring quality

Requirements:
- 5+ years of experience in data analysis
- Strong proficiency in SQL and Python for data analysis and scripting (pandas, NumPy, or similar)
- Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
- Ability to design non-trivial analytical questions with clear, specific, and verifiable answers
- Solid understanding of statistical concepts (averages, distributions, outliers, correlations)
- Familiarity with AI coding benchmark environments (e.g., SWE-bench, Terminal-Bench)
- Comfortable working with Docker (writing Dockerfiles, building images, debugging containers)

Perks of Freelancing With Turing:
- Work on cutting-edge AI projects with leading foundation model companies
- Collaborate on high-impact work at the frontier of LLM evaluation and reasoning
- Remote, flexible opportunities with global teams

Offer Details:
- Commitments Required: 8 hours per day with a 4-hour overlap with PST.
- Employment Type: Contractor position (Note: this role does not include medical/paid leave).
- Duration of Contract: 4 weeks; expected start date is next week.

Apply on Kit Job: kitjob.in/job/4nbuh1

Company name: Turing
Job position: AI Benchmark Engineer (Data Analysis) (Gurugram)
AI Benchmark Engineer (Data Analysis) (Gurugram) has been posted in the Gurgaon Engineering category on Locanto.