AI Benchmark Engineer (Knowledge/Research) (Kolkata)
Kolkata, India
Posted: less than a week ago
- Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
- Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)

Required Qualifications:
- 5+ years of research experience (academic or industry) in any scientific domain
- Strong reading comprehension with ability to extract structured data from unstructured text
- Experience with JSON and data structures, including schema design and output validation
- Proficiency in Python scripting for data processing and evaluation (e.g., judge scripts)
- Familiarity with AI coding benchmarks such as SWE-bench and Terminal-bench
- Hands-on experience with Docker (writing Dockerfiles, building images, debugging containers)
- High attention to detail, especially for creating precise evaluation oracles without approximations

Nice to have:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys
- Familiarity with medical, legal, or scientific document analysis
- Experience with NLP or information extraction tasks
- Knowledge of LLM evaluation and benchmarking (e.g., MMLU, GPQA, SimpleQA)
- Experience curating datasets for AI evaluation

Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.

Offer Details:
- Commitment required: 40 hours/week with 4 hours of PST overlap
- Engagement type: Contractor assignment/freelancer (no medical/paid leave)
- Duration of contract: 1 month; [expected start date is next week]

Apply on Kit Job: kitjob.in/job/4mevae
Company name: Turing
Job position: AI Benchmark Engineer (Knowledge/Research) (Kolkata)