AI Benchmark Engineer (Knowledge/Research) (Kolkata)
Kolkata, India
Posted: yesterday
- Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
- Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
Required Qualifications:
- 5+ years of research experience (academic or industry) in any scientific domain
- Strong reading comprehension with ability to extract structured data from unstructured text
- Experience with JSON and data structures, including schema design and output validation
- Proficiency in Python scripting for data processing and evaluation (e.g., judge scripts)
- Familiarity with AI coding benchmarks such as SWE-bench and Terminal-Bench
- Hands-on experience with Docker (writing Dockerfiles, building images, debugging containers)
- High attention to detail, especially for creating precise evaluation oracles without approximations
Nice to have:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys
- Familiarity with medical, legal, or scientific document analysis
- Experience with NLP or information extraction tasks
- Knowledge of LLM evaluation and benchmarking (e.g., MMLU, GPQA, SimpleQA)
- Experience curating datasets for AI evaluation
Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.
Offer Details:
- Commitment required: 40 hours/week with 4 hours of PST overlap
- Engagement type: Contractor assignment/freelancer (no medical/paid leave)
- Duration of contract: 1 month (expected start date is next week)
Apply on Kit Job: kitjob.in/job/4n9exz
Company name: Turing
Job position: AI Benchmark Engineer (Knowledge/Research) (Kolkata)
AI Benchmark Engineer (Knowledge/Research) (Kolkata) has been posted in the Kolkata Engineering category on Locanto.