AI Benchmark Engineer (Knowledge/Research) (Lucknow)
Lucknow, India
Posted: yesterday
- Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
- Curate real-world research corpora (academic papers, case studies, technical reports) and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
Required Qualifications:
- 5+ years of research experience (academic or industry) in any scientific domain
- Strong reading comprehension with ability to extract structured data from unstructured text
- Experience with JSON and data structures, including schema design and output validation
- Proficiency in Python scripting for data processing and evaluation (e.g., judge scripts)
- Familiarity with AI coding benchmarks such as SWE-bench and Terminal-Bench
- Hands-on experience with Docker (writing Dockerfiles, building images, debugging containers)
- High attention to detail, especially for creating precise evaluation oracles without approximations
Nice to have:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys
- Familiarity with medical, legal, or scientific document analysis
- Experience with NLP or information extraction tasks
- Knowledge of LLM evaluation and benchmarking (e.g., MMLU, GPQA, SimpleQA)
- Experience curating datasets for AI evaluation
Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.
Offer Details:
- Commitment required: 40 hours/week with 4 hours of PST overlap
- Engagement type: Contractor assignment/freelancer (no medical/paid leave)
- Duration of contract: 1 month; [expected start date is next week]
Apply on Kit Job: kitjob.in/job/4nb5sr
Company name: Turing
Job position: AI Benchmark Engineer (Knowledge/Research) (Lucknow)
AI Benchmark Engineer (Knowledge/Research) (Lucknow) has been posted in the Lucknow Engineering category on Locanto.