AI Benchmark Engineer - Knowledge / Research (Indore)
Indore, India
Posted: a week ago
Description
Role Overview
We are seeking a highly analytical, computationally proficient individual with a solid research background to join our team. In this role, you will either craft challenging, insightful problems in your research domain or devise elegant computational solutions.
Responsibilities:
- Build multi-agent benchmark tasks that require reading, analysing, and synthesising large document collections
- Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis).
Offer Details:
- Duration: 12 months+
- Pay: INR 1.75 to 2.00 lakh per month (net/take-home)
- Number of positions: 12
- Mode of work: Fully Remote
- Experience: 5+ years
Required Qualifications:
- 5+ years of research experience — academic or industry research in any scientific domain.
- Strong reading comprehension and ability to extract structured information from unstructured text.
- Experience with JSON and data structures (designing schemas, validating output formats) and Python scripting ability (for judge scripts and data processing).
- Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
- Comfortable with Docker — writing Dockerfiles, building images, and debugging container issues.
- Attention to detail: building oracles requires exact values, not approximations.
Strong plus:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys.
- Familiarity with medical/legal/scientific document analysis.
- Experience with NLP or information extraction tasks.
- Knowledge of LLM evaluation and benchmarking (MMLU, GPQA, SimpleQA).
- Experience curating datasets for AI evaluation.
Additional Details
- Commitments Required: 8 hours per day with a 4-hour overlap with PST.
- Employment Type: Contractor position (note: this role does not include medical or paid leave).
Example of what you'll produce:
A task with 1,500 medical case records (500 cardiac, 500 vascular, 500 systemic). The agent must read all cases, identify the relevant ones, extract evidence, and produce a cross-domain diagnosis. The oracle requires the exact first and last case IDs per file (proving the agent read each file from start to end), verbatim excerpts from specific cases (proving it read individual records), and a cross-domain evidence matrix. The decomposition uses 15 chunk-reader sub-agents, 3 domain synthesisers, and 1 final synthesiser. The oracle scores 1.0, a single agent scores 0.15, and the multi-agent setup scores 0.80.
Apply on Kit Job: kitjob.in/job/4lxfk6
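To make the oracle idea in the example concrete, here is a minimal Python sketch of field-by-field scoring against a JSON oracle. It is an illustration under stated assumptions, not the team's actual format: the file names, case IDs, and the `score_against_oracle` helper are hypothetical, only the first/last case-ID fields are modelled, and a deterministic exact-match check stands in for the LLM judge the role describes.

```python
import json

# Hypothetical oracle for the medical-records example: file names, field names,
# and case IDs are illustrative assumptions, not the actual task format.
ORACLE = {
    "cardiac.jsonl": {"first_case_id": "CARD-0001", "last_case_id": "CARD-0500"},
    "vascular.jsonl": {"first_case_id": "VASC-0001", "last_case_id": "VASC-0500"},
    "systemic.jsonl": {"first_case_id": "SYS-0001", "last_case_id": "SYS-0500"},
}


def score_against_oracle(oracle: dict, answer: dict) -> float:
    """Return the fraction of oracle fields that the answer reproduces exactly."""
    total = 0
    correct = 0
    for file_name, expected_fields in oracle.items():
        got_fields = answer.get(file_name, {})
        for field, expected in expected_fields.items():
            total += 1
            # Exact match only: oracles require exact values, not approximations.
            if got_fields.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Partial answer: cardiac correct, wrong last ID for vascular, systemic missing.
    agent_answer = json.loads(
        '{"cardiac.jsonl": {"first_case_id": "CARD-0001", "last_case_id": "CARD-0500"},'
        ' "vascular.jsonl": {"first_case_id": "VASC-0001", "last_case_id": "VASC-0499"}}'
    )
    print(score_against_oracle(ORACLE, agent_answer))  # 3 of 6 fields -> 0.5
```

Scoring per field rather than pass/fail per task is what lets a benchmark separate a partial read (such as a single agent that skims) from a complete one.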
Highlights
Company name: Millionlogics
Job position: AI Benchmark Engineer - Knowledge / Research (Indore)
More info about this ad
AI Benchmark Engineer - Knowledge / Research (Indore) has been posted in the Indore Engineering category on Locanto.