AI Benchmark Engineer - Knowledge / Research (Indore)
Indore, India
Posted: a week ago
Description
Role Overview
We are seeking a highly analytical and computationally proficient individual with a strong research background to join our team. In this role you will either craft challenging, insightful problems in your research domain or devise elegant computational solutions.
Responsibilities:
- Build multi-agent benchmark tasks that require reading, analysing, and synthesising large document collections
- Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
Offer Details:
- Duration: 12 months+
- Pay: INR 1.75 to 2.00 Lakhs per month (net/take-home)
- Number of positions: 12
- Mode of work: Fully Remote
- Experience: 5+ Years
Required Qualifications:
- 5+ years of research experience — academic or industry research in any scientific domain.
- Strong reading comprehension and ability to extract structured information from unstructured text.
- Experience with JSON/data structures — designing schemas, validating output formats, Python scripting ability (for judge scripts and data processing).
- Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
- Comfortable with Docker — writing Dockerfiles, building images, and debugging container issues.
- Attention to detail: building oracles requires exact values, not approximations.
Strong plus:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys.
- Familiarity with medical/legal/scientific document analysis.
- Experience with NLP or information extraction tasks.
- Knowledge of LLM evaluation and benchmarking (MMLU, GPQA, SimpleQA).
- Experience curating datasets for AI evaluation.
Additional Details
- Commitments Required: 8 hours per day with a 4-hour overlap with PST.
- Employment Type: Contractor position (Note: this role does not include medical/paid leave).
Example of what you'll produce:
A task with 1500 medical case records (500 cardiac, 500 vascular, 500 systemic). The agent must read all cases, identify relevant ones, extract evidence, and produce a cross-domain diagnosis. The oracle requires exact first/last case IDs per file (proves the agent read start to end), verbatim excerpts from specific cases (proves it read individual records), and a cross-domain evidence matrix.
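To make the oracle idea concrete, here is a minimal Python sketch of field-by-field scoring against a JSON oracle. It is only an illustration under stated assumptions: the field names, file name, case IDs, and excerpt text are all hypothetical placeholders, and a deterministic exact-match check stands in for the LLM judge mentioned in the responsibilities.

```python
import json

# Hypothetical oracle layout. Every field name and value below is a placeholder
# invented for illustration, not taken from a real task.
ORACLE = json.loads("""
{
  "case_id_bounds": {
    "cardiac.jsonl": {"first": "CARD-0001", "last": "CARD-0500"}
  },
  "verbatim_excerpts": {
    "CARD-0042": "placeholder excerpt text"
  },
  "evidence_matrix": {
    "cardiac": {"vascular": true, "systemic": false}
  }
}
""")


def flatten(node, prefix=""):
    """Map nested dicts to {"a/b/c": leaf} so each leaf is scored separately."""
    if isinstance(node, dict):
        flat = {}
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}/"))
        return flat
    return {prefix.rstrip("/"): node}


def same(expected, actual):
    """Exact match on both type and value (so True does not equal 1)."""
    return type(expected) is type(actual) and expected == actual


def score(oracle, answer):
    """Return (fraction of oracle fields matched, per-field pass/fail map)."""
    expected = flatten(oracle)
    actual = flatten(answer)
    per_field = {path: same(value, actual.get(path)) for path, value in expected.items()}
    return sum(per_field.values()) / len(per_field), per_field


if __name__ == "__main__":
    total, fields = score(ORACLE, ORACLE)
    print(total)
    for path, ok in fields.items():
        print(f"{'PASS' if ok else 'FAIL'}  {path}")
```

Scoring the oracle against itself gives 1.0 by construction. An agent answer that gets one of the five leaf fields wrong would score 0.8 under this equal-weight scheme; a real judge would likely weight fields differently and tolerate formatting differences in excerpts.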
The decomposition uses 15 chunk-reader sub-agents, 3 domain synthesisers, and 1 final synthesiser. Oracle scores 1.0, single-agent scores 0.15, and multi-agent scores 0.80.
Apply on Kit Job: kitjob.in/job/4lxdo3
Highlights
Company name: Millionlogics
Job position: AI Benchmark Engineer - Knowledge / Research (Indore)
More info about this ad
AI Benchmark Engineer - Knowledge / Research (Indore) has been posted in the Indore Engineering category on Locanto.