AI Benchmark Engineer – Mathematical Reasoning (Varanasi)

Varanasi, India
Posted: a week ago

Description

Role Overview

We are seeking a highly analytical, computationally proficient individual with a strong research background to join our team. In this role you will either craft challenging, insightful problems in your research domain or devise elegant computational solutions.

Responsibilities:
- Build multi-agent benchmark tasks that require multi-step mathematical reasoning, proof construction, or algorithmic problem-solving
- Design problems that are genuinely hard for a single agent but decomposable, in areas such as competition math, numerical analysis, combinatorial optimisation, and statistical inference
- Create verification scripts that check mathematical correctness — numerical answers with appropriate tolerance, proof step validity, and algorithm output correctness
- Write clear problem statements with precise notation, definitions, and output format
- Create decomposition guides that split problems into independent sub-computations or parallel solution strategies

Offer Details
- Pay: INR 1.75 to 2 Lakhs per month
- Mode of work: Fully Remote
- Duration: 12 months (likely to be extended)
- Number of positions: 15

Required Qualifications:
- 5+ years in mathematics, quantitative research, or computational science (competition math, university-level mathematics, or a quantitative research background).
- Python programming: NumPy, SciPy, or symbolic computation (SymPy).
- Experience writing mathematical proofs or formal derivations.
- Ability to create problems with accurate, verifiable answers — not subjective or open-ended.
- Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
- Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues.
- Understanding of numerical methods: floating-point tolerance, convergence criteria, and error bounds.

Strong plus:
- Experience creating math competition problems (AMC, AIME, Putnam, IMO, or similar).
- Research in mathematics, theoretical CS, or quantitative fields.
- Experience with automated theorem proving or formal verification.
- Knowledge of AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI).
- Experience with large-scale numerical computation or scientific computing.

Example of what you will produce:

A task requiring analysis of a system of 50 coupled differential equations modelling a chemical reaction network. The agent must determine equilibrium concentrations, stability conditions, and bifurcation points. The input includes the reaction network as a matrix, the rate constants, and the initial conditions. The verifier checks numerical answers to a tolerance of 1e-6, validates the eigenvalue analysis for stability, and confirms the bifurcation parameter ranges. The decomposition splits the work across 4 sub-agents: one computes equilibria, one analyses local stability, one maps bifurcations, and one synthesises the phase portrait. The oracle scores 1.0, a single agent scores 0.25, and the multi-agent setup scores 0.80.

Offer Details
- Commitments Required: 8 hours per day with a 4-hour overlap with PST.
- Employment Type: Contractor position (Note: this role does not include medical/paid leave).

Apply on Kit Job: kitjob.in/job/4lada9

Highlights

Company name: Millionlogics
Job position: AI Benchmark Engineer – Mathematical Reasoning (Varanasi)
Ad ID: 8764496062