
Terminal Bench Expert (Vapi)

Vapi, India
Posted: less than a week ago

Description

Terminal Bench Expert

Employment Type: Contractor assignment (no medical/paid leave)

Skills:
- 3–10 years of experience in software engineering or relevant domains.
- Strong debugging, reasoning, and analytical skills.

About the Role:
Looking for highly analytical engineers, researchers, and domain specialists to contribute benchmark tasks for AI agent evaluation systems (e.g., Terminal-Bench). Design realistic, technically deep tasks simulating real-world scenarios such as debugging, data corruption, infrastructure failures, and complex workflows.

What does day-to-day look like:
- Design high-quality Terminal-Bench task ideas and specifications.
- Develop complex tasks requiring reasoning, investigation, and debugging.
- Write clear task descriptions, solution approaches, and verification logic.
- Define deterministic, outcome-based evaluation criteria.
- Identify realistic failure modes, edge cases, and operational constraints.
- Create tasks that challenge AI systems while remaining solvable by experts.
- Collaborate with reviewers to refine task quality and difficulty.
- Contribute expertise across one or more specialized domains.

Required Skills:
- 3–10 years of experience in software engineering or relevant domains.
- Strong debugging, reasoning, and analytical skills.
- Good understanding of system design, workflows, and dependencies.
- Ability to analyze complex systems across multiple layers.
- Experience with production systems, pipelines, or large-scale workflows.
- Solid technical writing and documentation skills.
- Exposure to LLMs, agentic systems, or AI evaluation frameworks.
- Experience reviewing technical specifications or designing validation logic.

Domains (Any of the following):
- Software Engineering & Code Operations
- Debugging & Codebase Navigation
- System Administration & Shell Workflows
- File & Text Processing Pipelines
- Data Engineering (ETL & Data Pipelines)
- Database & SQL Operations
- Machine Learning Pipelines & MLOps
- Post-training & Model Finetuning Workflows
- AI Evaluation & Benchmarking Systems
- Retrieval, Search & Ranking Systems
- GPU / Systems Performance Optimization
- Distributed Systems & Infrastructure
- Cloud & Platform Engineering
- DevOps & CI/CD Systems
- Build & Dependency Management
- Scientific & Numerical Computing
- Simulation & Optimization Systems
- Formal Methods & Theorem Proving
- Document & Structured Data Processing (PDFs, Excel, etc.)
- Media Processing (Video, Audio, Images via CLI tools)
- Programmatic Graphics & Design (SVG, layout, rendering)
- Data Visualization & Reporting Workflows
- Geospatial & Spatial Data Processing
- Time-series & Forecasting Systems
- Security, Forensics & Reverse Engineering
- Cybersecurity & Vulnerability Analysis
- Networking & API Integration Workflows
- Automation & Multi-step Toolchain Orchestration
- CLI Tooling & Developer Tool Workflows
- Version Control & Git Workflows
- Observability, Logging & Monitoring
- Storage Systems & File Systems
- Finance & Accounting Workflows
- Quantitative Finance & Risk Modeling
- Legal & Compliance Workflows
- Healthcare & Clinical Data Processing
- Supply Chain & Logistics Operations
- Marketing & Growth Analytics
- CRM & Sales Operations
- HR & Recruiting Analytics
- Consulting & Strategy Modeling
- Investment Workflows
- Operations Research & Decision Optimization
- Benchmark Infrastructure, Adapters & Harness

Evaluation Process (approximately 45 mins):
- One round of technical evaluation (45 mins)

Apply on Kit Job: kitjob.in/job/4mo126

Highlights

Company name: Codefeast
Job position: Terminal Bench Expert (Vapi)
Ad ID: 8795482135
