Terminal Bench Expert (Baddi)
Baddi, India
Posted: yesterday
Description
Employment Type: Contractor assignment (no medical/paid leave)
Skills
- 3–10 years of experience in software engineering or relevant domains.
- Strong debugging, reasoning, and analytical skills.
About the Role:
Looking for highly analytical engineers, researchers, and domain specialists to contribute benchmark tasks for AI agent evaluation systems (e.g., Terminal-Bench). You will design realistic, technically deep tasks that simulate real-world scenarios such as debugging, data corruption, infrastructure failures, and complex workflows.
What does the day-to-day look like:
- Design high-quality Terminal-Bench task ideas and specifications.
- Develop complex tasks requiring reasoning, investigation, and debugging.
- Write transparent task descriptions, solution approaches, and verification logic.
- Define deterministic, outcome-based evaluation criteria.
- Identify realistic failure modes, edge cases, and operational constraints.
- Create tasks that challenge AI systems while remaining solvable by experts.
- Collaborate with reviewers to refine task quality and difficulty.
- Contribute expertise across one or more specialized domains.
Required Skills:
- 3–10 years of experience in software engineering or relevant domains.
- Strong debugging, reasoning, and analytical skills.
- Good understanding of system design, workflows, and dependencies.
- Ability to analyze complex systems across multiple layers.
- Experience with production systems, pipelines, or large-scale workflows.
- Strong technical writing and documentation skills.
- Exposure to LLMs, agentic systems, or AI evaluation frameworks.
- Experience reviewing technical specifications or designing validation logic.
Domains (Any of the following):
- Software Engineering & Code Operations
- Debugging & Codebase Navigation
- System Administration & Shell Workflows
- File & Text Processing Pipelines
- Data Engineering (ETL & Data Pipelines)
- Database & SQL Operations
- Machine Learning Pipelines & MLOps
- Post-tra
Apply on Kit Job: kitjob.in/job/4nbasr
Highlights
- Company name: Codefeast
- Job position: Terminal Bench Expert (Baddi)
More info about this ad
Terminal Bench Expert (Baddi) has been posted in the Baddi Other Jobs category on Locanto.