Distributed Training & Inference Optimization Engineer (New …, New Delhi

Description
Overview
Join a highly advanced AI infrastructure team focused on building and optimizing large-scale machine learning systems. This environment leverages cutting-edge technologies to enable high-performance experimentation, scalable model deployment, and efficient processing of large datasets. The team operates globally, bringing together engineers and researchers to push the boundaries of deep learning, distributed systems, and next-generation compute platforms.

About the Role
This position is centered on maximizing the efficiency and scalability of GPU-based machine learning workloads, particularly for large language models (LLMs) and generative AI systems. You will work on improving both training performance and inference efficiency, ensuring optimal utilization of hardware resources, reduced latency, and faster model iteration cycles. The role requires hands-on expertise in deep learning frameworks, distributed systems, and performance optimization.

Key Responsibilities
- Enhance performance of distributed training frameworks such as PyTorch, DeepSpeed, or similar systems
- Identify and resolve bottlenecks in large-scale training pipelines (e.g., memory usage, communication overhead, GPU utilization)
- Optimize inference systems using techniques like quantization, caching, and batching to achieve low latency and high throughput
- Collaborate with infrastructure and platform teams to improve resource orchestration, scheduling, and system reliability
- Design benchmarking tools and metrics to measure training efficiency, system throughput, and latency performance
- Apply advanced optimization techniques (e.g., mixture-of-experts, speculative decoding, model parallelism) to improve large model performance
- Continuously evaluate recent approaches to hardware acceleration and model execution efficiency

Required Qualifications
- 3+ years of hands-on experience optimizing GPU-based machine learning workloads
- Strong expertise in deep learning frameworks such as PyTorch, DeepSpeed, or equivalent
- Experience with distributed training techniques for large-scale models
- Solid understanding of inference optimization strategies (e.g., quantization, pruning, caching, batching)
- Degree in Computer Science, Engineering, or a related technical field

Preferred Qualifications
- Experience with CUDA programming and GPU performance profiling tools
- Familiarity with distributed systems communication libraries and optimization techniques
- Knowledge of model optimization methods such as FlashAttention, LoRA, or similar techniques
- Experience working with containerized or orchestrated environments for ML workloads
- Contributions to open-source machine learning or infrastructure projects
- Hands-on experience with modern inference serving frameworks

Apply on Kit Job: kitjob.in/job/4ll5ke

Distributed Training & Inference Optimization Engineer (New … has been posted in the New Delhi Transportation & Logistics category on Locanto.
