Distributed Training & Inference Optimization Engineer (Narela)
Narela, India
Posted: a week ago
Description
Overview
Join a highly advanced AI infrastructure team focused on building and optimizing large-scale machine learning systems. This environment leverages cutting-edge technologies to enable high-performance experimentation, scalable model deployment, and efficient processing of large datasets. The team operates globally, bringing together engineers and researchers to push the boundaries of deep learning, distributed systems, and next-generation compute platforms.

About the Role
This position is centered on maximizing the efficiency and scalability of GPU-based machine learning workloads, particularly for large language models (LLMs) and generative AI systems.

You will work on improving both training performance and inference efficiency, ensuring optimal utilization of hardware resources, reduced latency, and faster model iteration cycles. The role requires hands-on expertise in deep learning frameworks, distributed systems, and performance optimization.

Key Responsibilities
- Enhance the performance of distributed training frameworks such as PyTorch, DeepSpeed, or similar systems
- Identify and resolve bottlenecks in large-scale training pipelines (e.g., memory usage, communication overhead, GPU utilization)
- Optimize inference systems using techniques like quantization, caching, and batching to achieve low latency and high throughput
- Collaborate with infrastructure and platform teams to improve resource orchestration, scheduling, and system reliability
- Design benchmarking tools and metrics to measure training efficiency, system throughput, and latency
- Apply advanced optimization techniques (e.g., mixture-of-experts, speculative decoding, model parallelism) to improve large-model performance
- Continuously evaluate new approaches to hardware acceleration and model execution efficiency

Required Qualifications
- 3+ years of hands-on experience optimizing GPU-based machine learning workloads
- Strong expertise in deep learning frameworks such as PyTorch, DeepSpeed, or equivalent
- Experience with distributed training techniques for large-scale models
- Solid understanding of inference optimization strategies (e.g., quantization, pruning, caching, batching)
- Degree in Computer Science, Engineering, or a related technical field

Preferred Qualifications
- Experience with CUDA programming and GPU performance profiling tools
- Familiarity with distributed systems communication libraries and optimization techniques
- Knowledge of model optimization methods such as FlashAttention, LoRA, or similar techniques
- Experience working with containerized or orchestrated environments for ML workloads
- Contributions to open-source machine learning or infrastructure projects
- Hands-on experience with modern inference serving frameworks

Apply on Kit Job: kitjob.in/job/4m5ds3
Highlights
Company name: Google
Job position: Distributed Training & Inference Optimization Engineer (Narela)
More info about this ad
Distributed Training & Inference Optimization Engineer (Narela) has been posted in the Kirari Suleman Nagar Transportation & Logistics category on Locanto.