A leading AI business is hiring an AI Infrastructure Engineer with experience in GPU infrastructure, distributed systems and AI platforms. Hybrid/Remote options available. Outside IR35. Paying between £800 and £1000 per day.
Experience and skills required for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
Strong systems-level engineering experience, ideally in infrastructure, HPC, platform engineering or AI/ML environments
Hands-on experience operating large-scale compute or GPU-backed infrastructure
Experience with distributed systems and multi-node environments
Familiarity with NCCL and GPU-to-GPU communication
Experience with Kubernetes, containerised platforms and cluster orchestration
Strong coding ability in Python, Go or C++
Experience working with high-performance storage across complex environments is highly desirable
A strong troubleshooting mindset, with the ability to understand system behaviour at the cluster, hardware and network levels
Nice to have
Exposure to InfiniBand, bare-metal provisioning or HPC-style networking
Experience supporting training or inference environments for large-scale ML models
Background in AI infrastructure start-ups, hyperscalers or high-performance compute environments
Experience with profiling / benchmarking tools and performance optimisation at scale
Role and responsibilities for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
Build, operate and optimise large-scale GPU infrastructure for AI training and inference
Support multi-node, multi-GPU environments and distributed workloads
Improve cluster health, fault tolerance and remediation workflows across GPU fleets
Optimise GPU-to-GPU communication, workload performance and infrastructure utilisation
Work with high-performance storage systems supporting large datasets and checkpointing
Build or improve tooling for profiling, monitoring, benchmarking and performance analysis
Collaborate closely with ML researchers, platform teams and infrastructure engineers to remove bottlenecks and improve training efficiency
Support capacity planning and deployment for next-generation compute environments
Package for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
Outside IR35
Hybrid position
Paying £800 to £1000 per day
About Outside Spy
Outside Spy discovers all the Outside IR35 IT contract opportunities for members.