AI Infrastructure Engineer (GPU, Distributed Systems, AI Platforms) at Emporia Consulting Group Limited, City of London, £Contract Rate

Contract Description

A leading AI business is hiring an AI Infrastructure Engineer with experience in GPU infrastructure, distributed systems and AI platforms. Hybrid/remote options available. Outside IR35. Paying between £800 and £1000 per day.

Experience and skills required for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
  • Strong systems-level engineering experience, ideally in infrastructure, HPC, platform engineering or AI/ML environments
  • Hands-on experience operating large-scale compute or GPU-backed infrastructure
  • Experience with distributed systems and multi-node environments
  • Familiarity with NCCL and GPU-to-GPU communication
  • Experience with Kubernetes, containerised platforms and cluster orchestration
  • Strong coding ability in Python, Go or C++
  • Experience working with high-performance storage across complex environments is highly desirable
  • A strong troubleshooting mindset with the ability to understand behaviour at cluster, hardware and network level

Nice to have
  • Exposure to InfiniBand, bare-metal provisioning or HPC-style networking
  • Experience supporting training or inference environments for large-scale ML models
  • Background in AI infrastructure start-ups, hyperscalers or high-performance compute environments
  • Experience with profiling / benchmarking tools and performance optimisation at scale

Role and responsibilities for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
  • Build, operate and optimise large-scale GPU infrastructure for AI training and inference
  • Support multi-node, multi-GPU environments and distributed workloads
  • Improve cluster health, fault tolerance and remediation workflows across GPU fleets
  • Optimise GPU-to-GPU communication, workload performance and infrastructure utilisation
  • Work with high-performance storage systems supporting large datasets and checkpointing
  • Build or improve tooling for profiling, monitoring, benchmarking and performance analysis
  • Collaborate closely with ML researchers, platform teams and infrastructure engineers to remove bottlenecks and improve training efficiency
  • Support capacity planning and deployment for next-generation compute environments

Package for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
  • Outside IR35
  • Hybrid/remote options available
  • Paying £800 to £1000 per day