SUMMARY

Profiling, a technique for analyzing system performance, has a long history in computer science that goes back to the 1970s.
Profiling tracks memory usage, time spent on the CPU and GPU, use of specific instructions, and how often functions are called.
Profiling is essential for improving software performance and reducing costs: in theory, a 10% performance gain allows turning off 10% of servers.

Tracing profiling records every event, which gives a detailed view but costs a lot of storage and performance.
Sampling profiling instead captures the program's state at a fixed frequency (for example, 20 or 100 times per second), which keeps overhead low: less than 1% CPU and about 4 MB of memory.
Running the sampler always-on in production captures real-world behavior, which is more meaningful than occasional or development-only profiling.
Linux eBPF makes this kind of low-overhead sampling possible without any changes to applications.
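
To make "sampling at a fixed frequency" concrete, here is a minimal in-process sketch in Python. It is an analogy, not how Polar Signals works: eBPF samples stacks from the kernel without touching the application, whereas this sketch installs a SIGPROF handler inside the process, and the talk's overhead figures do not apply to it. It assumes a Unix system (signal.setitimer is unavailable on Windows) and must run in the main thread; all function names are hypothetical.

```python
import collections
import signal
import time
from types import FrameType
from typing import Optional

# Hypothetical helpers: an in-process sampling profiler driven by SIGPROF.
# Unix only (signal.setitimer is unavailable on Windows); main thread only.
samples: collections.Counter = collections.Counter()


def fold_stack(frame: Optional[FrameType]) -> str:
    """Turn a frame chain into 'outer;inner' form, outermost frame first."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


def on_sample(signum: int, frame: Optional[FrameType]) -> None:
    # Called on every SIGPROF tick: record the interrupted stack and return.
    samples[fold_stack(frame)] += 1


def start_sampling(hz: int = 100) -> None:
    """Sample `hz` times per second of process CPU time."""
    signal.signal(signal.SIGPROF, on_sample)
    interval = 1.0 / hz
    # ITIMER_PROF counts user + system CPU time and delivers SIGPROF.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)


def stop_sampling() -> None:
    # Disarm the timer; the handler stays installed so a late SIGPROF is harmless.
    signal.setitimer(signal.ITIMER_PROF, 0)


def busy_work(cpu_seconds: float) -> int:
    """Burn CPU time so the profiling timer has something to sample."""
    deadline = time.process_time() + cpu_seconds
    total = 0
    while time.process_time() < deadline:
        total += sum(range(1000))
    return total


if __name__ == "__main__":
    start_sampling(hz=100)
    busy_work(0.5)
    stop_sampling()
    for stack, count in samples.most_common(3):
        print(count, stack)
```

The cost per sample is one stack walk and one counter update, and the work scales with the sampling frequency rather than with the number of events in the program, which is the property that distinguishes sampling from tracing.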

Polar Signals uses eBPF for profiling, so applications need no instrumentation.
Matthias Loibl, the presenter, is the director at Polar Signals Cloud and maintains several open source projects, including Prometheus.
Polar Signals, which previously focused on CPU and memory profiling, began previewing GPU profiling earlier in the year.

The profiler integrates with NVIDIA NVML (NVIDIA Management Library) to collect GPU metrics such as overall node utilization, per-process utilization, memory use, and clock speed.
Watching these metrics reveals drops in GPU utilization, which point to where optimization would improve resource usage.
Additional metrics include power draw, power limit, temperature (a GPU that runs too hot throttles itself), and PCIe throughput, which shows how much data moves between CPU and GPU.
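
As an illustration of spotting utilization drops, the sketch below scans a series of (timestamp, utilization %) samples, such as values polled from NVML, and returns the windows in which utilization stays below a threshold. The 50% threshold, the sample data, and the function name are assumptions for illustration, not part of Polar Signals' product.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, GPU utilization in percent), e.g. polled from NVML
Sample = Tuple[float, float]


def low_utilization_windows(
    samples: List[Sample], threshold: float = 50.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in which GPU utilization is below threshold.

    A window opens at the first sample under the threshold and closes at the
    next sample at or above it; a window still open at the end of the data
    closes at the last sample's timestamp.
    """
    ordered = sorted(samples)
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, util in ordered:
        if util < threshold and start is None:
            start = ts  # utilization just dropped
        elif util >= threshold and start is not None:
            windows.append((start, ts))  # utilization recovered
            start = None
    if start is not None:
        windows.append((start, ordered[-1][0]))
    return windows


# Utilization dips between t=2 s and t=4 s, e.g. while the CPU prepares data.
trace = [(0.0, 95.0), (1.0, 90.0), (2.0, 30.0), (3.0, 20.0), (4.0, 85.0)]
print(low_utilization_windows(trace))  # [(2.0, 4.0)]
```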

These GPU metrics can be correlated with CPU profiling stacks collected via eBPF.
This makes it possible to pick a specific moment, such as a dip in GPU utilization, and inspect what the CPU was doing at that time using visualizations such as flame charts.
Flame charts show what the CPU is executing while GPU utilization is low, which can reveal CPU bottlenecks (e.g., Python or Rust code not feeding data to the GPU fast enough).
Because the profiler supports both compiled and interpreted languages, it applies across many application domains.
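
The correlation step can be sketched as a join on time: keep only the CPU stack samples whose timestamps fall inside a low-utilization window, then count identical stacks. The output uses the "folded" format (semicolon-separated frames followed by a count) that common flame graph tools read; a flame chart would instead keep the selected samples in time order. The sample data and function names are hypothetical, and the sketch assumes CPU and GPU timestamps already share one clock, which a real profiler has to ensure.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (timestamp in seconds, stack as "root;caller;callee")
StackSample = Tuple[float, str]


def stacks_during(
    cpu_samples: Iterable[StackSample], windows: List[Tuple[float, float]]
) -> Counter:
    """Count CPU stacks whose sample time falls inside any [start, end] window."""
    counts: Counter = Counter()
    for ts, stack in cpu_samples:
        if any(start <= ts <= end for start, end in windows):
            counts[stack] += 1
    return counts


def to_folded(counts: Counter) -> List[str]:
    """Render counts as folded-stack lines ("a;b;c N"), most frequent first."""
    return [f"{stack} {n}" for stack, n in counts.most_common()]


cpu = [
    (0.5, "main;train_step;forward"),
    (2.2, "main;load_batch;decode_image"),
    (2.7, "main;load_batch;decode_image"),
    (3.1, "main;load_batch;read_file"),
    (4.5, "main;train_step;forward"),
]
for line in to_folded(stacks_during(cpu, [(2.0, 4.0)])):
    print(line)
# main;load_batch;decode_image 2
# main;load_batch;read_file 1
```

In this made-up trace, the CPU spends the low-utilization window decoding and reading input, which is the "not feeding the GPU fast enough" pattern described above.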

GPU time profiling, a newly introduced feature, tracks the time specific functions spend on the GPU, not just on the CPU.
The profiler records when CUDA-related functions start and end, which yields precise GPU execution durations per function.
Real-world examples show function time on both CPU and GPU, with stack visualizations linking each CPU call to the GPU kernel execution it triggered.
In these stack visualizations, colors distinguish the different binaries running on the system.
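
The start/end bookkeeping can be sketched as follows. The talk does not describe how the events are captured, so the event layout here is an assumption: each event carries a timestamp, a kind ("start" or "end"), a correlation id that ties the two ends of one execution together (similar in spirit to the correlation IDs in NVIDIA's CUPTI), and a function name. Kernel names and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (timestamp in ns, "start" or "end", correlation id, function name)
Event = Tuple[int, str, int, str]


def gpu_time_per_function(events: Iterable[Event]) -> Dict[str, int]:
    """Sum GPU execution time in nanoseconds per function name.

    Start and end events are paired by correlation id. An end with no
    matching start is ignored, and starts that never end are not counted.
    """
    open_starts: Dict[int, int] = {}
    totals: Dict[str, int] = defaultdict(int)
    # Order by time; at equal timestamps, process starts before ends.
    ordered = sorted(events, key=lambda e: (e[0], e[1] == "end"))
    for ts, kind, corr_id, name in ordered:
        if kind == "start":
            open_starts[corr_id] = ts
        elif kind == "end" and corr_id in open_starts:
            totals[name] += ts - open_starts.pop(corr_id)
    return dict(totals)


events = [
    (100, "start", 1, "matmul_kernel"),
    (400, "end", 1, "matmul_kernel"),
    (450, "start", 2, "softmax_kernel"),
    (500, "end", 2, "softmax_kernel"),
    (600, "start", 3, "matmul_kernel"),
    (900, "end", 3, "matmul_kernel"),
]
print(gpu_time_per_function(events))  # {'matmul_kernel': 600, 'softmax_kernel': 50}
```

Pairing by id rather than by name matters because the same kernel can run many times, and on multiple streams its executions can overlap.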

The profiler runs on Linux systems with eBPF support and can be deployed as a standalone binary or as a Kubernetes DaemonSet.
Setup is straightforward: it requires only a manifest and an authentication token.
Existing customers already use it for CPU and memory profiling and are starting to adopt GPU profiling, with particular interest from domains such as vector databases.
Polar Signals offers free consultations and discounts for early-stage startups interested in adopting their profiling solutions.

Continuous Profiling for GPUs — Matthias Loibl, Polar Signals