Jia Guo, Yifan (Sabrina) Zhao, and Dino Occhialini, LinkedIn
Join us for this session to learn how cost-to-serve was reduced by nearly 50% for LinkedIn's production fleet of ~14K machines running the Apache Pinot OLAP database.
The OLAP workloads running on Pinot at LinkedIn are diverse in terms of:
- Varying workload demand (SLOs as low as P99 query latency < 100ms at 100K read QPS).
- Varying cost / resource usage (CPU, memory, IO) of SQL queries.
- Varying dataset sizes (clusters serving data from as low as 500GB to as high as 2PB).
The talk will go into the details of the core cost-optimization algorithm, which weighs several factors to recommend an optimal SKU:
- Multiple SKU Profiles
- Low-overhead mechanisms to collect high cardinality profiling data from production clusters
- Resource constraints (CPU, memory, disk IOPS, throughput, etc.)
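The abstract does not spell out the algorithm. As a rough illustration of how "multiple SKU profiles" and "resource constraints" could combine into a recommendation, here is a minimal sketch under stated assumptions: each SKU has a fixed per-machine capacity and cost, the cluster's demand is already aggregated per resource, and the most constrained resource determines the machine count. All class names, fields, and numbers below are hypothetical and are not LinkedIn's actual model or data.

```python
import math
from dataclasses import dataclass

# Hypothetical resource dimensions, mirroring the constraints listed above.
RESOURCES = ("cpu_cores", "memory_gb", "disk_gb", "disk_iops", "disk_throughput_mbps")


@dataclass(frozen=True)
class SkuProfile:
    # Hypothetical per-machine capacity and hourly cost of one SKU.
    name: str
    cpu_cores: float
    memory_gb: float
    disk_gb: float
    disk_iops: float
    disk_throughput_mbps: float
    hourly_cost: float


@dataclass(frozen=True)
class ClusterDemand:
    # Hypothetical aggregate demand of one cluster (assumed to already include headroom).
    cpu_cores: float
    memory_gb: float
    disk_gb: float
    disk_iops: float
    disk_throughput_mbps: float


def machines_needed(demand: ClusterDemand, sku: SkuProfile) -> int:
    # Every constraint must hold, so the binding (most demanding) resource sets the count.
    return max(math.ceil(getattr(demand, r) / getattr(sku, r)) for r in RESOURCES)


def recommend_sku(demand: ClusterDemand, skus: list[SkuProfile]) -> tuple[str, int, float]:
    # Return (sku_name, machine_count, total_hourly_cost) for the cheapest SKU.
    best = None
    for sku in skus:
        count = machines_needed(demand, sku)
        cost = count * sku.hourly_cost
        if best is None or cost < best[2]:
            best = (sku.name, count, cost)
    return best


# Illustrative, made-up inputs.
EXAMPLE_SKUS = [
    SkuProfile("compute-heavy", 64, 256, 2000, 100_000, 2000, 4.0),
    SkuProfile("storage-heavy", 32, 256, 8000, 200_000, 4000, 3.0),
]
EXAMPLE_DEMAND = ClusterDemand(640, 2048, 40_000, 500_000, 10_000)

if __name__ == "__main__":
    print(recommend_sku(EXAMPLE_DEMAND, EXAMPLE_SKUS))  # ('storage-heavy', 20, 60.0)
```

With these inputs, both SKUs need 20 machines (disk is the binding resource for "compute-heavy", CPU for "storage-heavy"), so the cheaper per-machine SKU wins. This sketch ignores replication, failure headroom, and the query-latency SLOs mentioned above.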
The system was built to support "Multiple SKUs" effectively, both in terms of cost optimization and keeping operational overhead to a minimum (fully automated). In our talk, we will go into the details of all the infrastructure pieces we built to deliver the solution in a generic fashion.
We will further discuss how this has been integrated into our day-to-day operational machinery.
https://www.usenix.org/conference/srecon25americas/presentation/guo