Name: Fully Automated HW SKU Selection System to Optimize Apache Pinot’s Cost-to-Serve at LinkedIn
Start: 2025-03-27T09:00:00-0700
End: 2025-03-27T09:45:00-0700

Thursday March 27, 2025 9:00am - 9:45am PDT

Grand Ballroom GH

Jia Guo, Yifan (Sabrina) Zhao, and Dino Occhialini, LinkedIn

Join us for this session to learn how cost-to-serve was reduced by nearly 50% for the Apache Pinot OLAP database's production fleet of ~14K machines at LinkedIn.

The OLAP workloads running on Pinot at LinkedIn have diverse characteristics in terms of:

Varying workload demand (SLOs as low as P99 query latency < 100ms at 100K read QPS).

Varying cost / resource usage (CPU, memory, IO) of SQL queries.

Varying dataset sizes (clusters serving data from as low as 500GB to as high as 2PB).

The talk will go into the details of the core cost optimization algorithm, which considers several factors to recommend an optimal SKU:

Multiple SKU Profiles

Low-overhead mechanisms to collect high cardinality profiling data from production clusters

Resource constraints (CPU, memory, disk IOPS, throughput, etc.)
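The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how an SKU recommendation of this general kind could work, under stated assumptions. It assumes each SKU profile lists per-machine capacities and a cost, that profiling data has been reduced to a peak cluster-wide demand per resource, and that the cheapest SKU satisfying every resource constraint wins. All names (`SkuProfile`, `WorkloadDemand`, `machines_needed`, `recommend_sku`) and all numbers are hypothetical and are not taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuProfile:
    # Hypothetical per-machine capacities and cost (illustrative values only).
    name: str
    cpu_cores: float
    memory_gb: float
    disk_iops: float
    cost_per_machine: float


@dataclass(frozen=True)
class WorkloadDemand:
    # Hypothetical peak cluster-wide demand, assumed to come from profiling data.
    cpu_cores: float
    memory_gb: float
    disk_iops: float


def machines_needed(demand: WorkloadDemand, sku: SkuProfile) -> int:
    # The most constrained resource determines the machine count (at least 1).
    return max(
        math.ceil(demand.cpu_cores / sku.cpu_cores),
        math.ceil(demand.memory_gb / sku.memory_gb),
        math.ceil(demand.disk_iops / sku.disk_iops),
        1,
    )


def recommend_sku(demand: WorkloadDemand, skus: list[SkuProfile]):
    # Return (sku, machine_count, total_cost) for the cheapest feasible fleet,
    # or None if no SKUs were supplied.
    best = None
    for sku in skus:
        count = machines_needed(demand, sku)
        cost = count * sku.cost_per_machine
        if best is None or cost < best[2]:
            best = (sku, count, cost)
    return best


# Illustrative inputs: a memory-bound workload and two made-up SKUs.
skus = [
    SkuProfile("compute-heavy", cpu_cores=64, memory_gb=256,
               disk_iops=100_000, cost_per_machine=10.0),
    SkuProfile("memory-heavy", cpu_cores=32, memory_gb=512,
               disk_iops=100_000, cost_per_machine=8.0),
]
demand = WorkloadDemand(cpu_cores=320, memory_gb=4096, disk_iops=400_000)
best_sku, count, cost = recommend_sku(demand, skus)
print(best_sku.name, count, cost)
```

In this made-up example the workload is memory-bound, so the SKU with more memory per machine needs fewer machines and ends up cheaper overall. A production system would presumably also weigh latency SLOs, headroom, and availability, which this sketch ignores.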

The system has been built with the goal of supporting "Multiple SKUs" effectively, both in terms of cost optimization and keeping operational overhead to a minimum (fully automated). In our talk, we will go into the details of all the infrastructure pieces we have built to deliver the solution in a generic fashion.

We will further discuss how this has been integrated into our day-to-day operational machinery.

https://www.usenix.org/conference/srecon25americas/presentation/guo

Speakers

Jia Guo

Jia is a Senior Software Engineer at LinkedIn and a committer for Apache Pinot. Jia focuses on making Pinot fault-tolerant and cost-effective. He has contributed across different areas of Pinot, ranging from the OLAP engine, indexing, and fault-tolerant shard placement to several performance...

Yifan (Sabrina) Zhao

Sabrina is a Software Engineer at LinkedIn and a contributor to Apache Pinot. Sabrina has contributed features such as SQL pagination, availability improvements for massive multi-tenant clusters, OLAP SQL enhancements, and fault-tolerant shard placement.

Dino Occhialini

Dino is a Staff Software Engineer at LinkedIn and a contributor to Apache Pinot. Dino has been a strong SRE leader for the Pinot team at LinkedIn. Dino has made many noteworthy contributions towards improving Pinot's operational excellence, resiliency, Site-Up, provisioning and usability...


Track 2

SREcon25 Americas
