Type: Plenary
Tuesday, March 25
 

8:45am PDT

Opening Remarks
Tuesday March 25, 2025 8:45am - 9:00am PDT
Program Co-Chairs: Dan Fainstein, The D. E. Shaw Group; Laura Maguire, Trace Cognitive Engineering
Grand Ballroom ABGH

9:00am PDT

Safe Evaluation and Rollout of AI Models
Tuesday March 25, 2025 9:00am - 9:45am PDT
Brendan Burns, Microsoft


More and more online services and systems depend on artificial intelligence and large language models to implement core user experiences. Consequently, the safe and reliable rollout of new models and new prompts is critical to maintaining the reliability and performance of the overall system. However, unlike traditional systems, there is rarely a clean "working" or "broken" signal from releases. Instead, the performance of new models and new prompts is judged by probabilistic evaluation across many different user inputs. Any change to a model or prompt may make some responses better and some responses worse, so we need to measure in aggregate across many experiences to determine whether there is a regression that needs to be fixed or rolled back. This talk will be a hands-on introduction to the approaches that we took during the development of the Azure Copilot. It will describe both the problem of reliability in the world of AI models and real-world applications that are in use in production today.
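
As a rough illustration of what "measuring in aggregate" can look like, the sketch below compares per-input quality scores for a baseline and a candidate release and flags a regression only when a bootstrap confidence interval on the mean change lies entirely below zero. This is not the method presented in the talk or used for the Azure Copilot: the function name `compare_releases`, the score lists, and the 95% bootstrap rule are all assumptions made for the example.

```python
import random
import statistics


def compare_releases(baseline, candidate, n_boot=2000, seed=0):
    """Paired comparison of per-input quality scores (hypothetical helper).

    baseline[i] and candidate[i] are scores for the same user input under
    the old and new model/prompt. Higher is assumed to be better.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need equal-length, non-empty score lists")

    # Per-input change: positive means the candidate did better on that input.
    deltas = [c - b for b, c in zip(baseline, candidate)]

    # Bootstrap: resample the deltas with replacement to estimate how
    # uncertain the mean change is. A fixed seed keeps the result repeatable.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=len(deltas)))
        for _ in range(n_boot)
    )
    lo = means[int(0.025 * n_boot)]
    hi = means[int(0.975 * n_boot) - 1]

    return {
        "mean_delta": statistics.fmean(deltas),
        "improved": sum(d > 0 for d in deltas),
        "worsened": sum(d < 0 for d in deltas),
        "ci95": (lo, hi),
        # Assumed decision rule: regress only if the whole interval is negative.
        "regression": hi < 0,
    }
```

Individual responses may move in either direction; the decision is driven by the aggregate interval rather than by any single input, which is the point the abstract makes about the lack of a clean "working" or "broken" signal.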


https://www.usenix.org/conference/srecon25americas/presentation/burns
Speakers
Brendan Burns

Microsoft
Brendan Burns is Corporate Vice President for Azure Cloud Native Open Source and Management Platform. He is also a co-founder of the Kubernetes open source project. Before working at Microsoft Azure, he spent eight years working at Google where he worked on search infrastructure and...
Grand Ballroom ABGH

9:45am PDT

Improving the SRE Experience for 10 Years as a Free, Open, and Automated Certificate Authority
Tuesday March 25, 2025 9:45am - 10:30am PDT
Matthew McPherrin, Internet Security Research Group


Ubiquitous HTTPS is an essential part of a secure and privacy-respecting Internet. To that end, the public benefit certificate authority Let’s Encrypt has been issuing TLS certificates free of cost in a reliable, automated, and trustworthy manner for ten years. In that time, we’ve grown to serve more than 500,000,000 websites.

In this talk we’ll dive into the history of Let’s Encrypt and share helpful context for those managing TLS certificates, as well as information about upcoming changes to Let’s Encrypt and guidance for the future. We’ll also cover how we have strived to make the working lives of SREs around the world easier, and how the SRE community has helped us in return.


https://www.usenix.org/conference/srecon25americas/presentation/mcpherrin
Speakers
Matthew McPherrin

Internet Security Research Group
Matthew is the technical lead of the Let’s Encrypt site reliability engineering team, which runs the Let’s Encrypt Certificate Authority and Certificate Transparency logs. Previously, Matthew worked on internal PKI and security infrastructure at Stripe and Square.
Grand Ballroom ABGH
 
Wednesday, March 26
 

9:00am PDT

SRE & Complexification: Where Verbs and Nouns Do Battle
Wednesday March 26, 2025 9:00am - 9:45am PDT
David Woods, The Ohio State University


SRE is one proving ground for resilient performance in action (also known as SNAFU Catching). It is a critical contributor to the scientific foundations for Resilience Engineering.

A new round of growth & change is producing new complexity penalties—complexification. How will/can SRE cope as the lines of tension change? The skills & expertise to do SRE well are verb-centric—“resilience—as adaptive capacity—is a verb in the future tense.” The human push for advantage from technology change is noun-centric.

SRE is one arena where the two framings conflict, given the expanding layers and tangles of interdependencies. SRE can adapt by innovating new verb-based means to see ahead in order to anticipate, to see around in order to synchronize, and to see anew in order to reframe models.


https://www.usenix.org/conference/srecon25americas/presentation/woods
Speakers
David Woods

The Ohio State University
David is a pioneer of Resilience Engineering that looks at how people adapt to cope with complexity in dynamic risky human-cyber systems including accident investigations in critical digital services, critical care medicine, aviation, energy, disaster response, military operations...
Grand Ballroom ABGH

9:45am PDT

The Perverse Incentives of Reliability
Wednesday March 26, 2025 9:45am - 10:30am PDT
Katie Wilde, Snyk


Are you trying to improve reliability in your company, but finding that it isn't valued unless you're in an active SEV1? Struggling to build a reliability culture in a wider organization? Relying on heroics to keep the lights on?

This talk is for you. The reality is that, for most of us, reliability work is not extrinsically rewarded: customers won't write in about the outage you didn't have, and investors aren't impressed that your site is still up. In today's "do less with more" world, increased pressure to deliver value (read: features) often comes at the expense of building resilient systems as we race to hit ever tighter deadlines. In the face of these perverse incentives, it's no wonder that having a reliability focus isn't the norm for so many engineering cultures. There is a better way: harnessing intrinsic motivation. This talk will cover approaches, tactics and lessons learned to overcome the perverse incentive problem, and how tapping into the inherent pride, joy and hilarity of incidents can transform reliability practices.


https://www.usenix.org/conference/srecon25americas/presentation/wilde
Speakers
Katie Wilde

Snyk
Katie Wilde is an experienced engineering leader, currently Senior Director at Snyk and previously VP Engineering at Ambassador Labs and Buffer. In this talk, she shares the problem of perverse incentives that make it so hard to build a culture of reliability in engineering...
Grand Ballroom ABGH
 
Thursday, March 27
 

4:00pm PDT

Technical Debt as Theory Building and Practice
Thursday March 27, 2025 4:00pm - 4:45pm PDT
Yvonne Z. Lam


I will examine the connections between technical debt, housework/carework, and infrastructure in order to talk through strategies for understanding the shape of your technical debt, picking pieces to pay down, and building narratives with conceptual integrity around technical debt.


https://www.usenix.org/conference/srecon25americas/presentation/lam
Speakers
Yvonne Lam

Yvonne plays with code, systems, books, cats, food, yarn, dirt, and boats—not all at the same time. She works on devtools, release engineering, quality, and reliability.
Grand Ballroom ABGH

4:45pm PDT

AIOps: Prove It! An Open Letter to Vendors Selling AI for SREs
Thursday March 27, 2025 4:45pm - 5:30pm PDT
Charity Majors and Fred Hebert, Honeycomb.io


SREs are not known for being eager, optimistic early adopters of shiny new technologies. We are much more likely to subject you to lengthy monologuing about all of the ways said technologies are overhyped, under-delivered, and prone to spectacular, catastrophic systems failures. Which brings us to the topic of AI.

It’s easy to be cynical when there’s this much hype and easy money flying around, but generative AI is not a fad; it’s here to stay. Which means that even operators and cynics — no, especially operators and cynics — need to get off the sidelines and engage with it. How should responsible, forward-looking SREs evaluate the truth claims being made in the market without being reflexively antagonistic? How can we help our orgs steer into change, leveraging AI technologies to help our teams ship better software, faster? And for the vendors out there using AI to try and help solve traditional SRE domain problems, how should they demonstrate that they are engaging with these problems in good faith, that they are more than just hype and snake oil?


https://www.usenix.org/conference/srecon25americas/presentation/majors
Speakers
Charity Majors

Honeycomb.io
Charity Majors is the co-founder and CTO of honeycomb.io. She pioneered the concept of modern Observability, drawing on her years of experience building and managing massive distributed systems at Parse (acquired by Facebook), Facebook, and Linden Lab building Second Life. She is...
Grand Ballroom ABGH

5:30pm PDT

Closing Remarks
Thursday March 27, 2025 5:30pm - 5:35pm PDT
Program Co-Chairs: Dan Fainstein, The D. E. Shaw Group; Laura Maguire, Trace Cognitive Engineering
Grand Ballroom ABGH
 