Venue: Magnolia Room
Monday, March 24
 

6:00pm PDT

Welcome Get-Together
Monday March 24, 2025 6:00pm - 7:00pm PDT
Magnolia Room
 
Tuesday, March 25
 

11:00am PDT

Running ML in Production
Tuesday March 25, 2025 11:00am - 12:35pm PDT
Todd Underwood, Anthropic, and Brendan Burns, Microsoft

Format: Breakout Group Discussion


Running ML systems is a major new area for many SRE organizations. This session will dive into the differences between running reliable software services in general and ML systems: infrastructure considerations, monitoring, rollouts, performance and cost management, and more.



https://www.usenix.org/conference/srecon25americas/presentation/discussion-ml-production
Speakers

Todd Underwood

Anthropic
Todd Underwood leads reliability at Anthropic, a company working to create AI systems that are safe, reliable, and beneficial to society. Prior to that, he led reliability for the Research Platform at OpenAI. Before that, he was a Senior Engineering Director at Google leading ML capacity...
Magnolia Room

1:50pm PDT

A Guided Introduction to SRE
Tuesday March 25, 2025 1:50pm - 3:25pm PDT
Niall Murphy, Stanza, and Kurt Andersen, Clari

Format: AMA Session


Are you confused by the alphabet soup: PRRs, ICS, MTTR, o11y, SLOs, SLIs? Is SRE the same thing as DevOps? Will doing everything the SRE books say lead to success and a good night's sleep on call?


This discussion session is the place to bring any and every question you may have about getting started as an SRE. Expect breadth rather than depth.


https://www.usenix.org/conference/srecon25americas/presentation/discussion-intro-sre
Speakers

Niall Murphy

Stanza
Niall is the CEO of Stanza Systems, has held various engineering and leadership roles at Microsoft, Google, and Amazon, and is the instigator of the best-selling and prize-winning Site Reliability Engineering, which he hopes at some stage to live down. His most recent book is Reliable...

Kurt Andersen

Clari
By day, Kurt works as an infrastructure software architect at Clari. In addition, he serves on the USENIX Board and has had the pleasure of working with amazing people around the globe at the SREcon conferences. He also helps with the annual SRE survey and report that is graciously supported...
Magnolia Room

3:55pm PDT

AMA with David Woods
Tuesday March 25, 2025 3:55pm - 5:30pm PDT
David Woods, The Ohio State University

Format: AMA Session


Dr. David D. Woods is a cognitive psychologist and systems safety expert who has spent his career studying resilience in systems and how people and machines can best work together, topics deeply relevant to SRE. Dr. Woods will host a wide-ranging discussion on how systems adapt in the face of disturbances, how complex systems break down, how the human perceptual system works with user interfaces, and more.


https://www.usenix.org/conference/srecon25americas/presentation/discussion-ama-woods
Speakers

David Woods

The Ohio State University
David is a pioneer of Resilience Engineering, which looks at how people adapt to cope with complexity in dynamic, risky human-cyber systems, including accident investigations in critical digital services, critical care medicine, aviation, energy, disaster response, military operations...
Magnolia Room
 
Wednesday, March 26
 

11:00am PDT

What Do SRE ICs Do? How to Build SRE Skillsets
Wednesday March 26, 2025 11:00am - 12:35pm PDT
Beth Adele Long, Adaptive Capacity Labs, and Fred Hebert, Honeycomb.io

Format: Breakout Group Discussion


The focus of this session is developing individual contributor skills in SRE. SREs do a lot of different things, including but not limited to: load testing, setting up and maintaining infrastructure services, building integration test pipelines, setting SLOs, maintaining alerts, being an incident commander, writing post-incident reviews, designing systems for scalability, building automation, contributing code changes to core products that are aimed at increasing reliability or performance, and troubleshooting. Few engineers come to SRE with all of these skills. How, as practitioners, should we think about building skills in new areas? Does it make sense to be an SRE jack-of-all-trades, or should one specialize?


https://www.usenix.org/conference/srecon25americas/presentation/discussion-sre-ics
Speakers

Beth Adele Long

Adaptive Capacity Labs
Beth Adele Long is a writer and engineer with wide experience building, maintaining, and repairing web systems (mostly repairing). She’s a founding member of the Resilience in Software Foundation and a Principal at Adaptive Capacity Labs.
Magnolia Room

1:50pm PDT

SRE Team Practices
Wednesday March 26, 2025 1:50pm - 3:25pm PDT
Colette Alexander, HashiCorp, and Sarah Butt, Salesforce

Format: Breakout Group Discussion


This session sets out to explore SRE at the level of the engineering team. Most engineering teams do some form of planning and goal setting—is this process different for SRE teams compared to other engineering teams? Do OKRs even make sense for SREs? How does your SRE team onboard new members?


There are several team practices specific to SRE, including (but not limited to) Production Readiness Reviews, regular production meetings, and wheel of misfortune exercises. What works and what doesn't? Why? What team-level practices are most important?


https://www.usenix.org/conference/srecon25americas/presentation/discussion-sre-team-practices
Speakers

Colette Alexander

HashiCorp
Colette has been working as an engineering leader in the software industry for 10+ years. Her obsession with learning from incidents and Resilience Engineering began while managing teams at Spotify. It eventually led her to pursue her Masters in Science at Lund University in Human...

Sarah Butt

Salesforce
Sarah is a Principal Engineer within Salesforce's Customer Centric Reliability Engineering group, where she helps lead Salesforce's Centralized Incident Response organization. She is fascinated by scale, complexity, systems thinking, and non-functional requirements, particularly...
Magnolia Room

3:55pm PDT

Service Level Objectives
Wednesday March 26, 2025 3:55pm - 5:30pm PDT
Alex Hidalgo, Nobl9, and Cail Young, Octopus Deploy

Format: AMA Session


Service Level Objectives are a core element of many organizations' SRE strategies. SLOs seem simple, but the way in which they are constructed and used has broad consequences for organizations, and for SRE teams' success. The session hosts will lead a discussion that gets into the nitty-gritty of implementing, maintaining, and living with SLOs and error budgets.



https://www.usenix.org/conference/srecon25americas/presentation/discussion-slos
Speakers

Alex Hidalgo

Nobl9
Alex Hidalgo is the Field CTO at Nobl9 and author of Implementing Service Level Objectives. During his career he has developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex's previous jobs have included...
Magnolia Room
 
Thursday, March 27
 

9:00am PDT

Tech Debt
Thursday March 27, 2025 9:00am - 10:35am PDT
Yvonne Lam and Mike Rembetsy, Bloomberg

Format: Breakout Group Discussion


Technical debt—decisions made to deliver in the short term with costs in the longer term—has not traditionally been seen as an SRE concern, but it should be. SRE teams with responsibility for production services generally need to contend with technical debt in the software they create (monitoring, automation, and so on) as well as being on the sharp end of many of the consequences of technical debt in the services they run. What are your experiences of technical debt as an SRE? How have you managed to cope with or to reduce technical debt?


https://www.usenix.org/conference/srecon25americas/presentation/discussion-tech-debt
Speakers

Mike Rembetsy

Bloomberg
Michael Rembetsy is the Global Head of Network Engineering and Operations at Bloomberg. His teams are responsible for everything from the physical hardware to SREs who focus on the global connectivity for customers. Prior to Bloomberg, Michael was the VP of Infrastructure at E...
Magnolia Room

11:05am PDT

Observability
Thursday March 27, 2025 11:05am - 12:40pm PDT
Daria Barteneva, Microsoft Azure

Format: Breakout Group Discussion


This discussion will cover all forms of observability (logs, metrics, distributed traces). How should we think about the goals of observability programs and how do we know if we are achieving good outcomes?


https://www.usenix.org/conference/srecon25americas/presentation/discussion-observability
Speakers

Daria Barteneva

Microsoft Azure
Daria is a Principal Site Reliability Engineer in Observability Engineering in Azure. With a background in Applied Mathematics, Artificial Intelligence, and Music, Daria is passionate about machine learning, diversity in tech, and opera. In her current role, Daria is focused on changing...
Magnolia Room

1:55pm PDT

Open Unconference on SRE
Thursday March 27, 2025 1:55pm - 3:30pm PDT
Blake Bisset and Robert Barron, IBM

Format: Unconference Session


If you were disappointed not to see a particular topic on the discussion track agenda, then this is the session for you.


Come to this session (which will be run using the Open Space Technology unconference format) and propose your topic. The four topic areas with the most interest will be selected for discussion in breakout groups.



https://www.usenix.org/conference/srecon25americas/presentation/discussion-open-unconference
Speakers

Blake Bisset

Blake Bisset got his first legal tech job at 16, long enough ago that he’s entitled to make shakeyfists while shouting, "Get off my LAN!" He’s contributed to 4 major tech books, 4 cool tech start-ups, and 4 “big tech” companies (not at all in that order), but has been a regular...

Robert Barron

IBM
Robert Barron is an SRE Architect in the office of the IBM CIO where he enjoys helping others solve problems even more than he enjoys solving them himself. Robert has over 20 years of experience in all kinds of things ending in *-ops and *-ility and is still happiest when learning...
Magnolia Room
 