Tuesday, March 25
 

11:00am PDT

Tackling Slow Queries: A Practical Approach to Prevention and Correction
Tuesday March 25, 2025 11:00am - 11:45am PDT
Kurni Famili and Brad Feehan, Shopify


Slow queries can cripple the reliability of production systems, leading to performance bottlenecks and user dissatisfaction. This session explores a dual-component framework for tackling slow queries, covering preventive measures integrated into CI pipelines and corrective actions utilizing production monitoring. Attendees will gain actionable insights to boost their systems’ reliability by identifying and resolving slow queries effectively.


https://www.usenix.org/conference/srecon25americas/presentation/famili
Speakers

Kurni Famili

Shopify
Kurni Famili is a Senior Site Reliability Engineer at Shopify, originally from Indonesia and now living in Singapore. They have a broad interest in system reliability, with a particular focus on databases and observability. At Shopify, they work alongside teams to improve infrastructure... Read More →

Brad Feehan

Shopify
Brad Feehan is a Senior Site Reliability Engineer at Shopify, currently based in Melbourne, Australia. With over a decade of experience in high-traffic web applications, they have a deep understanding of every layer of the tech stack. Starting in full-stack web development, they transitioned... Read More →
Tuesday March 25, 2025 11:00am - 11:45am PDT
Grand Ballroom GH

11:50am PDT

The Search for Speed
Tuesday March 25, 2025 11:50am - 12:35pm PDT
Scott Laird


What do you do when you're new to a service and all you know is that you're spending huge amounts of money on it and no one is happy with the service's performance? You use science, of course!

The speaker joined a team with a severe OpenSearch performance problem and applied basic monitoring principles, built models to understand the problem space, conducted experiments to understand what was happening under the hood of a managed service, and then halved the system's latency, cut costs by more than half, and left the team with a framework for further improvement.


https://www.usenix.org/conference/srecon25americas/presentation/laird
Speakers

Scott Laird

Scott worked as an SRE at Google for 17 years, working on many products including Chrome, Google Docs, Calendar, and storage in Google Cloud, but never search. More recently he worked as a part of Figma's Production Engineering team. He lives in the Seattle area and holds strong opinions... Read More →
Tuesday March 25, 2025 11:50am - 12:35pm PDT
Grand Ballroom GH

1:50pm PDT

Live, Laugh, Log
Tuesday March 25, 2025 1:50pm - 2:35pm PDT
Paige Cruz, Chronosphere


Telemetry pipelines are the unsung heroes that shepherd data from applications and infrastructure to your observability and monitoring systems. It’s often up to SRE to ensure these pipelines are in tip-top shape, allowing logs to flow freely. However, a lot can go awry on the journey a log takes—from source issues and bad data formatting to misconfigured processing steps, congestion and under-provisioning. Buckle up as we dive into operating and monitoring Fluent Bit, helping you live, laugh, and log reliably.


https://www.usenix.org/conference/srecon25americas/presentation/cruz
Speakers

Paige Cruz

Chronosphere
Paige Cruz is passionate about cultivating sustainable on-call practices and bringing folks their aha moment with observability. Currently a Principal Developer Advocate at Chronosphere, she got her start as a software engineer at New Relic before switching to SRE holding the pager... Read More →
Tuesday March 25, 2025 1:50pm - 2:35pm PDT
Grand Ballroom GH

2:40pm PDT

Distributed Tracing in Action: Our Journey with OpenTelemetry
Tuesday March 25, 2025 2:40pm - 3:25pm PDT
Chris Detsicas, Cisco ThousandEyes


Join us as we dive into our journey with Distributed Tracing, leveraging OpenTelemetry and Istio in a dynamic microservices landscape. An internal Observability team embarked on a mission to empower engineers with deep application insights.

This talk encapsulates our journey, challenges encountered, and critical decisions made during the adoption of OpenTelemetry tracing. We'll discuss context propagation hurdles, the significance of automatic instrumentation, and the importance of testing. Furthermore, we will provide an overview of our pipeline implementation and share key examples of how enabling our tracing solution has provided critical insights, helped us troubleshoot issues more effectively, and enhanced our understanding of application performance.


https://www.usenix.org/conference/srecon25americas/presentation/detsicas
Speakers

Chris Detsicas

Cisco ThousandEyes
Chris Detsicas is a Lead SRE within the internal Observability team at ThousandEyes (part of Cisco) where he builds and maintains logging, metrics and tracing systems to empower ThousandEyes engineers with deep insights on their infrastructure and applications. He has 10+ years of... Read More →
Tuesday March 25, 2025 2:40pm - 3:25pm PDT
Grand Ballroom GH

3:55pm PDT

Lies Programmers Believe about Memory
Tuesday March 25, 2025 3:55pm - 4:40pm PDT
Chris Down, Meta


How does kernel memory management actually work? The Linux kernel provides a number of abstractions on top of physical memory, which, like most abstractions, can either be a blessing or a curse, especially when it comes to understanding application behavior. Some of these exist in conjunction with the hardware, like translation lookaside buffers, page tables, and the like, and some of them are Linux's own internal abstractions over memory, like different classes of memory within the operating system itself (with bonus special and often misunderstood properties).


Join Chris Down, a kernel engineer who works on the Linux memory management subsystem, as we go over things like the CPU's memory management internals, pages, the inner workings of virtual memory, and the complex tradeoffs made during modern memory management. Along the way, we will demystify the kernel and CPU behaviors around memory, go over how this might actually affect you as an SRE, and hopefully enable you to introspect and build more reliable systems as a result.


https://www.usenix.org/conference/srecon25americas/presentation/down
Speakers

Chris Down

Meta
Chris Down is an engineer on Meta's Kernel team, based in London. He works on memory management within the kernel, especially cgroups, and is also a maintainer of the systemd project. Inside Meta, he is responsible for debugging and resolving major production issues, helping streamline... Read More →
Tuesday March 25, 2025 3:55pm - 4:40pm PDT
Grand Ballroom GH

4:45pm PDT

“On-Call Is Ruining My Life” and Other Tales about Holding the Pager as an SRE
Tuesday March 25, 2025 4:45pm - 5:30pm PDT
Cory Watson


There’s no other part of SRE life that evokes such a strong reaction as being on-call. From the fear and anticipation of your first shift to the white-knuckle drama of a total system outage and the joy and satisfaction of debugging a particularly thorny issue - holding the pager is as much a human experience as a technical one. Let's talk about it!

We've done some surveys, pored over the literature, marinated in our experiences and have some findings. What models are in use? How do we feel about this work? What impact does it have? Can we do better? Will I get a pony? Ok, maybe not the last one.

I'll present some provocative findings that question the status quo around on-call and suggest some experiments you can take back and test out. Maybe there will be a pony?


https://www.usenix.org/conference/srecon25americas/presentation/watson
Speakers

Cory Watson

Cory Watson is an engineer and founder. Cory transitioned to a focus on reliability and observability as an early SRE at Twitter, founded the observability team at Stripe, and spent time at vendors SignalFx and Splunk. He is a strong voice in the observability community, through OSS... Read More →
Tuesday March 25, 2025 4:45pm - 5:30pm PDT
Grand Ballroom GH
 
Wednesday, March 26
 

11:00am PDT

Learning from Incidents at Scale; Actually Doing Cross-Incident Analysis
Wednesday March 26, 2025 11:00am - 11:45am PDT
Vanessa Huerta Granda, Enova


For a few years we have discussed this idea of Learning from Incidents that encourages folks to deeply understand an incident through a thorough, in-depth investigation of how it came to be. I personally have led these investigations, written about them, and coached folks on them, and while I stand by this process, I have also seen how difficult it is to scale.

In this talk I will describe how my team (resiliency engineering) has been able to leverage our incident review program to learn from incidents at scale, and how we’ve been able to analyze a universe of incidents broken out into quarters, years, products, and technologies to gain insights and make recommendations to improve our sociotechnical systems.


https://www.usenix.org/conference/srecon25americas/presentation/granda
Speakers

Vanessa Huerta Granda

Enova
Vanessa is a Technology Manager for Resilience Engineering at Enova. Previously she worked at Jeli.io helping companies make the most of their incidents and has spent the last decade focusing on Production Incident processes, learning from incidents, and handling Major Incidents as... Read More →
Wednesday March 26, 2025 11:00am - 11:45am PDT
Grand Ballroom GH

11:50am PDT

Running DRP Tabletop Exercises
Wednesday March 26, 2025 11:50am - 12:35pm PDT
Josh Simon, University of Michigan


A disaster recovery plan (DRP) documents policies and detailed procedures for recovering your organization's critical technology infrastructure, systems, and applications after a disaster. Hopefully you have DRPs for your organization, but how complete are they really, and how and how often do you test them?

In this talk, we'll help you get a better understanding of what a DRP is and contains, as well as why it's important to write, test, and maintain service-specific DRPs and affiliated documentation. We'll talk about how we're developing and using collaborative discussion-based thought experiments to test our DRPs, including things you should and shouldn't do when you write and test your own. You may even get some insights on how to design your own services for reliability and recovery!


https://www.usenix.org/conference/srecon25americas/presentation/simon
Speakers

Josh Simon

University of Michigan
Josh is a senior systems administrator with over 30 years of experience across industry and higher education. His areas of expertise include systems administration, project management, technical writing, and facilitation. Among his many roles and responsibilities is coordinating his... Read More →
Wednesday March 26, 2025 11:50am - 12:35pm PDT
Grand Ballroom GH

1:50pm PDT

Handling the Largest Domains Migration, Ever!
Wednesday March 26, 2025 1:50pm - 2:35pm PDT
Franklin Angulo and Divya Kamat, Squarespace


Domains remain a critical part of web infrastructure, and an essential piece of the online presence of people and businesses. In 2023, Squarespace acquired the assets behind the Google Domains business, including more than 10 million domains. Learn about the challenges of executing a migration at a scale not seen before in the domain industry.


https://www.usenix.org/conference/srecon25americas/presentation/angulo
Speakers

Franklin Angulo

Squarespace
Franklin Angulo currently leads the product & engineering teams within the Squarespace Domains organization. Before this role, he shaped the technical vision at Squarespace as its Chief Architect, built teams to scale the backend engine and data centers that power the millions of... Read More →

Divya Kamat

Squarespace
Divya Kamat is an accomplished engineering leader and currently heads the engineering teams within the Squarespace Domains organization. Since joining Squarespace in 2018 as an engineer, Divya has played a pivotal role in the growth and evolution of the Domains team. She was a key... Read More →
Wednesday March 26, 2025 1:50pm - 2:35pm PDT
Grand Ballroom GH

2:40pm PDT

Taming the Beast: Understanding and Harnessing the Power of HTTP Proxies
Wednesday March 26, 2025 2:40pm - 3:25pm PDT
Guillaume Quintard, Varnish Software


Explore the often-overlooked power of HTTP and reverse-proxies in modern SRE and DevOps workflows.

Starting with a fresh perspective on HTTP—its simplicity and quirks—the session delves into how reverse-proxies enhance observability, performance, and resilience. Attendees will learn how proxies can serve as invaluable tools for debugging, traffic manipulation, and active mitigation during production incidents.

With a focus on actionable insights, the talk includes code snippets, real-world examples, and guidance on leveraging tools like OpenTelemetry to equip SREs with practical strategies to manage complex systems effectively.


https://www.usenix.org/conference/srecon25americas/presentation/quintard
Speakers

Guillaume Quintard

Varnish Software
Guillaume Quintard is a systems programming and performance optimization expert, bringing years of experience to the tech industry. A passionate contributor to open-source projects, Guillaume excels in crafting high-performance software solutions and advancing system architecture... Read More →
Wednesday March 26, 2025 2:40pm - 3:25pm PDT
Grand Ballroom GH

3:55pm PDT

Please Give Me Back My Network Cables! On Networking Limits in AWS
Wednesday March 26, 2025 3:55pm - 4:40pm PDT
Steffen Gebert and Miklos Tirpak, emnify


How much is “up to 10 Gbps” for an EC2 instance? And what happens if packets are smaller or fragmented? Over the years of running our mobile core’s network functions on AWS, we learned – the hard way – about numerous network limits. Many of them have since been documented, but some have not.

In this presentation, we share our horror stories about what kept us awake at night. To make you better informed, we will explain limits such as packets per second and connection tracking, and how they affect your network traffic once they are exceeded. We share how you can (sometimes) monitor your remaining quotas, or at least how you can identify the reason why your applications go haywire.

Finally, we highlight a couple of cases where your next incident could be just a side note in the documentation.


https://www.usenix.org/conference/srecon25americas/presentation/gebert
Speakers

Steffen Gebert

emnify
Before switching into his new role, Steffen used to lead the infrastructure team at emnify, a mobile virtual network operator (MVNO) running custom-built mobile core networks for the Internet of Things on Amazon Web Services. His technical main interest is misusing AWS networking... Read More →

Miklos Tirpak

emnify
Miklos works with the Packet Gateway team at emnify as an engineering manager on developing high-performance applications for packet processing with cutting-edge technologies. While such network applications are running on Amazon Web Services, packet per second rate and high reliability... Read More →
Wednesday March 26, 2025 3:55pm - 4:40pm PDT
Grand Ballroom GH

4:45pm PDT

OpenTelemetry Semantic Conventions and How to Avoid Broken Observability
Wednesday March 26, 2025 4:45pm - 5:30pm PDT
Dinesh Gurumurthy, Datadog Inc., and Laurent Querel, F5


The OpenTelemetry community has introduced Semantic Conventions - a defined schema that brings consistent meaning to telemetry data, defining everything from span names and metric instruments to attribute types and valid values. Semantic Conventions standardize naming across your codebase, libraries, and platforms, ensuring smooth data flow and better insights. With these benefits come drawbacks - namely that Semantic Conventions can and will change. Join us to learn how Datadog was impacted when changes to HTTP and Deployment Semantic Conventions caused disruptions for our clients. To fix these problems, Datadog came together with the community to develop the Schema Processor - a solution built to handle these changes without painful outages.


https://www.usenix.org/conference/srecon25americas/presentation/gurumurthy
Speakers

Dinesh Gurumurthy

Datadog Inc
Dinesh Gurumurthy is a Staff Engineer at Datadog and the founding leader of the company’s OpenTelemetry team. Last year, Dinesh led the initiative to embed the OpenTelemetry collector with the Datadog Agent. He is also highly involved in the OpenTelemetry community, contributing... Read More →

Laurent Querel

F5
Laurent Querel is a Senior Director and Distinguished Engineer at F5, focusing on observability and data processing. He is an enthusiastic supporter of open source and currently co-maintains two projects within the OpenTelemetry community: OTEL Weaver, a tool for managing and controlling... Read More →
Wednesday March 26, 2025 4:45pm - 5:30pm PDT
Grand Ballroom GH
 
Thursday, March 27
 

9:00am PDT

Fully Automated HW SKU Selection System to Optimize Apache Pinot’s Cost-to-Serve at LinkedIn
Thursday March 27, 2025 9:00am - 9:45am PDT
Jia Guo, Yifan (Sabrina) Zhao, and Dino Occhialini, LinkedIn


Join us for this session to learn more about how cost-to-serve was optimized by nearly 50% for Apache Pinot OLAP Database's production fleet of ~14K machines at LinkedIn.

OLAP workloads running on Pinot at LinkedIn have diverse characteristics in terms of:


  • Varying workload demand (SLOs as low as P99 query latency < 100ms at 100K read QPS).

  • Varying cost / resource usage (CPU, memory, IO) of SQL queries.

  • Varying dataset sizes (clusters serving data from as low as 500GB to as high as 2PB).


The talk will go into details of the core cost optimization algorithm that considers varying factors to recommend an optimal SKU.


  • Multiple SKU Profiles

  • Low-overhead mechanisms to collect high cardinality profiling data from production clusters

  • Resource constraints (CPU, Memory, Disk IOPS, Throughput, etc.)


The system has been built with the goal of supporting "Multiple SKUs" effectively -- both in terms of cost optimization and keeping operational overhead to a minimum (fully automated). Through our talk, we will go into the details of all the infrastructure pieces we have built to deliver the solution in a generic fashion.

We will further discuss how this has been integrated into our day-to-day operational machinery.


https://www.usenix.org/conference/srecon25americas/presentation/guo
Speakers

Jia Guo

LinkedIn
Jia is a Senior Software Engineer at LinkedIn and a committer for Apache Pinot. Jia focuses on making Pinot fault-tolerant and cost-effective. He has contributed across different areas of Pinot ranging from OLAP engine, indexing, fault tolerant shard placement to several performance... Read More →

Yifan (Sabrina) Zhao

LinkedIn
Sabrina is a Software Engineer at LinkedIn and a contributor for Apache Pinot. Sabrina has contributed features like SQL Pagination, availability improvements for massive multi-tenant clusters, OLAP SQL enhancements and fault-tolerant shard placement.

Dino Occhialini

LinkedIn
Dino is a Staff Software Engineer at LinkedIn and a contributor to Apache Pinot. Dino has been a strong SRE Leader for the Pinot team at LinkedIn. Dino has made many noteworthy contributions towards improving Pinot's operational excellence, resiliency, Site-Up, provisioning and usability... Read More →
Thursday March 27, 2025 9:00am - 9:45am PDT
Grand Ballroom GH

9:50am PDT

Production Engineering When Trading Billions of Dollars a Day
Thursday March 27, 2025 9:50am - 10:35am PDT
Pedro Flemming, Jane Street


How do you build reliable, maintainable and performant systems that trade billions of dollars every day in financial markets across the globe?

When your software has near-unlimited access to your bank account, every single message counts. When nanoseconds can determine whether you make or lose money, the physical location of your server within the data center matters. Speedy alerting and incident response have a direct and measurable impact on the PnL.

This talk will lift the lid on the beating heart of a major trading firm, and offer insights into the day-to-day operations, with a touch of “when things go wrong”.


https://www.usenix.org/conference/srecon25americas/presentation/flemming
Speakers

Pedro Flemming

Jane Street
Pedro has been a Software Engineer at Jane Street for over 7 years. He has worked on systems that directly facilitate trading of financial instruments of various shapes over his entire time there. He has spent extensive time monitoring these systems live, reacting to incidents, and... Read More →
Thursday March 27, 2025 9:50am - 10:35am PDT
Grand Ballroom GH

11:05am PDT

Securing Distributed Cache: Achieving Secure-by-Default with Key Challenges & Insights
Thursday March 27, 2025 11:05am - 11:50am PDT
Akashdeep Goel, Sriram Rangarajan, and Samuel Fu, Netflix Inc


In this session, we'll discuss a distributed caching system used at Netflix in multiple regions on a public cloud, handling 400 million requests per second and managing 14 petabytes of data. We'll focus on the intricacies of securing this system, including certificate lifecycle management, spurious policy lookup calls, and securing proxy calls for polyglot clients. We will walk you through our debugging journey with tools like CPU profiling and memory dumps, share key takeaways, and demonstrate how these techniques can be applied in any organization. This session will provide valuable lessons on retrofitting high-leverage systems for security compliance and executing global-scale rollouts effectively.


https://www.usenix.org/conference/srecon25americas/presentation/goel
Speakers

Akashdeep Goel

Netflix
Akashdeep Goel is a Senior Software Engineer at Netflix working on distributed systems handling large scale caching deployments for both streaming and gaming workloads across Netflix. Prior to this, Akashdeep was working on a distributed control plane at Azure CosmosDB (Microsoft... Read More →

Sriram Rangarajan

Netflix
Sriram Rangarajan is a Senior Software Engineer at Netflix, focusing on caching infrastructure. Previously, he worked on ad servers and search functionalities at Unity Technologies and Kamcord, and managed backend solutions at Yahoo and Hewlett Packard. Sriram holds a Master's degree... Read More →

Samuel Fu

Netflix
Samuel Fu is a Software Engineer at Netflix working on distributed systems that help enable caching at scale, supporting both VOD and live streaming use cases. Prior to Netflix, Samuel worked on realtime streaming feature pipelines at Lyft, enabling features such as driver bonuses... Read More →
Thursday March 27, 2025 11:05am - 11:50am PDT
Grand Ballroom GH

11:55am PDT

Cattle vs. Pets - A Cost-Effective Elasticsearch Architecture to Scale-Out Beyond Petabytes
Thursday March 27, 2025 11:55am - 12:40pm PDT
Leonardo Antônio dos Santos, Workday, Inc.


Managing Elasticsearch at a scale of tens of petabytes requires innovative approaches to overcome the limits of traditional single-cluster designs. In this talk, we introduce a scalable, cost-effective multi-cluster architecture that handles trillions of indexed logs monthly while reducing operational complexity. By shifting to a "Cluster of Clusters" design, we optimize ingestion, search, and cross-cluster search traffic using a centralized management cluster and standardized data clusters.

Key highlights include leveraging a custom cluster health service based on the USE Method for intelligent query routing, implementing real-time auditing for problematic query detection, and automating rate-limiting for high-demand users. Attendees will learn how these strategies cut compute costs by 57%, achieved significant storage savings, and enhanced scalability and migration efficiency.

This session provides practical insights, benchmarks, and real-world examples to help organizations sustainably optimize Elasticsearch while maintaining performance and reducing costs — which is ideal for those overseeing large-scale log data or anticipating Elasticsearch growth.


https://www.usenix.org/conference/srecon25americas/presentation/santos
Speakers

Leonardo Antônio dos Santos

Workday, Inc.
Leonardo Dos Santos is a Senior Distributed Systems Engineer at Workday, specialized in building, maintaining and scaling large distributed systems. With extensive experience managing systems spanning petabytes and thousands of nodes, Leonardo has led large-scale architecture transformations... Read More →
Thursday March 27, 2025 11:55am - 12:40pm PDT
Grand Ballroom GH

1:55pm PDT

Network Flow Data in the Cloud
Thursday March 27, 2025 1:55pm - 2:15pm PDT
Steve Dodd, Slack


Everything old is new again. Or rather, everything you thought was old is as relevant to today’s distributed service-oriented architecture as it was in the days of manual OSPF metric tuning. Traditional network engineering techniques are based on discrete math – namely, graph theory. A network graph provides a visual and quantitative foundation for analyzing network behaviors to optimize data flow, routing, and resilience in complex topologies. Huge benefits await those able to apply these lost arts to large-scale cloud infrastructure. In this talk, we’ll review those traditional methods, then apply them. We’ll explore how to build network traffic attribution on a per-service level — all without spending piles of money on vendor logging solutions.


https://www.usenix.org/conference/srecon25americas/presentation/dodd
Speakers

Steve Dodd

Slack
Steve is a Staff Software Engineer for the Demand Engineering team at Slack based in Hailey, Idaho. The Demand Engineering team enables fast and reliable delivery of Slack to our 12M+ globally distributed daily active users. Outside of work Steve enjoys rock climbing, skiing, and tinkering... Read More →
Thursday March 27, 2025 1:55pm - 2:15pm PDT
Grand Ballroom GH

2:20pm PDT

OLTP SQL Database Query Tracing and Linting
Thursday March 27, 2025 2:20pm - 2:40pm PDT
Wei Li and Xiaotong Jiang, Databricks


This talk presents a way to annotate queries and trace them from the client side to the database server side. In addition, we can effectively aggregate database server usage across different client-side dimensions, such as RPCs and tenants. This has proven effective in handling client-initiated incidents. On top of this, the query tracing system can be used to analyze query behavior in the system to facilitate large-scale data migration operations.


https://www.usenix.org/conference/srecon25americas/presentation/li
Speakers

Wei Li

Databricks
Wei is a software engineer at Databricks who has worked on many storage systems over the course of a career, including SQL and NoSQL databases and other distributed storage systems.

Xiaotong Jiang

Databricks
Xiaotong is a software engineer at Databricks who works on Databricks's OLTP systems, with a focus on the data migration system.
Thursday March 27, 2025 2:20pm - 2:40pm PDT
Grand Ballroom GH

2:45pm PDT

“How’s the App Doing?” Bringing Mobile Into Your Reliability Picture
Thursday March 27, 2025 2:45pm - 3:05pm PDT
Hanson Ho and David Rifkin, Embrace


Do you include telemetry from mobile apps when assessing the health and performance of your application? If not, do you know what you might be missing?

Like when users can’t connect to your servers because their network connection is poor, or something failed on the device before a request could be sent to complete an order? And what about everything in the app before creation of a network request – context that's hard or impossible to derive from the request itself – and can explain WHY requests are so slow, but only in Japan?

How are you thinking about the telemetry that comes from your mobile app? Learning to make sense of the gaps, and work around them, is the best path to reliable mobile applications. We’ll discuss how user experience is the best anchoring mechanism for mobile observability, and how reliability ultimately is in the eyes of the app-holder.


https://www.usenix.org/conference/srecon25americas/presentation/ho
Speakers

Hanson Ho

Hanson Ho's niche is mobile observability and performance, an odd passion he developed while working at Twitter as Android Performance Tech Lead. He is now at Embrace, hoping to bring true observability 2.0 to mobile apps everywhere, one device at a time.

David Rifkin

Embrace
David Rifkin is a developer relations engineer at Embrace, a mobile developer by trade, always an educator at heart. He has built iOS applications in a variety of settings and team sizes. OpenTelemetry components have become his new Legos.
Thursday March 27, 2025 2:45pm - 3:05pm PDT
Grand Ballroom GH

3:10pm PDT

From HAR to OpenTelemetry Trace: Redefining Browser Observability
Thursday March 27, 2025 3:10pm - 3:30pm PDT
Antonio Jimenez, Cisco ThousandEyes


Have you heard about HTTP Archive (HAR) files and wondered how you could leverage this data for deeper insights into your web applications?

Imagine analyzing your page load request data as OpenTelemetry traces in your favorite observability backend. In this talk, we will explore the lessons learned from transforming HAR into an OpenTelemetry trace and streaming it to Jaeger.

You'll gain insights into the process of converting HAR data into spans following OpenTelemetry semantic conventions and learn about the architecture we used to send these traces to any observability backend via the OpenTelemetry collector.


https://www.usenix.org/conference/srecon25americas/presentation/jimenez
Speakers

Antonio Jimenez

Cisco ThousandEyes
Antonio is a Tech Lead Software Engineer at Cisco ThousandEyes, specializing in observability to ensure our customers can effectively monitor their products. His recent work involves using OpenTelemetry to stream telemetry data, enhancing network visibility and performance for our... Read More →
Thursday March 27, 2025 3:10pm - 3:30pm PDT
Grand Ballroom GH
 