Catchpoint CEO and co-founder Mehdi Daoudi believes observability captures some important ideas about the way we currently monitor digital systems. But while it's a good and useful concept, he takes issue with the way many analysts, technologists, and vendors seem to use it.
Not that this is a new problem. In Through the Looking Glass, Lewis Carroll wrote "'When I use a word,' Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean – neither more nor less.'"
Daoudi's point is that such people often paint far too narrow a picture of what observability is or ought to be, usually one that suits either their intellectual position or the tools they use or are trying to sell.
If customers simply accept one of those narrow definitions, there can be real-world consequences. If they don't understand exactly what they are and are not observing, they may be saddling themselves with some serious blind spots.
In Daoudi's opinion, businesses need to approach observability starting from the user experience, but including every link in the end-to-end digital business chain. This, he says, allows them to create a more holistic, useful, and actionable picture of the health of their digital systems.
Monitoring digital performance as experienced by users is critical, because a bad experience is bad for the brand, bad for revenue, and bad for the reputation of any company.
So what does 'digital experience observability' really mean?
"We define digital experience observability as a methodology and tooling that eliminates blind spots and prevents outages that negatively impact your users' digital experience. Our observability platform transforms observed telemetry data from the entire digital delivery chain into preventative actions," he says.
This observability model is particularly suitable for modern digital systems.
"Today's cloud-native applications are composed of thousands of distinct microservices running across multiple locations, blinking on and off all the time. You couldn't directly pull metrics from every relevant component even if you wanted to. So, taking an outcome-focused approach is the right way to go.
"The problems arise, however, when we forget the most important outcome of all: the customer experience.
"Too many businesses rely solely on inside-out approaches to observability. They measure 'outcomes' of their digital infrastructure as detected by application performance monitoring (APM) systems, not as experienced by users.
"APM tools provide important insight into the health of application environments, and you should definitely use them. But they don't capture everything users experience when coming to your applications from the outside world, with all its real-world complications."
Those real-world complications include issues that can crop up in a user's web browser or device, local or regional networks, public clouds, domain name systems (DNS), or any of a thousand third-party services that contribute to websites.
"Think of services providing shopping carts, credit card processing, APIs to calculate shipping costs, and many others. Any of these could slow down or fail under peak holiday shopping loads and quickly translate to real dollars lost," Daoudi warns.
What's changed in observability in the last few years, and why?
"Several factors have driven an increased need for observability," he says.
"The pandemic added rocket fuel to digital transformation plans, to the point where we expect to see 23% growth across public cloud and software as a service offerings in 2022.
"Then there's the growth of remote-access strategies, such as the secure access service edge (SASE) architecture that combines security and optimised network performance, driving a need for independent observability that preserves great user experience regardless of location.
"Meanwhile, the astonishing growth of 5G and multi-access edge compute means that people need to continually better understand and support user experience at the edge."
Organisations must somehow find a way to see across all these layers, Daoudi counsels.
What should organisations be measuring, and how can they make those measurements?
The traditional approach is to use monitoring tools to uncover problems by collecting metrics from various single-domain sources and comparing them against predefined thresholds, he says.
For example, storage levels shouldn't exceed a certain threshold; if they do, an alert is sent so IT can investigate what's happening.
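To make the threshold model concrete, here is a minimal Python sketch of the kind of check such tools run. The metric names, limits, and alert text are hypothetical, and real monitoring products add scheduling, deduplication, and notification on top of this core comparison.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str    # hypothetical metric name, e.g. "storage_used_pct"
    limit: float   # alert when the reading goes above this value


def check_thresholds(readings: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Compare each single-domain reading against its predefined limit.

    Returns one alert message per metric that exceeds its limit; metrics with
    no reading are skipped rather than treated as breaches.
    """
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"ALERT: {t.metric}={value} exceeds {t.limit}")
    return alerts


if __name__ == "__main__":
    readings = {"storage_used_pct": 92.0, "cpu_pct": 40.0}
    rules = [Threshold("storage_used_pct", 85.0), Threshold("cpu_pct", 90.0)]
    for alert in check_thresholds(readings, rules):
        print(alert)  # prints: ALERT: storage_used_pct=92.0 exceeds 85.0
```

Each rule looks at exactly one metric in isolation, which is what makes this approach simple to run but blind to problems that only show up when several signals are read together.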
In contrast, observability takes a more holistic approach that adds context by examining multiple outputs of the system to infer system health.
"For example, suppose you were trying to gauge a car's condition. Traditional monitoring might track engine temperature or idle speed RPMs, while observability would examine things like power output, engine leaks, transmission noise, emissions, and fuel efficiency to infer the car's overall health."
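A rough digital analogue of that idea can be sketched in Python: rather than alerting on any one metric, several outputs from different points in the delivery chain are scored and combined into a single health estimate, with the weakest signal reported as context. The signal names, the linear scoring between a "good" and a "bad" level (assuming lower values are always better), and the weights are all assumptions made for illustration; this is not how Catchpoint or any particular platform computes health.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str         # hypothetical signal name, e.g. "dns_lookup_ms"
    value: float      # observed value (assumes lower is better)
    good: float       # at or below this, the signal counts as fully healthy
    bad: float        # at or above this, the signal counts as fully unhealthy
    weight: float = 1.0

    def score(self) -> float:
        """Map the observed value linearly onto 1.0 (healthy) .. 0.0 (unhealthy)."""
        if self.value <= self.good:
            return 1.0
        if self.value >= self.bad:
            return 0.0
        return 1.0 - (self.value - self.good) / (self.bad - self.good)


def infer_health(signals: list[Signal]) -> tuple[float, str]:
    """Combine several outputs into one weighted health score and name the weakest link."""
    total_weight = sum(s.weight for s in signals)
    health = sum(s.score() * s.weight for s in signals) / total_weight
    weakest = min(signals, key=lambda s: s.score())
    return health, weakest.name


if __name__ == "__main__":
    observed = [
        Signal("page_load_ms", 1800, good=1000, bad=3000),        # measured in the browser
        Signal("dns_lookup_ms", 20, good=50, bad=500),            # DNS resolution
        Signal("checkout_api_error_pct", 4.0, good=0.5, bad=5.0, weight=2.0),  # third-party API
    ]
    health, weakest = infer_health(observed)
    print(f"health={health:.2f}, weakest={weakest}")
```

With these hypothetical inputs the combined score comes out at roughly 0.51, and the checkout API is named as the weakest link, so the result carries context about where to look rather than just a pass or fail.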
Thus observability provides answers about the health and performance of digital assets.
"This means you should be obtaining telemetry data from across the end-to-end digital service supply and delivery chain," says Daoudi.
"That includes telemetry such as logs, infrastructure metrics, traces and APIs from your application hosting environment, as well as active and passive observation at the point of consumption – that is, outputs from customer browsers and worker devices.
"It also must include active network observability (connectivity, reachability, BGP routing), as well as continuous active observation of all the disparate services — content delivery networks, DNS services, cloud-based security services — and every other cloud and web service that contributes to the user experience."
Can there be a conflict between the experience of external customers and of internal employees? If so, what can you do about it?
"We believe a holistic approach for improving the total experience – that of external customers and internal employees – may transform business outcomes. Businesses should balance where the supporting activities for these two groups are located with the impact of such a conflict for a given time or use case," explains Daoudi.
For example, an organisation may have adopted a cloud-native architecture (versus a traditional datacenter) to deliver customer experiences. But employees may still be using physical, IT-assigned devices versus desktops as a service, even though a cloud-native approach may be more resilient. The best approach, he says, is to bring both to the same level of maturity.
Daoudi concludes by pointing out that "Total experience is a strategy for creating superior shared experiences by interlinking the user experience, customer experience, employee experience, etc. It is about more than improving the experience of one constituent: it improves experiences at the intersection of multiple constituents to achieve a transformational business outcome."