Observability/APM is being Reinvented Again

The Observability and APM markets have existed for a long time and have undergone periodic reinventions driven by changes in the technology stack and how customers build, deploy and run applications in production.

Such a reinvention is happening again.

The History of Reinventions

Here is a brief history of the major reinventions of the APM and Observability markets:

The APM market was invented by Wily in 1998, around the emergence of Java as an enterprise-capable programming language and J2EE application servers as enterprise-capable runtime environments. CA bought Wily in 2006.
The APM market was reinvented around 2008 by AppDynamics, Dynatrace and New Relic to address the first generation of distributed applications (2-Tier and N-Tier architectures, more languages, and more runtime environments).
Around 2020, the term Observability came into widespread use. The idea was to monitor the entire stack comprehensively and frequently, so that customers could find and fix both anticipated and unanticipated problems (unknown unknowns). At the same time, microservices, Kubernetes, CI/CD, even more languages, and even more runtimes became part of the stack, and the cloud became the dominant deployment environment for new applications.

The Current (Ongoing) Reinvention

For the last couple of years, the Observability and APM markets (and what is left of the infrastructure monitoring space) have been undergoing yet another reinvention. This time the driving dynamics are:

Because the cloud has become the dominant deployment platform, infrastructure monitoring has been redefined as monitoring the cloud platform and the software that supports the applications, and has largely ceased to exist as a category independent of Observability and APM.
Diversity in the stack continues to increase, driven by the ongoing quest to make software development easier and more productive.
The rate of innovation in the stack continues to increase, driven by the same dynamics that drive the increase in diversity.
The quest to combine all types of Observability data (metrics, logs, traces, dependencies, configuration changes) into one database has led the major Observability vendors (Datadog, Dynatrace, New Relic, IBM/Instana, Cisco/Splunk) to conclude that they could not construct their back ends out of commodity open source databases, and that they had to build or buy their own. New Relic (NRDB), Dynatrace (Grail), Datadog (Husky), IBM/Instana (BeeInstant) and Splunk (Omnition) are all examples of this trend.
However, running a database that relates these items to each other at ingest time, while hugely valuable, has turned out to be costly in the cloud at scale, which in turn has made Observability expensive for customers to run at scale.
Partly as a result of these cost pressures, OpenTelemetry (despite its functional limitations; please see Open Telemetry – Destined to Disappoint for more on this) has become the de facto standard for monitoring applications, or at least the starting point for monitoring applications, at many enterprises. Many other open source projects and products have made their mark on the Observability market as well.
New vendors like Chronosphere have entered the Observability market with a focus upon the cost of running an Observability solution in the cloud at scale.
Application security has become a feature of leading Observability platforms. Security vendors (like Zscaler) have started to deliver Observability functionality.
Many other things that used to be standalone products or categories are now part of Observability Platforms. This includes Synthetic Monitoring, Kubernetes Monitoring, Digital Experience Monitoring, Service Management, Observability Pipelines, and Business Analytics.
First-generation AI functionality has become a feature of leading Observability Platforms. As a result, AIOps has ceased to exist as a standalone category. Please see AIOps is Dead for more detail on this.
It is now possible to have software develop software. Specifically, it is possible to use Generative AI to write code. This is yet another innovation that will be widely used and deployed before anyone understands how to support it in production. However, at the end of this road lies the potential for Observability, Generative AI and CI/CD to combine into a closed-loop process: identify problems in code running in production, determine what must be done to address each problem, tell the AI what changes it must make to the code it has already created, use CI/CD to push that fix into production, and then verify that the change actually fixed the problem.

Impact of the CrowdStrike Debacle upon Observability

On July 19, 2024, CrowdStrike pushed a bad update to customers of its SaaS security solution, causing Windows computers worldwide to crash, many of them into a Blue Screen of Death (BSOD). This has raised the following questions for Observability vendors:

Since Observability is also delivered to customers as SaaS, and since Observability vendors also automatically update the software running at their customers' sites, are Observability vendors also a potential source of the kind of problems that CrowdStrike caused?
Going in the other direction, could Observability vendors have detected the CrowdStrike issue and prevented its widespread impact? This would have required that Observability products were being used to monitor CrowdStrike and the servers where it was installed (Observability has traditionally been used to monitor custom-built applications, not ones purchased from and delivered by software vendors).
Going forward, how many agents that can automatically change things in their environments will customers be willing to tolerate?

Conclusion

The business of developing, deploying and running application software that implements and automates critical business processes is subject to a very high pace of innovation because the ROI from this level of automation is so high. Because Observability must keep up with software that changes this quickly, it too must innovate at a very fast pace. Every once in a while, these innovations accumulate to the point where they require a fundamental reinvention and reimplementation of Observability solutions, potentially creating new winners and losers in the process. We may well be at one of those points right now.
