How observability has changed in recent years, and what’s coming next

In recent years, businesses have become increasingly reliant on observability to manage and maintain complex systems and infrastructure. As systems become even more complex, observability must evolve to keep pace with changing demands. The big question for 2023: What’s next for observability?

The proliferation of microservices and distributed systems has made it more difficult to understand real-time system behavior, which is critical for troubleshooting problems. Recently, more businesses have addressed this with automated monitoring of distributed architectures, deep-dive tracing and real-time observability.

However, each decade has brought a sea change in how observability is expected to function. The last three decades have seen transformation after transformation — from on-premise to cloud to cloud-native. With each generation has come new problems to solve, opening the door for new companies to form:

  • The on-premises era produced a handful of companies such as SolarWinds, BMC and CA Technologies.
  • The cloud era (ushered in by AWS) shook up the market, giving rise to companies like Datadog, New Relic, Sumo Logic, Dynatrace, AppDynamics and more.
  • The cloud-native era (starting in 2019-20) has resulted in another market shakeup.

Why is observability changing?

The main reason for the current shakeup is that businesses are building software using entirely different technology compared to 2010. Rather than monolithic architectures, they use microservices, Kubernetes and distributed architecture.


There are three key reasons why this is the case:

  • Better security
  • Easy scalability
  • More efficiency for distributed teams

However, there are challenges as well. Gartner predicts that by 2025, more than 95% of new digital workloads will be deployed on cloud-native platforms. Since cloud native generates far more data than previous generations of technology, hosting and scaling that data becomes more challenging. This presents three major problems.

1. Prohibitive costs

The first problem is relatively straightforward: cost. Legacy observability platforms have become so expensive that most startups and mid-sized businesses can’t afford them. As a result, those businesses fall back on older technology to host and process their data, technology that can’t respond to the needs of 2023.

2. Evolving priorities in observability

Additionally, as the capabilities of observability have become more advanced, the KPIs and OKRs that development and operations teams track have evolved. 

Before, the primary focus was on ensuring applications and infrastructure didn’t crash. Now, dev and ops teams are operating at a deeper level, prioritizing: 

  • Request latency
  • Saturation
  • Scalability
  • Traffic maps for where usage is happening
  • Optimizing and predicting future outcomes
  • How new code changes cloud usage

In short, dev and ops teams have become more proactive than reactive. This requires technology that can keep up.

3. Changing expectations for observability

Finally, the rise of microservices architecture changes how IT teams observe application changes. One microservice can run across a hundred machines, and a hundred small services can run on one machine. There’s no “one-size-fits-all” approach. Dev and ops teams need deeper analysis to understand what is happening across their infrastructure.

These are the challenges. So how should the new generation of observability tools respond in 2023? From my perspective, here are eight things we will need to win the market.

Note: I’m looking at a 30,000-foot view of a vast market. It’s unlikely that a single company will do all these things. But these are the needs, and it’s going to require new companies, technologies and platforms to meet them all. 

Unified observability

All the legacy companies say they’re a unified observability platform. What this really means is that they have separate tabs for metrics, logs and traces accessible from their platform.

This doesn’t actually solve the problem. What dev and ops teams need is one place from which to view all this data on a single timeline. Only then will they be able to trace correlations, determine the root causes of issues and solve them quickly.
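To make this concrete, here’s a minimal sketch of what a single merged timeline might look like, assuming each metric, log and trace event carries a timestamp. The Event record and sample payloads are illustrative, not any vendor’s actual schema:

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical unified event record; ordering compares timestamps only.
@dataclass(order=True)
class Event:
    timestamp: float
    source: str = field(compare=False)   # "metric", "log" or "trace"
    payload: str = field(compare=False)

# Three pre-sorted streams that would normally live in separate tabs.
metrics = [Event(100.0, "metric", "cpu=92%"), Event(104.0, "metric", "cpu=97%")]
traces  = [Event(101.2, "trace",  "span checkout latency=2300ms")]
logs    = [Event(101.5, "log",    "ERROR timeout calling payments")]

# Merge them into one timeline so correlated signals sit side by side.
for event in heapq.merge(metrics, traces, logs):
    print(f"{event.timestamp:7.1f}  {event.source:<6}  {event.payload}")
```

Viewed this way, the CPU spike, the slow span and the timeout error line up in sequence instead of living in three disconnected views.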

Integrated observability and business data

As Bogomil from Sequoia mentioned in this blog post, most businesses don’t correlate their observability and business data. This is a problem, because there are powerful insights to be gained from analyzing the two side by side.

For example, Amazon has famously found that when its website slows down by even one second, it loses millions of dollars daily. This insight can be huge for ecommerce businesses: if they track a slowdown in orders, it could be due to poor application performance. The faster they fix the application, the more orders they receive, and the more revenue they earn.

The same goes for software companies. If the application is fast, this improves its usability, which improves user experience, which in turn impacts a number of business metrics. Only by integrating these two sets of data can businesses start to make these connections to improve the bottom line.
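As a toy illustration of the connection, the sketch below correlates hypothetical per-minute latency samples with orders placed in the same minutes. The numbers are invented purely for illustration:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical per-minute samples: p95 page latency (ms) alongside the
# number of orders completed in the same minute.
latency_ms = [220, 240, 310, 480, 900, 870, 400, 260]
orders     = [130, 128, 115,  90,  52,  55, 101, 125]

r = correlation(latency_ms, orders)
print(f"latency vs. orders: r = {r:.2f}")  # strongly negative for this data
```

A strong negative correlation like this one is exactly the kind of signal that stays invisible while observability data and business data live in separate systems.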

Vendor-agnostic OpenTelemetry (OTel)

Companies are looking for a solution that doesn’t lock them into one vendor. That’s why most tech companies are contributing to OpenTelemetry (OTel) and making it the go-to standard for data-collection agents. OTel has many benefits: interoperability, flexibility and improved performance monitoring.
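To make the vendor neutrality concrete, here’s a minimal tracing setup using the OpenTelemetry Python SDK. The service and span names are placeholders, and the endpoint assumes a local OTel collector listening on the default OTLP gRPC port:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP; pointing this at a different backend is a
# configuration change, not a re-instrumentation project.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")
with tracer.start_as_current_span("process-order"):
    pass  # application logic goes here
```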

Predictive observability

In the AI era, everything is moving toward requiring less human intervention. This enables systems to do things that humans simply cannot, like using machine learning to predict errors before they even happen.

This is not common in observability right now, and there is a major need for more innovation. By adding an AI layer to observability platforms, businesses can predict issues before they happen, and solve them before the user or customer even knows that something is wrong. 
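A full AI layer is beyond the scope of a blog post, but as a minimal stand-in, the sketch below flags a metric sample that strays too far from a rolling baseline. A production system would replace this z-score heuristic with a learned model:

```python
from collections import deque
from statistics import mean, stdev

def make_detector(window: int = 60, threshold: float = 3.0):
    """Flag samples more than `threshold` standard deviations from a
    rolling baseline, a simple stand-in for a learned model."""
    history = deque(maxlen=window)

    def check(value: float) -> bool:
        anomalous = False
        if len(history) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(history), stdev(history)
            anomalous = sigma > 0 and abs(value - mu) > threshold * sigma
        history.append(value)
        return anomalous

    return check

check = make_detector()
for sample in [12, 13, 12, 11, 13, 12, 12, 14, 13, 12, 12, 95]:
    if check(sample):
        print(f"anomaly: {sample}")  # fires on 95, before users notice
```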

Predictive security in observability

Observability and security are closely related. Many observability companies are moving into security because they already collect all the data from applications and infrastructure.

By reading metrics, logs and traces, specifically those that demonstrate unusual behavior, AI should be able to identify security threats. Most SIEM and XDR tools don’t do this. And even when they do, they rely on rule-based models rather than analyzing and learning from behavior.

Cost optimization

Perhaps the biggest challenge in observability is cost. Although cloud storage is getting cheaper and cheaper, most observability companies aren’t lowering their prices to match. Customers get the short end of the stick, mainly because there are no alternatives. 

OpenTelemetry can collect over 200 data points every second. However, we don’t need all of them. So rather than charge users for storage they don’t need, organizations should collect and store only the useful data points and delete the rest. This can reduce the cost of storing and processing data.
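As a rough sketch of the idea, with metric names and the sampling rate chosen purely for illustration, a pre-ingest filter might keep only the series teams actually alert on and thin the rest:

```python
# Hypothetical pre-ingest filter: keep only the metrics teams alert on,
# and thin high-frequency samples before they reach paid storage.
KEEP = {"http.server.duration", "system.cpu.utilization", "process.memory.usage"}

def reduce_batch(batch: list[dict], every_nth: int = 10) -> list[dict]:
    """Drop unneeded series, then keep every Nth sample of the rest."""
    kept = [point for point in batch if point["name"] in KEEP]
    return kept[::every_nth]

batch = [{"name": "http.server.duration", "value": i} for i in range(200)]
print(len(reduce_batch(batch)))  # 200 raw points stored as 20
```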

Correlation to causation analysis

Most legacy observability platforms give basic information about what’s happening in the cloud or application. However, many times the inciting event takes place hours or even days before symptoms appear. As such, it’s important to monitor CI/CD pipelines to see when code gets pushed, as well as which change or request starts to create the problem.

Let’s say there’s one network socket that’s slow, and it starts to clog requests. As a result, your backend starts to slow, which then produces an error. Then the front end slows, producing another error. Then the application crashes. You may only notice the front end slowing down and think that caused the application crash. But in reality, the problem started elsewhere. 

In a distributed architecture, this root cause analysis takes more time than in a monolith. Observability platforms need to adapt to this new reality. 
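One simple building block for this, sketched below with invented timestamps and service names, is walking the CI/CD deploy log backward from the moment an error spike began:

```python
from bisect import bisect_right

# Hypothetical CI/CD deploy history (unix timestamps), oldest first.
deploys = [
    (1700000000, "payments v1.4.2"),
    (1700007200, "frontend v2.9.0"),
    (1700014400, "payments v1.5.0"),
]
error_spike_start = 1700015100  # when the error-rate alert first fired

# Find the last deploy that landed before the spike began: the most
# likely inciting change, even if symptoms surfaced elsewhere.
times = [t for t, _ in deploys]
i = bisect_right(times, error_spike_start) - 1
print("suspect deploy:", deploys[i][1] if i >= 0 else "none")
```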

AI-based alerts

Alert fatigue is a real challenge. When developers receive so many alerts that they mute email threads or Slack channels, this hides issues and slows down time to resolution. 

Instead, alert systems can leverage AI to predict which alerts are essential and which are not. AI can also provide context and even suggest possible solutions.
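As a toy illustration (the weights and alert fingerprints here are invented; a real system would learn them from incident history), an alert scorer might weight severity and damp repeats:

```python
import time

SEVERITY = {"info": 0.1, "warning": 0.4, "critical": 0.9}
recently_seen: dict[str, float] = {}

def score_alert(fingerprint: str, severity: str, now: float) -> float:
    """Toy scorer: weight by severity, damp alerts seen recently.
    Only alerts scoring above a threshold would page a human."""
    novelty = 1.0
    last = recently_seen.get(fingerprint)
    if last is not None and now - last < 600:  # repeat within 10 minutes
        novelty = 0.2
    recently_seen[fingerprint] = now
    return SEVERITY.get(severity, 0.1) * novelty

now = time.time()
print(score_alert("db-conn-pool-exhausted", "critical", now))      # 0.9
print(score_alert("db-conn-pool-exhausted", "critical", now + 5))  # 0.18, damped
```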

Final thoughts

This is an exciting time to be in observability. The changes we’re seeing are opening the door to untold opportunities. The question remains: Who will rise to the top in 2023?

Laduram Vishnoi is founder and CEO at Middleware.
