Observability is a very attractive market for operators and investors (Datanami estimates it at $17B 💰), and it is expanding extremely fast.
As organisations of all sizes shift from in-house hosting of monolithic applications to more modular architectures on some sort of cloud, they become more and more reliant on complex infrastructures to serve their “always on” applications.
These applications, and the infrastructures hosting them, run in a countless variety of configurations that depend on a very delicate balance of parts. To serve these apps properly, very complex systems need to run like clockwork. A container hiccups, a function or even a silly S3 bucket misfires, and suddenly latency is sub-par… an alarm goes off, and someone is not going to be happy 😡.
Monitoring and its hotter sibling, Observability, allow trained specialists to keep these systems stable and under control. Thanks to the combined power of logs, events and metrics, SREs, DevOps engineers and troubleshooters can investigate, resolve and prevent issues. The lucky ones even have powerful AI working tirelessly to ensure proper provisioning for the sake of the user experience. This is all well and good, and we are lucky to have found a way to tackle such a complex problem, but in my opinion the solutions offered by observability providers are actually becoming part of the problem.
Sure, to keep costs down you could reduce the retention period, aggregate data, lower the resolution or use more efficient storage, but these are not always options, and to me they sound more like compromises.
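To make the “compromise” concrete: lowering the resolution usually means averaging raw samples into coarser time buckets. The minimal Python sketch below (the `downsample` helper and the latency numbers are hypothetical, made up for illustration) shows how a short latency spike, exactly the kind of event that sets off an alarm, can almost vanish once per-second data is rolled up into one-minute averages.

```python
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns (bucket_start, mean_value) pairs ordered by bucket start.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - (ts % bucket_seconds)  # align the timestamp to its bucket
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical per-second latency samples in ms: a steady 20 ms
# with a two-second spike to 900 ms at t=30 and t=31.
raw = [(t, 20.0) for t in range(60)]
raw[30] = (30, 900.0)
raw[31] = (31, 900.0)

per_minute = downsample(raw, 60)
print(max(value for _, value in raw))  # 900.0
print(round(per_minute[0][1], 1))      # 49.3: the spike is averaged away
```

Keeping the per-bucket maximum (or a percentile) alongside the mean would preserve the spike, but that means storing more series, which is exactly the kind of trade-off that makes these options feel like compromises.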
Anyway, it is not only a cost problem. Yes, observability is expensive: data is growing like crazy, you need it in real time, and so you have to pay for the tools. What really bothers me is the way these solutions are structured and presented.
Everyone is positioning themselves as the “undisputed leader” of the space (ah, those analyst reports must cost a fortune), everyone offers the ultimate end-to-end solution, everyone is full-stack, everyone has that special angle… how boring 🥱
The reality is that Observability is practically a commodity at this point. Do you know how many providers offer to store and query logs and metrics?
I don't know… there are too many to count! 🧮
In the 2010s I worked with the author of syslog-ng (hell of a 🧠, Balázs) and I thought I had seen it all when it came to logs. I was wrong.
The point is, even though they provide a valuable service, observability suppliers are promoting complexity to differentiate themselves, not only to bring value. They certainly help us all chase incidents and fix misconfigurations, but they also tend to lock in their customers by “unifying” all data in shiny (black-box) data stores, by imposing proprietary formats, and by making it hard, if not impossible, to “port” data and schemas elsewhere.
I don't even want to start on those “light” agents and their proprietary ingestion pipelines 🥴. Do you know what happens to your data once it is ingested? How many times is it replicated? Where does it end up? Is it protected, and if so, how? What if you want (or are legally required) to remove a single log from the index?
You'd better not ask these questions, unless you want to spend the next 5 hours retrieving a single log file… using the command line (sorry, no UI for this) from the datastore's index of your $100K/year-minimum, full-stack leader.
Instead of adding layers of useful complexity (AI, automation, contextualisation etc.), why don't we try to build solutions that give customers options and control over which resources to use and how?
A former boss of mine used to tell customers at the end of the project delivery meeting: “observability isn't really complicated until you have achieved it”. Basically, to control complexity you need complex controls.
Well, I disagree. I see plenty of ways we can dial down the complexity and the cost of observability while retaining the benefits. For one, you can adopt simpler solutions; they exist. Nope, you won't find them in fancy quadrants… but that doesn't mean they are garbage.
Simpler does not mean dumb: when simple works, we call it sophistication.
Not every solution requires you to install new, dedicated agents; maybe you can use the ones already in place. That means less time spent installing and maintaining a component that, although necessary, is a potential data-loss/leakage liability.
Simpler data capture? Good.
There are solutions that use data stores that are actually open and accessible, like databases… in fact, our datastore is a DB: we use ClickHouse and it's awesome!
Now I can query AND actually interact with the datastore? Great.
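What being able to interact with the datastore buys you is that questions like “can I remove a single log from the index?” become plain queries. ClickHouse speaks SQL, but to keep this minimal sketch self-contained it uses Python's built-in `sqlite3` module as a stand-in; the `logs` table, its columns and the sample rows are hypothetical. In ClickHouse itself the delete would be a lightweight `DELETE FROM ... WHERE ...` or an `ALTER TABLE ... DELETE WHERE ...` mutation, depending on the version and table engine.

```python
import sqlite3

# In-memory SQLite database standing in for an open, SQL-accessible log store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, ts TEXT, service TEXT, message TEXT)"
)

# Hypothetical sample log entries.
conn.executemany(
    "INSERT INTO logs (ts, service, message) VALUES (?, ?, ?)",
    [
        ("2024-01-01T10:00:00Z", "checkout", "payment accepted"),
        ("2024-01-01T10:00:01Z", "checkout", "user jane@example.com updated card"),
        ("2024-01-01T10:00:02Z", "search", "query took 35 ms"),
    ],
)

# Find the one entry that has to go (e.g. it contains personal data)...
row = conn.execute(
    "SELECT id FROM logs WHERE message LIKE ?", ("%jane@example.com%",)
).fetchone()

# ...and delete exactly that row.
conn.execute("DELETE FROM logs WHERE id = ?", (row[0],))
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(remaining)  # 2
```

The specific database matters less than the access model: a store you can query directly lets you audit, export or erase data on your own terms.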
K.I.S.S.
Simpler, for us, means open, flexible, efficient and affordable, and we believe it is the right approach. We also believe in performance, and you would be surprised how robust and performant simple can be.
Let me know what you think!