Monday, April 21, 2008

Monitoring that matters

I have been using Hyperic HQ for monitoring for more than a year now, and it is becoming more and more obvious to me that popular solutions miss an important aspect of monitoring. Yes, it is important to know the throughput of the system.

But the metric that really matters is CUSTOMER EXPERIENCE, and it usually derives from two things:
  • Service responsiveness
  • Error ratio
The problem is that these metrics are application-level parameters: there is no OS counter from which a monitoring system can easily obtain them. So both the application developers and the monitoring-system developers have to put effort into integration. This is where HQ is good: it is really easy to create a JMX bean and an XML plugin to gather application-specific metrics.
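As a rough illustration of the JMX side, here is a minimal standard MBean that tracks the two metrics above, response time and error ratio. The class names, the `recordRequest` method and the `myapp:type=ServiceMetrics` object name are hypothetical choices for this sketch, not something HQ prescribes. The HQ-side XML plugin is not shown; check the HQ plugin documentation for its format.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ServiceMetricsDemo {

    // Standard MBean convention: the management interface is named <ClassName>MBean
    // and must be public. Each getter becomes a read-only JMX attribute.
    public interface ServiceMetricsMBean {
        long getRequestCount();
        long getErrorCount();
        double getErrorRatio();            // errors / requests, 0.0 when no requests yet
        double getAverageResponseMillis(); // mean latency, 0.0 when no requests yet
    }

    public static class ServiceMetrics implements ServiceMetricsMBean {
        private final AtomicLong requests = new AtomicLong();
        private final AtomicLong errors = new AtomicLong();
        private final AtomicLong totalMillis = new AtomicLong();

        // Hypothetical hook: the application calls this once per handled request.
        public void recordRequest(long elapsedMillis, boolean failed) {
            requests.incrementAndGet();
            totalMillis.addAndGet(elapsedMillis);
            if (failed) {
                errors.incrementAndGet();
            }
        }

        @Override public long getRequestCount() { return requests.get(); }

        @Override public long getErrorCount() { return errors.get(); }

        @Override public double getErrorRatio() {
            long n = requests.get();
            return n == 0 ? 0.0 : (double) errors.get() / n;
        }

        @Override public double getAverageResponseMillis() {
            long n = requests.get();
            return n == 0 ? 0.0 : (double) totalMillis.get() / n;
        }
    }

    public static void main(String[] args) throws Exception {
        ServiceMetrics metrics = new ServiceMetrics();
        // Hypothetical ObjectName; choose a domain/key scheme for your own application.
        ObjectName name = new ObjectName("myapp:type=ServiceMetrics");
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(metrics, name);

        metrics.recordRequest(120, false);
        metrics.recordRequest(480, true);
        System.out.println(server.getAttribute(name, "ErrorRatio")); // prints 0.5
    }
}
```

The bean keeps cumulative counters rather than a windowed ratio, which leaves the choice of sampling interval to whatever polls it over JMX.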

But there is room for improvement. I'd like to have:
  • Network-level error statistics, such as the number of dropped IP packets
  • Counts of exceptions in the log, by type
Usually, the error rate tells you more about system health. But metrics of this type are, so far, successfully ignored.
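For the second wish, a crude log-based counter could look like the sketch below. It assumes Java-style stack traces in the log and uses a regular-expression heuristic (a dotted class name whose simple name ends in `Exception` or `Error`); `ExceptionCounter`, `accept` and `counts` are hypothetical names. It counts the first exception type on each line, skips `at ...` stack-frame lines, and treats a `Caused by:` line as its own entry.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExceptionCounter {

    // Heuristic (an assumption, not a log standard): package-qualified class name
    // whose last segment ends in "Exception" or "Error", e.g. java.sql.SQLException.
    private static final Pattern EXCEPTION_TYPE =
            Pattern.compile("\\b(?:[A-Za-z_$][\\w$]*\\.)+[A-Z][\\w$]*(?:Exception|Error)\\b");

    // Sorted by type name so the report is stable.
    private final Map<String, Integer> counts = new TreeMap<>();

    /** Feeds one log line; stack-frame lines ("at ...") are ignored. */
    public void accept(String line) {
        if (line.trim().startsWith("at ")) {
            return;
        }
        Matcher m = EXCEPTION_TYPE.matcher(line);
        if (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    /** Cumulative count per exception type seen so far. */
    public Map<String, Integer> counts() {
        return Collections.unmodifiableMap(counts);
    }

    public static void main(String[] args) throws IOException {
        ExceptionCounter counter = new ExceptionCounter();
        // Reads the log line by line, so large files are not loaded into memory.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                counter.accept(line);
            }
        }
        counter.counts().forEach((type, n) -> System.out.println(type + " " + n));
    }
}
```

Run it as `java ExceptionCounter app.log`. The per-type counts could in turn be exposed as JMX attributes so that a monitoring system can chart them over time.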

1 comment:

Unknown said...

Hi,

I wanted to use Hyperic but did not know how to collect application metrics with it. I posted a question in the Hyperic forums: http://forums.hyperic.com/jiveforums/thread.jspa?threadID=6470&tstart=0

Please let me know how to achieve this. An example, if possible, would help.