What do I do for a Living? – Step 2: Performance Monitoring
This is a follow-up to my previous blog post, titled “What I do for a Living.” That was a question my mother posed to me recently. I work for Monolith Software. We develop and market technology management & monitoring software or, put more generically, network & systems management (NSM) software. Over the past 13 years I have witnessed the following NSM pecking order in terms of monitoring maturity:
1. Availability Monitoring (discussed in my last blog post)
2. Performance Monitoring
3. Fault/Event Management
4. Correlation & Enrichment
5. Discovery & Mapping
6. Real Time IT Dashboarding
7. Service Level Management
This blog post will discuss Step 2 of my NSM pecking order – performance monitoring. Hopefully I have adequately established that availability monitoring is the obvious first place to start your efforts in the monitoring & management arena. You will look pretty silly as an IT professional if you have invested money in monitoring software and are still surprised by outages. Just yesterday I was talking to a friend of mine at Cisco. One of his customers is rolling out tens of thousands of IP phones to their locations. They recently spent a significant sum of money on management software from one of the large vendors. The big issue is that the software did not detect an outage or notify them of it. They found out when they were inundated with calls from employees at the remote locations complaining that they couldn’t make phone calls. Oops.
If availability monitoring is that binary on/off switch that will notify you when a node goes down, then performance monitoring is the proactive monitoring that will/should alert you before the hard outage occurs or before users start to complain. Do hard outages occur that performance monitoring systems cannot proactively detect? Sure. However, many times issues become incrementally worse. If you are adequately monitoring the performance characteristics of your environment, then more than likely you will be able to identify many issues before they become hard outages.
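To make the "incrementally worse" idea concrete, here is a minimal sketch of trend-based alerting. It fits a straight line to recent samples of a metric and estimates how long until the metric crosses a threshold. The function name, the sample format, and the 90% threshold are assumptions made up for illustration; this is not how any particular monitoring product works.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours until a metric crosses `threshold`.

    `samples` is a list of (hour, value) pairs, oldest first (a hypothetical
    format chosen for this sketch). Returns None when there is too little
    data or the metric is not trending upward.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n

    # Ordinary least-squares slope and intercept.
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    if slope <= 0:
        return None  # flat or improving, so no crossing is projected
    intercept = mean_v - slope * mean_t

    # Time at which the fitted line reaches the threshold, measured
    # from the most recent sample.
    crossing_time = (threshold - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, crossing_time - last_t)


if __name__ == "__main__":
    # Disk utilization (%) sampled once an hour, creeping upward.
    history = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
    print(hours_until_threshold(history, threshold=90.0))
```

A straight-line fit is a deliberate simplification. Real metrics are noisy and often seasonal, so treat the estimate as an early warning, not a prediction.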
Consider what happens to a server when it runs out of disk space. Everything stops working – right? Simple performance monitoring of disk space on the server could have notified you when space was getting short, which would have given you time to address the problem before a hard outage occurred. I know, you are reading this and thinking to yourself, “That is ridiculous. Who wouldn’t be monitoring host resources on their servers?” You would be surprised. I ran into a prospect very recently. They called their software vendor to complain that the application wasn’t working. The software vendor’s support organization worked with the customer to troubleshoot the issue. Sure enough, the server had run out of disk space. Of course the application wasn’t working!
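A disk space check of this kind does not take much. The sketch below uses Python's standard `shutil.disk_usage` call and compares utilization against warning and critical thresholds. The 80% and 90% values and the function names are assumptions chosen for illustration, not recommendations from any vendor.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a utilization percentage to a simple severity label."""
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def disk_status(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (severity, used_pct) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct), used_pct


if __name__ == "__main__":
    severity, used_pct = disk_status(".")
    print(f"{severity}: {used_pct:.1f}% of disk in use")
```

Run on a schedule and wired to a notification, even a check this simple would have warned the prospect above well before the application stopped.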
Network monitoring traditionally covers things like device CPU, memory, temperature, interface traffic, errors and discards. Due to the prevalence of IP Telephony (humorously pronounced tel-le-fon-eee by newbies), it is imperative to proactively monitor the key network characteristics that can degrade the quality and performance of IP Telephony. These characteristics can often be measured with Cisco’s IP SLA technology (formerly called SAA). The key measurements are round-trip time (RTT), packet loss, jitter, MOS, and R-Factor voice scoring. The image below is an example Monolith real-time dashboard showing these key characteristics.
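For readers who have not worked with these voice measurements, here is a rough sketch of how three of them relate to raw probe data. The R-Factor to MOS conversion is the standard one from the ITU-T G.107 E-model. The packet loss and jitter helpers are simplified illustrations with made-up names, and they are not meant to reproduce exactly how IP SLA calculates its figures.

```python
def mos_from_r_factor(r):
    """Convert an E-model R-Factor to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def packet_loss_pct(sent, received):
    """Percentage of probe packets that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def mean_jitter_ms(delays_ms):
    """Simplified jitter: mean absolute change between consecutive delays."""
    if len(delays_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(f"MOS at R=93.2: {mos_from_r_factor(93.2):.2f}")
    print(f"Loss: {packet_loss_pct(1000, 990):.1f}%")
    print(f"Jitter: {mean_jitter_ms([20, 22, 21, 25]):.2f} ms")
```

The point of the conversion is that a single R-Factor, which already folds in delay, loss and codec effects, maps to the familiar 1-to-4.5 MOS scale that voice teams use to judge call quality.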

My next blog post will focus on fault/event management.
Cheers!