Conclusion: KPIs and Application Monitoring
Best Practices KPIs for Application Management
The third and final post in this series focuses on Application Management. The most important IT resources of any business are its key applications. Without them, you may not have a billing system to pay employees, or sales & ordering systems to track new opportunities and deliver goods. Some businesses are ASPs (Application Service Providers) — their whole business is applications. I usually group Application KPIs (Key Performance Indicators) into three key areas: environmental (the resources the application relies upon), synthetic transactions (individual tests proving access or performance criteria), and application feeds (specific application data vital for troubleshooting). This discipline varies the most from business to business, but all KPIs fall under the same generic umbrellas as described below.
Service Availability
- Source: Service Level Management (SLM), or its cousin Business Service Management (BSM), data generated by the monitoring system itself
- Method: A two-prong approach is recommended — Event-based for real-time impact analysis and Metric-based for historical reporting, trending, analysis, and compliance
- Use: Knowing that a service the application provides is down or degraded is the first and most basic step of application management. Applications are usually complex enough that simply determining whether one is available can be a challenge. SLM/BSM simplifies this task by correlating services, with most individual service checks done at the systems level. Businesses can combine environmental data, service checks, and application feed data into a single, simple application metric
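
To make the roll-up idea concrete, here is a minimal sketch of both prongs: a worst-state roll-up for the event-based view and an availability percentage for the metric-based view. The state names, the check names, and the rule that the worst component wins are my assumptions for illustration, not how any particular SLM/BSM product works.

```python
from enum import IntEnum

class State(IntEnum):
    # Assumed severity ordering: a higher value is worse.
    UP = 0
    DEGRADED = 1
    DOWN = 2

def rollup(checks):
    """Event-based view: the service is as bad as its worst component check."""
    return max(checks.values(), default=State.UP)

def availability_pct(samples):
    """Metric-based view: percent of samples in which the service was not DOWN."""
    if not samples:
        raise ValueError("no samples")
    not_down = sum(1 for s in samples if s != State.DOWN)
    return 100.0 * not_down / len(samples)

# Hypothetical component checks feeding one application service.
checks = {
    "network": State.UP,
    "login_transaction": State.DEGRADED,
    "app_feed": State.UP,
}
service_state = rollup(checks)

# Hypothetical history of rolled-up states, one per polling interval.
history = [State.UP, State.UP, State.DEGRADED, State.DOWN]
pct = availability_pct(history)
```

Whether a degraded interval counts against availability is a policy decision; this sketch counts only DOWN intervals as unavailable.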
Environment Availability
- Source: Network/System Management Systems
- Method: Most customers either integrate 3rd party data or replace/duplicate monitoring of the resources the application needs to run
- Use: Knowing that a problem is not the network or systems allows businesses to focus resources on fixing the right problem, and avoids the blame game
Application Availability
- Source: Agent or Agent-less options are available. I usually prefer agent-less as it’s easier to maintain, but it all depends upon what is available
- Method: Most customers run custom synthetic transaction scripts to test individual key functions within the application
- Use: Applications usually have a complex set of discrete inputs, outputs, and internal workings. It’s vital to test every key working part at a set interval so that the results can be combined to determine application availability
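
A synthetic transaction script can be as simple as the sketch below: run one scripted step, time it, and treat any exception or slow response as a failure. The step functions, the threshold, and the "all steps must pass" rule are hypothetical; a real script would replace the stubs with actual calls into your application.

```python
import time

def run_synthetic_check(name, action, max_seconds):
    """Run one scripted step and report whether it passed and how long it took."""
    start = time.perf_counter()
    try:
        ok = bool(action())
        error = None
    except Exception as exc:  # any failure in the step counts as unavailable
        ok = False
        error = str(exc)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "available": ok and elapsed <= max_seconds,
        "elapsed_seconds": elapsed,
        "error": error,
    }

def application_available(results):
    """Assumed rule: the application is available only if every key step passed."""
    return all(r["available"] for r in results)

# Hypothetical stand-ins for real scripted steps.
def login_step():
    return True

def order_lookup_step():
    raise TimeoutError("order service did not respond")

results = [
    run_synthetic_check("login", login_step, max_seconds=2.0),
    run_synthetic_check("order_lookup", order_lookup_step, max_seconds=2.0),
]
app_up = application_available(results)
```

Running the same list of checks on a fixed schedule gives you both the real-time state and a history to report against.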
The secondary KPIs below depend on the complexity of your applications; for highly complex or customized applications, they may be essential.
User Experience Performance
- Source: Agent or Agent-less options are available. I usually prefer agent-less, as it’s easier to maintain
- Method: Various methods are available, ranging from synthetic transactions to passive application monitoring via network sniffing to counters stored and reported by your custom application
- Use: Tracking your end-user experience is key to keeping that application’s user base happy and productive
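
However the response times are collected, percentiles usually say more about user experience than averages do, because a few very slow requests get hidden in a mean. Here is a small sketch using the nearest-rank method; the sample timings and the 300 ms threshold are made up for illustration.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical page response times in milliseconds.
response_ms = [120, 95, 310, 150, 101, 99, 870, 130, 140, 110]

p50 = percentile(response_ms, 50)   # typical experience
p95 = percentile(response_ms, 95)   # experience of the unluckiest users
slow = [ms for ms in response_ms if ms > 300]  # assumed 300 ms threshold
```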
Scale or Capacity Management
- Source: Agent or Agent-less options are available. I usually prefer agent-less, as it’s easier to maintain
- Method: Various methods are available; it all depends on where the data exists
- Use: Tracking the number of users active or the resources being consumed by your user community allows for better capacity management, so that the application can grow as fast as your business needs it to.
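
As a rough illustration of turning usage counts into a capacity estimate, the sketch below fits a straight line to evenly spaced samples and projects when it reaches a limit. The weekly user counts and the limit of 200 are hypothetical, and a linear trend is only one possible assumption; real growth is often seasonal or stepwise.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def periods_until(values, capacity):
    """Periods after the last sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking. A negative result means
    the trend line has already passed the capacity limit.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope - (len(values) - 1)

# Hypothetical weekly active-user counts and an assumed limit of 200 users.
weekly_active_users = [100, 110, 120, 130]
weeks_left = periods_until(weekly_active_users, capacity=200)
```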
Troubleshooting Performance Metrics
- Source: Agent or Agent-less options are available. I usually prefer agent-less, as it’s easier to maintain
- Method: Various methods are available, depending on where the data exists. Troubleshooting metrics usually come directly from a custom application feed
- Use: There are some metrics that your developers want to track to help in troubleshooting or testing the application. These are vital to keeping your MTTR low and increasing the stability of the application in general.
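
Since MTTR is the number these metrics are meant to drive down, it helps to measure it consistently. A minimal sketch, assuming each incident is recorded as a (detected, resolved) pair of timestamps; the incident data here is invented for the example.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average of (resolved - detected) across all incidents."""
    if not incidents:
        raise ValueError("no incidents")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)

# Hypothetical incidents: (detected, resolved).
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 30)),     # 30 minutes
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 15, 30)), # 90 minutes
]
mttr = mean_time_to_repair(incidents)
```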
RMON/Netflow Statistics
- Source: Usually Agent (Probe) based, however some RMON/Flow statistics are available via SNMP on some devices
- Method: The majority of people use Cisco’s Netflow, but J-Flow from Juniper, C-Flow from Alcatel, and other standards-based options such as RMON, IPFIX, and sFlow are available
- Use: These stats describe packet and bandwidth counts by port and protocol on your network, and on hosts via VMware or an open source host agent. They let you look inside the standard bandwidth stats and see who is using what, and where. Traditionally they have carried a lot of overhead, which can outweigh their usefulness, but the data may replace the need for sniffers when troubleshooting problems, which can reduce your MTTR
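
To show what "who is using what, where" looks like in practice, here is a sketch that totals bytes by protocol and port and finds the top talker. The record layout is a simplified, hypothetical one that I made up for the example; it is not the actual Netflow, IPFIX, or sFlow export format, which a collector would decode first.

```python
from collections import Counter

def bytes_by_key(flows, key_fields):
    """Total bytes per unique combination of the given flow fields."""
    totals = Counter()
    for flow in flows:
        key = tuple(flow[field] for field in key_fields)
        totals[key] += flow["bytes"]
    return totals

# Hypothetical, already-decoded flow records.
flows = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "protocol": "tcp", "dst_port": 443, "bytes": 12000},
    {"src": "10.0.0.7", "dst": "10.0.1.9", "protocol": "tcp", "dst_port": 443, "bytes": 3000},
    {"src": "10.0.0.5", "dst": "10.0.2.2", "protocol": "udp", "dst_port": 53, "bytes": 400},
]

by_service = bytes_by_key(flows, ("protocol", "dst_port"))   # what is being used
top_talkers = bytes_by_key(flows, ("src",)).most_common(1)   # who is using the most
```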
Once again, your KPIs must be created and decided internally, but hopefully this list can be helpful when determining them. Here at Monolith, we know these KPIs well because we deliver the capability to monitor them for our customer base. Here is a link to our application monitoring datasheet where we list all of the functionality required to collect, display, and report these KPIs.
<– Back to Part Two: KPIs for Systems Management