Service Assurance – Looking Back to See the Way Forward
Now that I’ve been in the technology arena for over 20 years, I feel I have some perspective on the industry. Spending this much time on the front lines lets me pinpoint past trends that are useful in charting the best path forward.
It Wasn’t That Long Ago…
I started my professional career working for Motorola in 1990. There was no PC on my desk. I had a 3270 terminal, which I used mainly to access the MRP (Manufacturing Resource Planning) software. I also had a terminal on my desk that let us do word processing and spreadsheets. We didn’t run applications locally; they ran on a Unix server maintained by IT. No one had an email address or Internet access. That came to Motorola in 1994, if I recall correctly.
Why am I telling you this? Mostly to give you a little perspective. Most people take PCs, the Internet and email for granted, and we forget how rapidly things have evolved. Here’s one more thought to consider: back in 1990 most people did not have cell phones. Those who did carried around what were called bag phones, about the size of a woman’s purse. That was mobile voice communication in 1990.
Connections: Seeds of Need
It was probably in the 1995 to 2000 time frame that companies started rolling out PCs to employees and building LANs (Local Area Networks). AppleTalk, LANtastic, Banyan VINES and NetWare were among the prevalent choices back then. I can’t recall selling monitoring or management software in that time frame. We were all more focused on figuring out how to make this stuff work so our users could take advantage of the productivity-enhancing capabilities of PCs.
Once we got through the Year 2000 crisis (or, as it turned out, the lack of a crisis), this Internet thing really took off. We were interconnecting our networks, linking our remote offices to headquarters. I was at Cisco at the time, and it was a fun time to be there. Our computer systems could now talk not only to our other company offices but also to the Internet, which let us communicate with other companies via email and attachments.
This is where the early drivers for monitoring and management began to take root.
Early Monitoring and Management: Reactive and Event-Focused
It was during this time frame that we really started selling monitoring and management software. Before 2000, IT systems were monitored largely by vendor-specific EMSs (Element Management Systems). Micromuse came along with its MOM (Manager of Managers) approach and sold a boatload of software offering the promise of a Single Pane of Glass to NOCs (Network Operations Centers). Before 2000, this was sold predominantly to large telcos and service providers.
The MOM concept was good. Instead of looking from console to console to determine what was wrong with their networks, NOC operators could look at a ‘single pane of glass’ and minimize the swivel-chair effect. Make sense? Absolutely; it’s more efficient. With all events on one screen, organizations had a better chance of spotting coincidences among disparate events (early correlation). Life was good, and a lot of Micromuse Netcool was sold. But still, organizations remained highly reactive and event-focused, waiting for traps or syslogs to arrive in their event console.
Event + Performance Management
In parallel with the event-based MOM movement, other transformations were taking place in the monitoring and management space. Organizations understood that it made sense not only to react to specific problems when they occurred (interface down, device down, card failure, loss of signal, etc.) but also to monitor the performance and utilization of these devices. Along came the performance management niche, with companies like Concord, InfoVista, Quallaby, INS, Trinagy, etc. These were essentially SNMP polling software companies. The software ran on one or more centralized servers and sent SNMP requests to network devices, asking questions such as: What is the current utilization on this interface? What is your temperature? What is your CPU utilization? What is your memory utilization? Why? Pretty straightforward. If I can identify devices or interfaces that are pegged out, I can take corresponding action: buy a more powerful router, procure more bandwidth for my WAN circuit, add more memory to the device. (Are we ready to talk about Service Assurance yet? Almost. There’s one other area I’d like to cover first…)
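To make the polling idea concrete, here is a minimal Python sketch of what a poller might do with the answers it gets back: turn two successive readings of an interface octet counter into a utilization percentage and flag interfaces that are pegged out. It assumes the raw readings have already been collected (no SNMP library is used), models them on the IF-MIB `ifInOctets` and `ifSpeed` objects, and uses a hypothetical 80% threshold. The `Sample` class and the function names are illustrative, not taken from any of the products named above.

```python
from dataclasses import dataclass

COUNTER32_WRAP = 2**32  # ifInOctets is a Counter32, so it wraps back to zero


@dataclass
class Sample:
    """Two successive polls of one interface, as a poller might record them."""
    device: str
    interface: str
    prev_octets: int   # ifInOctets at the earlier poll
    curr_octets: int   # ifInOctets at the later poll
    interval_s: float  # seconds between the two polls
    if_speed_bps: int  # ifSpeed, in bits per second


def utilization_pct(s: Sample) -> float:
    """Inbound utilization over the polling interval, tolerating one counter wrap."""
    delta = s.curr_octets - s.prev_octets
    if delta < 0:  # the counter wrapped between polls
        delta += COUNTER32_WRAP
    return 100.0 * delta * 8 / (s.interval_s * s.if_speed_bps)


def pegged(samples, threshold_pct=80.0):
    """(device, interface, utilization %) for every interface at or above the threshold."""
    return [
        (s.device, s.interface, round(utilization_pct(s), 1))
        for s in samples
        if utilization_pct(s) >= threshold_pct
    ]


samples = [
    # A T1 WAN link (1.544 Mb/s) polled 60 seconds apart.
    Sample("wan-rtr-1", "Serial0/0", 0, 10_000_000, 60, 1_544_000),
    # A gigabit LAN uplink polled 60 seconds apart.
    Sample("core-sw-1", "Gi0/1", 1_000_000, 76_000_000, 60, 1_000_000_000),
]
print(pegged(samples))  # → [('wan-rtr-1', 'Serial0/0', 86.4)]
```

The T1 is the candidate for more bandwidth; the gigabit uplink sits at about 1%. Real pollers also have to deal with fast interfaces, where a 32-bit counter can wrap more than once between polls (at 1 Gb/s it wraps in roughly 34 seconds), which is why the 64-bit `ifHCInOctets` counter exists.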
Availability Management Provides Needed Balance
I have covered event management and touched on performance management. Let me address availability management, and then we can start to talk about Service Assurance. Think about a three-legged bar stool for a moment. One leg is event management, the second is performance management and the third is availability management. It’s hard to stay balanced on a two-legged stool; that third leg comes in quite handy. (Yes, I know. What about configuration, security, change, inventory and topology? All good and important, but I’d argue they are ancillary to the big three I just mentioned when the discussion is about network monitoring and management.)
Devices want to tell us when they are having problems. They speak languages like SNMP traps and syslog to communicate those problems. IT and Operations personnel like to be a bit more proactive, so they install performance management systems to identify utilization issues. But I want to know right away when something is down and not working. A device cannot send a trap or a syslog message if it is down. And a traditional performance monitoring package cannot tell me immediately that there is a problem: when a poll goes unanswered, is the device really down, or was a packet simply dropped?
That is why there has been a market for availability monitoring software. When I think of representative early packages in this space, I think of WhatsUp, Big Brother and HP NNM. I’m talking about software that would ping a device to see whether it responded. If it responded, it must be up; if not, it must be down. Useful information to know. The idea grew from understanding device availability to understanding interface availability as well (devices can have multiple interfaces, right?).
Why not take it a step further? What about letting me know whether a server process is up, running and responding (think of a Unix process or a Windows service)? A nifty product called SiteScope came along that would tell you whether a service running on a server was up. Think DNS, DHCP, HTTP, HTTPS, FTP, SMTP, IMAP, etc. This took us a step closer to understanding what purpose a server actually serves, versus just doing typical device monitoring. Still, the idea with network monitoring up to this point was to monitor all of your network devices and servers alike. The software didn’t really understand that certain servers or network devices were more critical than others.
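The up/down logic described above can be sketched in a few lines of Python. This sketch swaps ping for a SiteScope-style TCP connect check, since sending ICMP echo requests normally requires raw sockets and elevated privileges. It also assumes a hypothetical rule of three consecutive failures before declaring a target down, one common way to avoid raising an outage over a single dropped packet. The names `tcp_service_up` and `AvailabilityMonitor` are illustrative.

```python
import socket


def tcp_service_up(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if something accepts a TCP connection on host:port in time."""
    try:
        # create_connection raises OSError on refusal, unreachable host, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


class AvailabilityMonitor:
    """Reports DOWN only after `retries` consecutive failed checks (hypothetical rule),
    so one lost packet yields SUSPECT rather than a false outage."""

    def __init__(self, retries: int = 3):
        self.retries = retries
        self.failures = {}  # target name -> consecutive failure count

    def record(self, target: str, ok: bool) -> str:
        if ok:
            self.failures[target] = 0
            return "UP"
        self.failures[target] = self.failures.get(target, 0) + 1
        return "DOWN" if self.failures[target] >= self.retries else "SUSPECT"


# Stand up a throwaway local listener so the example needs no external host.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]

monitor = AvailabilityMonitor(retries=3)
print(monitor.record("local-svc", tcp_service_up("127.0.0.1", port)))  # UP
server.close()
for _ in range(3):
    # With the listener gone, checks should fail: SUSPECT, SUSPECT, then DOWN.
    print(monitor.record("local-svc", tcp_service_up("127.0.0.1", port)))
```

Note that nothing here knows which targets matter more than others, which is exactly the limitation the next part of the story addresses.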
Business Assurance: Aligning With Business Needs
Back in April 2002 I first heard the term Business Assurance applied to the monitoring and management space. I worked for a company that was a big Micromuse partner at the time. Bill Cannon (with Micromuse at the time; he’s now with us at Monolith) and his crew built a sales pitch aimed at helping their sales team move beyond calling only on IT/Operations Directors. They wanted to figure out how to align their software with the mission-critical needs of the business, and they used the term Business Assurance. The idea was that all of these routers, switches and servers don’t exist for their own sake. They exist to let employees use applications that run over networks. Somewhere a light bulb went off: “What if we can start to map devices to services (or applications)? We could then start selling our software to business owners who need to make sure mission-critical applications are working the way they are supposed to. This approach can likely get us funded more rapidly than our traditional approach of selling to IT or Ops, where we are viewed as a cost center.”
And so it was done. Look at how Micromuse attacked this problem and you can understand exactly what they did, why they did it and how. They didn’t own a performance monitoring company at the time, but they did have their Internet Service Monitors (ISMs), which purported to do what a product like SiteScope did. Given the market and the tools available to them, they took the obvious approach: develop integrations with the performance monitoring packages on the market so those packages forwarded threshold violations into the Netcool event system, and have the ISMs forward similar events. This effectively turned everything into an event, the language Netcool/OMNIbus speaks. Once that was done, they could add a capability to OMNIbus that let organizations create service buckets.
This was initially done in dashboards built with WebTop. Since you can create your own event-based dashboard, and you can reduce everything to the lowest common denominator (events), you can create a dashboard showing only the events tied to a specific subset of the devices in the infrastructure. Voila, Business Assurance. All was good, and a lot of this was sold. The next step was to ask, “How can I enhance this further?” What if I could build a service tree whose branches represent the specific technology assets the service needs in order to run? An example might be the WAN, the LAN and the servers running the application software: a simple, three-branch tree. For the service to be considered running, all three branches have to be running. I could then have an overall status indicator for the service that shows green only when all three branches are green.
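As a rough illustration of the service tree, here is a minimal Python sketch of the roll-up rule described above, assuming events arrive as simple dictionaries carrying a node name and a severity. The device names, the severity-to-status mapping and the worst-of rule are hypothetical; this is not how Netcool implemented its service views, only one simple way to express “all three branches have to be running” on top of events.

```python
from enum import IntEnum


class Status(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


# Hypothetical service model: each branch lists the devices it depends on.
SERVICE_TREE = {
    "WAN": ["wan-rtr-1", "wan-rtr-2"],
    "LAN": ["core-sw-1", "access-sw-7"],
    "Servers": ["app-srv-1", "db-srv-1"],
}

# Hypothetical mapping from event severity to branch status.
SEVERITY_TO_STATUS = {"minor": Status.YELLOW, "major": Status.RED, "critical": Status.RED}


def branch_status(devices, events):
    """Worst status among open events raised by any device in this branch."""
    worst = Status.GREEN
    for ev in events:
        if ev["node"] in devices:
            worst = max(worst, SEVERITY_TO_STATUS.get(ev["severity"], Status.GREEN))
    return worst


def service_status(tree, events):
    """The service is GREEN only if every branch is GREEN; otherwise it takes the worst branch."""
    branches = {name: branch_status(devs, events) for name, devs in tree.items()}
    return max(branches.values()), branches


events = [
    {"node": "access-sw-7", "severity": "minor", "summary": "Interface Gi0/3 down"},
    {"node": "printer-9", "severity": "major", "summary": "Paper jam"},  # not in this service
]
overall, branches = service_status(SERVICE_TREE, events)
print(overall.name, {name: s.name for name, s in branches.items()})
# → YELLOW {'WAN': 'GREEN', 'LAN': 'YELLOW', 'Servers': 'GREEN'}
```

The printer event is simply ignored because that device is not mapped into the tree, which is the “only the events tied to a specific subset” filtering in miniature. Everything is still event-denominated: a branch turns green again only when its events clear.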
Enter Service Assurance
By George, I think we just moved from the concept of Business Assurance to Service Assurance. An event-denominated service, admittedly, but a service nonetheless. BTW, for those of you who go back far enough, this was called SLAM, or Service Level Agreement Manager. I made what I believed to be the first SLAM sale in the Midwest, to a large healthcare Integrated Delivery Network provider in Chicago in 2003.
Looking Forward
I’ve thrown a lot of history at you here, but the perspective is valuable. Understanding the migration from individual device management to business assurance to service assurance provides insights that foretell the future course. In the next post in this series, I’ll bring you forward to today (i.e., 2012) to talk about the next evolutionary leap in service assurance: how the industry (enabled by Monolith Software) is moving from event-based service assurance to proactive, KPI-driven, metric-based service assurance.
