Wednesday, May 28, 2008

VMware Buys B-hive

VMware announced today that it is buying B-hive, an Israel-based company that offers an application mapping, application performance management, and end user experience management solution for virtualized application systems. The B-hive product is interesting in that it is delivered as a virtual appliance that sits on a mirrored or spanned port on the virtual switch inside the VMware host. This allows the product to see all of the application-level flows between the layers of an application system, no matter how those layers are distributed across guests on one host or on multiple hosts.
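To make the idea of building an application map from a mirrored port concrete, here is a minimal Python sketch. It assumes a hypothetical flow record (source guest, destination guest, destination port); it is not B-hive's actual data format or algorithm. Because the map is built only from traffic seen on the wire, it does not matter which host each guest happens to run on.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """One observed connection (hypothetical record format, not B-hive's)."""
    src: str       # client-side guest, e.g. "web-vm-1"
    dst: str       # server-side guest, e.g. "app-vm-1"
    dst_port: int  # service port on the server side


def build_app_map(flows):
    """Return {source guest: set of (destination guest, port)} from observed flows."""
    app_map = defaultdict(set)
    for f in flows:
        app_map[f.src].add((f.dst, f.dst_port))
    return dict(app_map)


observed = [
    Flow("web-vm-1", "app-vm-1", 8080),
    Flow("web-vm-2", "app-vm-1", 8080),
    Flow("app-vm-1", "db-vm-1", 1433),
    Flow("web-vm-1", "app-vm-1", 8080),  # a repeated flow collapses into one edge
]

for src, deps in sorted(build_app_map(observed).items()):
    print(src, "->", sorted(deps))
```

Rebuilding the map from live flows, rather than from configuration, is what lets it keep up when guests move between hosts.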

This announcement changes the dynamics of the virtualization market in two very important and fundamental ways:

  1. It is, at least for the VMware platform, "game over" for many of the monitoring startups whose business plan was to be "the" monitoring vendor for VMware. Unless the remaining monitoring vendors have a really strong story as to why an enterprise customer should buy their product in addition to B-hive, these startups will now be selling against, rather than with, the VMware sales organization. They will have to focus very hard on whatever value they add relative to B-hive, and turn their attention to the Microsoft and Citrix virtualization platforms. Of course, this also means waiting for Microsoft and Citrix to build an installed base large enough to constitute a market, which certainly has not happened yet.
  2. Taking ownership of application performance is an extremely effective and strategic move on the part of VMware. It is effective because it tells customers that VMware understands they must be able to measure and ensure acceptable performance of virtualized applications in order to push virtualization beyond the "low hanging fruit" stage where it currently sits. It is strategic because it says to Microsoft, "It does not matter if you make virtualization free. What matters is who can take virtualization the furthest and the fastest, and thereby deliver more ROI and flexibility to the customer." So VMware is shifting the debate from "who has the best product" to "whose product can deliver the enterprise customer more ROI, more quickly". This is an extremely good move because it will accelerate deployments, and in turn drive more VMware license revenue.

This move also significantly raises the "parity bar" for Microsoft and Citrix just as they are respectively entering, or about to enter, the market. VMware already has a significant lead in the tools and products that surround the virtualization platform itself. HA, DRS, VMotion, the application mapping made possible by the new release of SMARTS, and now application performance management and end user experience management with B-hive constitute a significant difference in both manageability and functionality in VMware's favor.

Needless to say, Microsoft and Citrix are unlikely to stand idly by and let VMware raise the bar in this manner without responding. Since VMware has transformed the virtualization platform into a virtualization suite, Microsoft and Citrix must, to a significant degree, follow suit. This means that the remaining vendors with significant pure-play monitoring functionality for virtualized systems (see my Solutions Guide for the list) must now focus much more of their energy on building partnerships with Microsoft and Citrix, and position themselves as candidates for acquisition. It also means that there is little likelihood of a long-term independent market for application performance management and end user experience management for virtualized systems, as this will likely turn into a war fought over who is building out their portfolio most effectively via acquisition.

Bernd Harzog
CEO
APM Experts
bernd.harzog@apmexperts.com

Monday, March 24, 2008

Managing Virtualized Systems Solutions Guide

A new Solutions Guide for managing application performance and end user experience in virtualized systems is now available.

This guide consists of a White Paper that explains the challenges of using existing agent-based resource monitors to manage application performance, and that lays out the criteria by which application performance and end user experience can be measured effectively in virtualized systems. The Guide also contains a detailed Product Comparison that analyzes nine different products targeting the virtualization management space, and drills into their relative strengths and weaknesses.

If you are interested in the Solutions Guide, please follow the link, fill out the form, and download it. It is free for you to use to help you address these issues.

Best Regards,

Bernd Harzog
CEO
APM Experts
bernd.harzog@apmexperts.com

Thursday, January 17, 2008

Symantec to Sell Application Performance Management Business to Vector Capital

They're Baack! Veritas bought Precise Software, which was at one time one of the best APM vendors in the market. Then Symantec bought Veritas. In the course of these two acquisitions, the Precise product line lost product focus, sales focus, and marketing focus, and has lagged the industry as a result.

Now Precise exists again, proving that some people believe the APM problem is still unsolved (I agree). We will just have to see what changes in strategy a newly independent and focused APM company with serious venture capital backing brings about.

Bernd Harzog
CEO
APM Experts
bernd.harzog@apmexperts.com

Monday, October 22, 2007

The Virtualization Management Opportunity

A very smart and experienced executive in the systems management industry once remarked to me that innovations in platforms always outstrip the ability of the major management vendors to keep up with them. The world needs management startups because corporate IT adopts new platforms before they can be managed by the incumbent frameworks from the major vendors like CA, IBM/Tivoli, HP/Mercury, and BMC. If you think about the major applications architecture and platform innovations that have occurred in our industry in the last 20 years, each has required and created a new set of management vendors. Some examples are:

  • Client/Server computing – Tivoli

  • Windows Servers for production applications – NetIQ

  • Web application response time management – Keynote

  • J2EE Web Application Management – Wily (now part of CA)

  • HTTP as the standard application level protocol – Coradient, and the other web appliances

  • TCP/IP as the standard transport – NetQos, Network General, and the other TCP/IP appliances


So, what is the next one of these, and what are its implications? The answer is virtualization. Virtualization, and VMware in particular, has been widely adopted in corporate enterprise IT, and with this adoption has come a big problem for IT and an even bigger opportunity for new application management vendors. However, this opportunity is not just about a new class of applications or a new application architecture; it is about changes to the management of every application that has been built and deployed since the start of business computing. That is because VMware is fundamentally changing things that have not changed in a long time, if ever. As a result, VMware raises the following questions:


  1. Who budgets for and controls server capacity?

  2. Who is responsible for applications performance?

  3. How does the dynamic nature of virtualized environments change application performance management?

  4. What metrics about server performance can be trusted?

  5. How does virtualization impact root cause analysis?

  6. What approaches to applications performance management stand a chance of working (and which ones do not)?


Let’s address each one of these in order:

Who budgets for and controls server capacity?

In most enterprise IT organizations, server budgets are split. IT controls the budget and capacity for commodity servers that provide horizontal services like file, print, and email. But the business units that own business critical applications (like SAP, CRM, etc.) own the budget for the servers that run their mission critical applications. Each business unit buys servers for “their” set of mission critical applications, which creates massive silos of application specific capacity within an enterprise.

With virtualization, this dynamic changes. Instead of silos of servers that support each application (which is an incredible waste of server resources), IT provides a shared resource pool of server capacity. Each business unit's applications still run in their own OS, but instead of the OS being locked to one instance of server hardware, the interface between the OS and the server hardware is virtualized, and one server can support many instances of different types of operating systems. This allows server utilization to rise dramatically, and allows IT Operations to buy server capacity incrementally across all of the supported applications.

The change in who buys server capacity also creates a problem for both IT Operations and the application owner: there needs to be a rational way to know how much server capacity to buy to virtualize the next application, and when a lack of capacity is hurting application performance. Related to this, IT Operations needs to be able to prove to the application owner that the application will perform at least as well once virtualized as it did on its own hardware. Virtualization creates these new problems because it invalidates the traditional metrics used in capacity planning (question #4).

Who is responsible for applications performance?

In the physical world of one application and one OS per server, there was often a clear line between IT Operations and the application owner as to who was responsible for what. IT Operations was responsible for supporting the platform for the application (the hardware and the Windows OS), and the application owner was responsible for ensuring the performance of their own application. Since the application owner could add capacity whenever they wanted to, they felt secure that sufficient resources were available for their application to run effectively. When there were problems with application performance, IT Operations had a well-understood process to prove its innocence and to dump the problem in the lap of the application owner. The process was basically, “My metrics prove that the OS is running fine, and that your application is using no more resources than it should, so it has to be a problem with your application, not my platform”.

Once an application is virtualized, IT Operations loses this defense, since the hardware is now shared between multiple instances of the OS running multiple different applications. When the application does not perform well, the application owner immediately points the finger at the new virtualization layer, and IT Operations does not have the tools or the metrics to defend itself. This brings up another problem created by virtualizing applications: tools do not exist that allow IT Operations to prove in a defensible manner (with numbers that people believe in) that the environment, including the impact on application A of application B running in a different VMware guest, is not at fault. In other words, IT Operations has lost its ability to defend itself in the blamestorming meetings that inevitably occur when applications do not perform well.

A related problem is that neither IT Operations nor the application owners have any tools that can credibly compare the performance of applications across physical and virtual implementations. The primary reason is that the resource-based application performance metrics used by most application performance management (APM) vendors in the physical world do not work in the virtual world, and therefore cannot be compared across applications running in physical and virtual environments.

How does the dynamic nature of virtualized environments change application performance management?

In physical implementations (dedicated implementations of physical servers, operating systems, and applications), APM products assume a specific and fixed set of hardware provides the resources that are used by the application. These same APM products assume via configuration that the web layer of an application is talking to a specific middle tier layer, which in turn is talking to a specific database server.

When problems are reported about the performance of an application, the report contains references to the application's physical environment, like the name of the server, its CPU rating, the total amount of memory in the server, its IP address, etc. APM products assume that these physical elements are a fixed reference point through time against which resource utilization and performance can be judged. These products often create baselines, or statistical representations of what is “normal”, based upon how much of a resource an application is using at a point in time. They also assume that the relationships between the layers of an application system are fixed rather than dynamic.

The dynamic nature of virtualized applications creates two more points of pain. One is that baselines of normal performance tied to the specific hardware on which part of an application is running at a moment in time are invalidated. The other is that products that rely upon manual configuration to understand the mappings between application components are simply too brittle to deal with the rate of change in the virtual platform underneath virtualized application components.
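A minimal sketch of the first point, assuming a hypothetical baseline keyed to the (component, host) pair and a simple "mean plus k standard deviations" threshold (a common statistical choice, not one named in the text). The component and host names are made up. Once the component is moved to another host, the stored baseline no longer applies.

```python
import statistics

# Hypothetical history: (component, host) -> CPU % samples as reported on that host.
history = {
    ("order-db", "esx-01"): [22.0, 25.0, 24.0, 23.0, 26.0],
}


def is_abnormal(component, host, value, k=3.0):
    """Flag a sample more than k standard deviations from the baseline recorded
    for this component *on this host*. Returns None when no baseline exists for
    the (component, host) pair, e.g. right after the component has migrated."""
    samples = history.get((component, host))
    if not samples:
        return None  # no baseline: the component has moved or is new
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return abs(value - mean) > k * stdev


print(is_abnormal("order-db", "esx-01", 24.5))  # False: within the baseline
print(is_abnormal("order-db", "esx-02", 24.5))  # None: migrated, baseline gone
```

Keying the baseline to the host is what makes it fragile; the same reading is either "normal" or unjudgeable depending only on where the guest happens to be running.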

What metrics about server performance can be trusted?

The answer to this question needs to start with a discussion of what cannot be trusted, and then see what is left. The basic problem is that an operating system that measures the performance and resource utilization of its own processes, and of the applications running on it, assumes that it is the sole user of the hardware resources on the server or workstation it runs on. For any resource that involves time (CPU %, Disk Time, Page Faults per Second, Context Switches per Second), the OS assumes that it is the sole user of the system clock. So, when the OS measures how much CPU a process (an application) has used in the last N milliseconds, it assumes that it is the only user of the CPU.

Once you virtualize an OS, all time-based metrics collected by the OS about itself and the applications running on it are skewed by the degree to which that OS is now one of many sharing the same hardware. If there are 5 guests on a server, one application running in each guest, and all are doing equal work at that moment, then the metrics reported by each guest OS will be off because each guest sees only one-fifth of the hardware resources at that moment. Of course, if a guest is shut down in the next second, the degree of time shift changes.
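The arithmetic in the 5-guest example can be sketched as follows, under the same simplified assumption that equally busy guests split the hardware evenly. Real hypervisor schedulers are more complicated, and the function name is hypothetical.

```python
def physical_cpu_pct(guest_reported_pct, equally_busy_guests):
    """Simplified model from the text: a guest measuring its own CPU use only
    sees the slice of the hardware it was given. With N equally busy guests,
    each gets 1/N of the hardware, so the guest's own figure must be divided
    by N to express it as a share of the physical CPU."""
    if equally_busy_guests < 1:
        raise ValueError("need at least one guest")
    return guest_reported_pct / equally_busy_guests


# The guest says its application is using 100% of "its" CPU.
print(physical_cpu_pct(100.0, 5))  # 20.0 percent of the physical CPU
print(physical_cpu_pct(100.0, 4))  # 25.0 once one guest is shut down
```

The guest's own number does not change between the two calls; only the divisor, which the guest cannot see, does.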

The first conclusion about metrics in virtualized environments is that any metric based upon the use of a resource over time, which includes all of those listed above, is invalidated (made irrelevant and untrustworthy) by virtualization. The only resource-based metrics that remain valid are ones like how many bytes of memory an individual process is using, and how big a file or database is in bytes.

However, the problem gets worse from here. The holy grail of application performance metrics, response time, is also affected by this time shifting. If an APM product reports that transaction A, measured as the elapsed clock time from action B to response C, takes 0.5 seconds in a physical implementation, then it really took 0.5 seconds. If that same measurement occurs within a VMware guest that is one of 10 equally busy guests on a server, then the response time monitor could again report 0.5 seconds, while the actual elapsed clock time could have been 5 whole seconds. This is because the guest OS only knows about the clock ticks that it gets, and rather than getting all of them as it would in a physical environment, the guest gets only a variable share of those ticks at each moment. So, virtualization can also invalidate the most valuable and credible of APM metrics (response time) if those metrics are collected from within a virtualized guest OS.
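Under the simplified model described here, the 0.5 versus 5 second example reduces to a one-line correction, assuming the guest's share of clock ticks is known and constant over the measurement. In practice that share varies from moment to moment, which is exactly why the in-guest number cannot be trusted on its own. The function name is hypothetical.

```python
def wall_clock_seconds(guest_measured_seconds, guest_share):
    """Simplified model from the text: the guest clock advances only while the
    guest is scheduled, so an interval measured inside the guest is divided by
    the guest's share of the hardware to estimate the real elapsed time."""
    if not 0 < guest_share <= 1:
        raise ValueError("guest_share must be in (0, 1]")
    return guest_measured_seconds / guest_share


print(wall_clock_seconds(0.5, 1.0))     # physical server: 0.5
print(wall_clock_seconds(0.5, 1 / 10))  # one of 10 equally busy guests: 5.0
```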

How does Virtualization impact root cause analysis?

Virtualization makes traditional root cause analysis much more difficult for all of the reasons mentioned above. By invalidating many of the metrics and their baselines that server and application support teams rely upon, virtualization makes it much more difficult to use those metrics to pinpoint abnormal behavior.

Virtualization also creates a whole new root cause problem: answering the question, "Why does this application perform poorly when virtualized, when it performs just fine using completely normal amounts of CPU and memory on a physical server?" Because no one can know in advance how well a particular application will perform once virtualized, the only feasible method is to "try it and see how well it works (or does not)". Since IT does not inspire confidence with many business units and applications teams, having to fall back on a "trust me, it will work" promise is a major roadblock to virtualizing the 80% of applications that really matter in a corporate enterprise.

Which approaches to applications performance management stand a chance of working (and which ones do not)?

Before virtualization, IT Operations had resource-based metrics to fall back upon when questions of application performance arose. Now these metrics are either unavailable or not credible. The shared and dynamic nature of virtualized environments makes APM approaches based upon how much CPU or disk I/O an application generates at a moment in time untrustworthy and invalid. Furthermore, approaches that gather response time data via scripted synthetic agents or real-time passive agents are also affected when those metrics are collected inside a guest OS.

So, what works today, and what new approaches are needed? There are two approaches that stand a chance of working. The first is to rationalize the resource-based metrics by allowing metrics collected by the host OS to be combined, at any moment in time, with metrics collected inside the guest to provide a true picture of resource utilization. This requires either that VMware collect and publish the host metrics in a usable form, or that third-party agents run in the host OS (something VMware is reluctant to get behind, since it wants to keep the host OS as thin and efficient as possible). The second approach is to rely upon actual transaction response time metrics, collected at actual end user workstations, from within the actual applications, or by network appliances connected to mirror ports on switches. This has been the holy grail of APM for several years now, and the demise of the traditional resource-based metrics has made this approach all the more valuable. Whichever approach is used, it will have to be combined with an ability to dynamically understand changes to the environment of a guest OS, and to the components of an application system at a point in time. In other words, APM tools need to constantly auto-discover changes in an application's environment in order to provide relevant information.
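The first approach, combining host and guest metrics taken at the same moment, can be sketched as a join on timestamps. The record types and field names below are hypothetical and are not a VMware API; the sketch assumes both sides sample on a common clock and that the host reports the fraction of physical CPU each VM received in each interval.

```python
from dataclasses import dataclass


@dataclass
class GuestSample:
    """A per-process reading taken inside the guest OS (hypothetical format)."""
    ts: int                  # sample time, seconds on a clock shared with the host
    process: str
    cpu_pct_of_guest: float  # CPU % as the guest OS sees it


@dataclass
class HostSample:
    """A per-VM reading taken by the host (hypothetical format)."""
    ts: int
    vm: str
    cpu_share: float  # fraction (0..1) of physical CPU the host gave this VM


def rationalize(guest_samples, host_samples, vm):
    """Scale each guest-side process reading by the host-side share for the
    same VM at the same timestamp. Returns (ts, process, % of physical CPU).
    Guest samples with no matching host sample are skipped, not guessed."""
    share_at = {h.ts: h.cpu_share for h in host_samples if h.vm == vm}
    return [
        (g.ts, g.process, g.cpu_pct_of_guest * share_at[g.ts])
        for g in guest_samples
        if g.ts in share_at
    ]


guest = [GuestSample(0, "sqlservr", 80.0), GuestSample(1, "sqlservr", 80.0),
         GuestSample(2, "sqlservr", 80.0)]
host = [HostSample(0, "sql-vm", 0.5), HostSample(1, "sql-vm", 0.25),
        HostSample(2, "web-vm", 0.9)]

print(rationalize(guest, host, "sql-vm"))
# [(0, 'sqlservr', 40.0), (1, 'sqlservr', 20.0)]
```

The guest reports the same 80% in every sample; only the host side reveals that the process received half, and then a quarter, of a physical CPU. The sample at time 2 is dropped because the host has no reading for this VM at that moment.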

The race has started to let IT Operations and application owners know how well their virtualized applications are really performing, and to quantify that performance in ways that allow more applications to be virtualized in denser implementations than are now possible (since no one knows how densely guests can be packed into a host without causing problems). Vendors of response time based APM solutions are successful today when they sell to application owners. Virtualization creates many additional pain points for application owners and IT Operations. The successful vendors will figure out how to make their products dynamic and based upon credible metrics, and will tune their sales and marketing approaches to address these new points of pain and target audiences.


Bernd Harzog
CEO
Application Performance Management Experts
770-475-4249
bernd.harzog@apmexperts.com
http://www.apmexperts.com/


Friday, June 01, 2007

Product Review — Wily Introscope for Microsoft .NET @ SOA WORLD MAGAZINE

Wednesday, October 25, 2006

CA Releases Wily Introscope for Microsoft .NET

This whole web site is about the fact that, despite the billions spent on Systems Management and Application Performance Management, knowing that a user is having a problem with an application, and then knowing how to fix it, remains a largely unsolved problem. For the last few years, Wily Technology (now part of CA) has led the market in addressing this problem for high-end production J2EE-based applications.

Wily has now announced that it has extended support in its market-leading Introscope product to Microsoft .NET-based applications. In detail, Wily ported the agent that runs inside the J2EE application server to C#, and did the remaining work necessary to allow that agent to manage .NET applications as well as the Java agent manages J2EE applications. The .NET agent also leverages the rest of the Introscope product line. This will prove a boon for customers who have both J2EE and .NET applications (or applications composed of some of each), as it will allow a common way of measuring performance, tracing transactions, and performing root cause analysis.

What this new release does not do (and does not claim to do) is solve the APM problem for the general case of Windows applications no matter how they are developed. This remains a really hard (and probably impossible) problem to solve since there have been and continue to be so many different ways to build and deploy Windows applications.

Bernd Harzog
CEO
APM Experts
bernd.harzog@apmexperts.com

Saturday, June 10, 2006

RIP Cesura

Cesura, one of the more promising startups in the application performance monitoring space, shut down the week of June 5th, 2006. Cesura had a lot going for it, including:

  1. It was led by Bob Fabbio, the founder of Tivoli, who was highly skilled at raising enough venture capital to enable a startup to achieve some real traction.
  2. After a previous positioning failed, it was able to raise $15m in a restart aimed at the application performance problem for web and Citrix hosted applications.
  3. It sported an impressive set of functionality covering some true measurement of end user experience, instrumentation of a broad set of infrastructure metrics and a highly differentiated set of analytics to pinpoint the areas in the infrastructure responsible for the end user or applications issue.

However, despite lots of money and significantly differentiated technology, Cesura was undone by two big mistakes. The first was that the company did not maintain its focus on a target market long enough to take market feedback and build something that truly met a market need. Cesura relaunched with a focus on Citrix in the fall of 2005. By the middle of Q1/2006 the focus was on health care applications and partners. From April of this year on, the focus shifted to value-added hosting vendors and MSPs. Three target markets in three calendar quarters is enough thrashing to confuse customers, prospects, the press, analysts, and even upper and middle management. A CEO with a pedigree and $15m in the bank was apparently not enough to overcome the confusion caused by a strategy-du-jour operating plan.

The second big mistake was that things the company thought were features of the product were in fact viewed by prospects as impediments to adoption. For example:

  1. The analysis engine and database were housed in an appliance that had to be LAN-connected to each switch serving a unique subnet that hosted portions of the software in the target application system. The feature here was that the management product had its own network, and would not die if the production network died. The reality is that this forced the sales team to get the network folks to approve the installation of something that was supposed to solve an application and business level problem.
  2. The product consisted of one and only one appliance. Each appliance could only support around 100 servers, and there were no cross appliance analytics or reporting. So, a product with a high end message and value proposition, could not scale up to high end enterprise applications systems.
  3. The Citrix offering was unique in its ability to measure true end user experience at the ICA client, but this only worked for a subset of the applications and transactions published through Citrix, and only on a subset of the Citrix client environments. Again, something that was truly unique in certain cases did not support enough environments to make customers comfortable that this was an enterprise solution.

In short, the rules for startups targeting the user or application performance space remain the same. Make sure that what you have to sell adds value to the monitoring products that customers have already invested in, and focus like a laser on a set of prospects (otherwise known as a market segment) that have a common problem and can be accessed as a community or a market. It is amazing that, after all of the money that has been invested in monitoring startups, companies still get funded and fail for these simple reasons.