Server virtualization project risks
It is well known that no project is risk-free. Things can go wrong, and unfortunately they often do. The identification and analysis of project risks is a topic with an extensive literature. Some risks are common to all projects (generic risks), while others are due to the specific features of the project (specific risks). For instance, since every project has an end date, every project carries the generic risk of not being completed on time. In this article we focus on the risks that are specific to server virtualization projects, and on the particular form that generic risks take in such projects.
Performance risks in server virtualization projects
In a project that implements a new application it is very difficult to size the systems, because no workload data is available. In a server virtualization project, by contrast, companies have extensive workload data. Unfortunately, the will to collect and analyze that data is not always there.
There are basically three strategies to mitigate the risk of undersizing systems and therefore of having an excessive response latency:
Oversizing;
Extensive experimentation; and
Data collection and analysis.
Oversizing is a very common strategy. The rationale is that HW is so cheap that it makes little sense to spend time identifying the exact requirements. However, unless you run experiments or carry out an in-depth assessment, you do not know whether you are actually oversizing or undersizing the systems. You do not even know whether you are virtualizing the right applications. You can adopt an aggressive approach and then receive complaints from users about system performance, or you can adopt a cautious approach and end up with a virtual server farm whose scope is much smaller than it could have been.
Extensive experimentation is a good but costly alternative. Typically, systems are sized according to rules of thumb and generic policies (e.g., DBMSs should not be virtualized), and only those that are expected to incur significant overheads are actually tested. Unfortunately, rules of thumb are often unreliable, and generic policies gloss over the specific features of virtual servers.
Data collection and analysis is the ideal approach. There are, however, several important challenges:
Simultaneous data collection from tens or hundreds of servers.
Cleaning and analysis of workload data containing tens of thousands of data points.
Identification of the optimal virtual server farm out of the collected data.
Estimate of the virtualization layer impact on the workloads.
Each of these challenges can be handled efficiently with appropriate data collection and analysis tools. The WASFO Data Collector and WASFO Analysis and Optimization tools (see references below) have been designed and developed for exactly this purpose.
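To make the data-driven approach more concrete, here is a minimal sketch of how collected workload data could be turned into a host count. It is not the method used by the WASFO tools. It assumes that each server's CPU demand has already been collected as a list of samples expressed in cores, and the function names, the 10% virtualization overhead, the 80% headroom and the first-fit-decreasing placement are all illustrative assumptions.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def size_farm(
    workloads: Dict[str, List[float]],  # server name -> CPU demand samples, in cores
    host_capacity: float,               # cores available on one physical host
    overhead: float = 1.10,             # assumed virtualization overhead factor
    headroom: float = 0.80,             # assumed fraction of a host we agree to fill
    pct: float = 95,                    # percentile used as the "peak" demand
) -> List[List[str]]:
    """Place servers on hosts with a first-fit-decreasing heuristic."""
    usable = host_capacity * headroom
    demands = {
        name: percentile(samples, pct) * overhead
        for name, samples in workloads.items()
    }
    hosts: List[List[str]] = []  # names placed on each host
    loads: List[float] = []      # demand already placed on each host
    # Largest workloads first, so that small ones fill the remaining gaps.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > usable:
            raise ValueError(f"{name} does not fit on a single host")
        for i, load in enumerate(loads):
            if load + demand <= usable:
                hosts[i].append(name)
                loads[i] += demand
                break
        else:
            hosts.append([name])
            loads.append(demand)
    return hosts


# Hypothetical samples, for illustration only.
example = {
    "db": [4.0, 6.0, 8.0, 5.0],
    "web": [1.0, 2.0, 2.0, 1.0],
    "mail": [2.0, 3.0, 3.0, 2.0],
    "file": [1.0, 1.0, 2.0, 1.0],
}
plan = size_farm(example, host_capacity=16)
print(len(plan), "hosts:", plan)
```

Summing per-server peaks is conservative, because peaks on different servers rarely coincide. A more accurate analysis would add up the workloads timestamp by timestamp before taking the peak, which is one reason why simultaneous data collection matters.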
High Availability risks in server virtualization projects
In non-virtualized server farms, very few applications are classified as mission critical and protected with High Availability (HA) clusters. HA clusters can significantly improve service availability as far as HW and application failures are concerned. Unfortunately, they are expensive and complex to maintain. HA clusters protect against server, OS and application failures, but they require:
Shared storage (not all HA technologies require shared storage but those most widely distributed do);
HA software;
Scripts or DLLs that identify failed applications and shut them down cleanly; and
Overall certification of the solution, in order to get support from all the vendors involved.
Because Virtual Machine images are files hosted on shared storage, hypervisors (also known as Virtual Machine Monitors or virtualization layers) make it possible to create server farms in which all application instances are protected from server failure. If a server fails, a monitoring service detects the failure and restarts its VMs on another server. Unfortunately, these technologies monitor and act at the hypervisor level, so they offer no protection if an application fails or freezes. If such protection is required, HA cluster SW can be used on top of the virtualization layer.
Another important point is that hypervisors minimize the impact of planned server outages, because Virtual Machines can be moved at runtime with no service interruption (live migration). If, for instance, a server needs to be rebooted to replace a failed component, its Virtual Machines can first be moved to another server so that user activity is not interrupted.
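The two behaviours described in this section, restart after a server failure and evacuation before planned maintenance, can be sketched with a toy model. This is not the API of any real hypervisor. The Host class, the slot-based capacity and the "least loaded host" placement rule are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    capacity: int  # number of VM slots, a deliberately crude capacity model
    alive: bool = True
    vms: List[str] = field(default_factory=list)


def pick_target(hosts: List[Host], exclude: Host) -> Host:
    """Choose the least loaded live host that still has a free slot."""
    candidates = [
        h for h in hosts
        if h.alive and h is not exclude and len(h.vms) < h.capacity
    ]
    if not candidates:
        raise RuntimeError("no spare capacity left in the farm")
    return min(candidates, key=lambda h: len(h.vms))


def move_all(source: Host, hosts: List[Host]) -> Dict[str, str]:
    """Reassign every VM of `source`; returns a VM name -> new host name map."""
    moves: Dict[str, str] = {}
    for vm in list(source.vms):
        target = pick_target(hosts, source)
        source.vms.remove(vm)
        target.vms.append(vm)
        moves[vm] = target.name
    return moves


def handle_failure(failed: Host, hosts: List[Host]) -> Dict[str, str]:
    """Unplanned outage: the VMs are restarted elsewhere from shared storage."""
    failed.alive = False
    return move_all(failed, hosts)


def evacuate(host: Host, hosts: List[Host]) -> Dict[str, str]:
    """Planned outage: the VMs are live-migrated before the host is rebooted."""
    return move_all(host, hosts)


farm = [
    Host("host-a", 4, vms=["vm1", "vm2"]),
    Host("host-b", 4, vms=["vm3"]),
    Host("host-c", 4),
]
print(handle_failure(farm[0], farm))
```

In both cases the farm needs enough spare capacity to absorb the VMs of at least one host, which is worth taking into account when sizing the farm. Note also that nothing in this model looks inside a VM, which mirrors the limitation mentioned above: a frozen application on a healthy host goes unnoticed.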
Security risks in server virtualization projects
If you search Google for "virtualization risk" you will find dozens of articles on security risks. This suggests that security is the main concern people have about virtualization project risks. People are usually concerned about what they do not know well, because one of the fundamental determinants of human behaviour is the need to have some form of control over the surrounding environment. Virtualization is no exception. In these projects a whole new set of products is introduced, and those that are already up and running need to be configured in new ways. So a cautious approach is not only recommended but mandatory.
Since there are so many articles on virtualization and security on the web, we shall not go through all the security concerns here. We shall limit ourselves to pointing out that, unless strong control processes are in place, it is far easier to create new OS instances in a virtual server farm. So it is not surprising that after a while people discover plenty of Virtual Machines that were created for development or testing purposes and that are not managed in a professional way. They may, for instance, lack the latest security patches or not be configured according to the company security standards. Strong control processes may look like the correct solution, but strong control significantly diminishes the increased flexibility that virtualization provides. A better alternative is probably to use a weaker control process and then periodically run a server farm inventory to spot potential security holes.
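As an illustration of what such a periodic inventory check might look like, here is a minimal sketch. It assumes the inventory has already been exported into simple records. The VMRecord fields, the 30-day patch window and the example VM names are hypothetical and do not correspond to any specific product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class VMRecord:
    name: str
    owner: Optional[str]   # None when nobody is registered as responsible
    last_patched: date
    meets_baseline: bool   # configured according to the company security standards


def audit(
    inventory: List[VMRecord],
    today: date,
    max_patch_age: timedelta = timedelta(days=30),  # assumed patch policy
) -> Dict[str, List[str]]:
    """Return, for each problematic VM, the list of issues found."""
    findings: Dict[str, List[str]] = {}
    for vm in inventory:
        issues: List[str] = []
        if not vm.owner:
            issues.append("no registered owner")
        if today - vm.last_patched > max_patch_age:
            issues.append("security patches out of date")
        if not vm.meets_baseline:
            issues.append("not configured to the security baseline")
        if issues:
            findings[vm.name] = issues
    return findings


inventory = [
    VMRecord("prod-db", "dba-team", date(2024, 6, 15), True),
    VMRecord("test-42", None, date(2024, 1, 10), False),
]
for name, issues in audit(inventory, today=date(2024, 6, 30)).items():
    print(name, "->", ", ".join(issues))
```

The check only reports; deciding whether to patch, hand over or delete a flagged Virtual Machine remains a process decision.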
Costs
Project costs all too often exceed expectations, and thousands of pages have been written on how to control a project so that costs stay within budget. Virtualization projects have a specific issue related to SW licensing. Depending on the licensing rules, a virtualization project can produce significant savings or cost increases. If an application is licensed according to the number of physical cores even when it runs on top of a Virtual Machine Monitor, the cost will likely increase, since the physical servers hosting the Virtual Machines typically have many more processor cores than any single hosted application requires. If, on the contrary, the license takes into account the number of logical cores or the system utilization, you may realize significant savings.
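A small worked example shows how large the difference can be. All figures are hypothetical: an application that needs 4 cores, a 32-core virtualization host and a price of 1,000 per licensed core. Real licensing terms vary by vendor and should be checked case by case.

```python
def license_cost(licensed_cores: int, price_per_core: float) -> float:
    """Cost of a per-core license for the given number of licensed cores."""
    return licensed_cores * price_per_core


PRICE_PER_CORE = 1_000.0  # hypothetical list price
APP_CORES = 4             # cores the application actually needs
HOST_CORES = 32           # physical cores of the virtualization host

# Before virtualization: a dedicated 4-core physical server.
before = license_cost(APP_CORES, PRICE_PER_CORE)

# After virtualization, licensed on the physical cores of the host.
after_physical = license_cost(HOST_CORES, PRICE_PER_CORE)

# After virtualization, licensed on the logical cores assigned to the VM.
after_logical = license_cost(APP_CORES, PRICE_PER_CORE)

print(f"dedicated server:     {before:>10,.0f}")
print(f"per physical core:    {after_physical:>10,.0f}")
print(f"per logical core:     {after_logical:>10,.0f}")
```

Under these assumptions the per-physical-core rule multiplies the license cost by eight, while the per-logical-core rule leaves it unchanged and allows savings whenever the Virtual Machine can be given fewer cores than the old server had.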
Conclusions
There are many risks in server virtualization projects that could offset or even exceed the project benefits. Careful planning and analysis are required to mitigate the performance, availability and security risks, as well as to ensure that the expected financial benefits are realized.