Monday 5 August 2013

VMware HA Best Practices

Use the VMware HA best practices in this section that are applicable to your ESX Server implementation and networking architecture.

Networking Best Practices
The configuration of ESX Server host networking and name resolution, as well as the networking infrastructure external to ESX Server hosts (switches, routers, and firewalls), is critical to optimizing VMware HA setup. The following suggestions are best practices for configuring these components for improved HA performance:

Ensure that the following firewall ports are open for communication by the service console for all ESX Server 3 hosts:
Incoming Port: TCP/UDP 8042-8045
Outgoing Port: TCP/UDP 2050-2250
For better heartbeat reliability, configure end-to-end dual network paths between servers for service console networking. You should also configure shorter network paths between the servers in a cluster. Routes with too many hops can cause networking packet delays for heartbeats.
If redundant service consoles are on separate subnets, specify an isolation address for each service console that is on its subnet. By default, the gateway address for the network is used as the isolation address.
Disable VMware HA (using VirtualCenter, deselect the Enable VMware HA check box in the cluster’s Settings dialog box) when performing any networking maintenance that might disable all heartbeat paths between hosts.
Use DNS for name resolution rather than the error-prone method of manually editing the local /etc/hosts file on ESX Server hosts. If you do edit /etc/hosts, you must include both long and short names.
Use consistent port names on VLANs for public networks on all ESX Server hosts in the cluster. Port names are used to reconfigure access to the network by virtual machines. If the names used on the original server and the failover server are inconsistent, virtual machines are disconnected from their networks after failover.
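
The port name recommendation lends itself to a quick offline check. The following is a minimal Python sketch, assuming you have already exported the port names defined on each host into a dictionary; the host names, port names, and the function `find_inconsistent_port_names` are hypothetical and not part of any VMware tool. It reports, for each host, the names that exist elsewhere in the cluster but are missing locally. The comparison is case-sensitive, so differences in capitalization are reported as mismatches.

```python
def find_inconsistent_port_names(host_port_names):
    """Return, per host, the port names found elsewhere in the cluster but missing on that host."""
    # Union of every port name seen on any host in the cluster.
    all_names = set()
    for names in host_port_names.values():
        all_names |= set(names)

    missing = {}
    for host, names in host_port_names.items():
        gap = all_names - set(names)
        if gap:
            missing[host] = sorted(gap)
    return missing


# Hypothetical inventory: the second host spells one port name differently.
inventory = {
    "esx01": ["Production", "DMZ"],
    "esx02": ["Production", "dmz"],
}

port_name_gaps = find_inconsistent_port_names(inventory)
for host, names in sorted(port_name_gaps.items()):
    print(host, "is missing:", ", ".join(names))
```

An empty result means every host defines the same set of names, which is the condition the recommendation above asks for.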


Setting Up Networking Redundancy

Networking redundancy between cluster nodes is important for VMware HA reliability. Redundant service console networking on ESX Server allows the reliable detection of failures and prevents isolation conditions from occurring, because heartbeats can be sent over multiple networks.
You can implement network redundancy at the NIC level or at the service console or VMkernel port level. In most implementations, NIC teaming provides sufficient redundancy, but you can use or add service console or port redundancy if you need additional redundancy.

NIC Teaming
Two NICs connected to separate physical switches can improve the reliability of a service console (or, in ESX Server 3i, VMkernel) network. Because servers connected to each other through two NICs (and through separate switches) have two independent paths for sending and receiving heartbeats, the cluster is more resilient.
To configure a NIC team for the service console, configure the vNICs in the vSwitch configuration for the ESX Server host for an active/standby configuration. The recommended parameters for the vNICs are:

Rolling Failover = Yes
Default Load Balancing = route based on originating port ID


The following example illustrates the use of a single service console network with NIC teaming for network redundancy:

You assume some risk if you configure hosts in the cluster with only one service console network (subnet 10.20.XX.XX), so this example uses two teamed NICs to protect against NIC failure.
The default timeout is increased to 60 seconds (das.failuredetectiontime = 60000).
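
The das.failuredetectiontime value is expressed in milliseconds, so the 60-second timeout becomes 60000. As a small illustration, here is a Python sketch that formats a timeout given in seconds as the corresponding setting line; the helper name `failure_detection_option` is hypothetical and only produces text, it does not change any cluster setting.

```python
def failure_detection_option(seconds):
    """Format a timeout in seconds as a das.failuredetectiontime line (value in milliseconds)."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return "das.failuredetectiontime = {}".format(int(seconds * 1000))


option_line = failure_detection_option(60)
print(option_line)
```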
Secondary Service Console Network
As an alternative to NIC teaming for providing redundancy for heartbeats, you can create a secondary service console network (or VMkernel port for ESX Server 3i), then attach that port to a separate virtual switch. The primary service console network is still used for network and management purposes. When you create the secondary service console network, VMware HA sends heartbeats over both the primary and secondary service console networks. If one path fails, VMware HA can still send and receive heartbeats over the other path.
By default, the gateway IP address specified in each ESX Server host’s service console network configuration is used as the isolation address. Each service console network should have one isolation address it can reach. When you set up service console network redundancy, you must specify an additional isolation response address (das.isolationaddress2) for the secondary service console network. When you specify this secondary isolation address, VMware also recommends that you increase the das.failuredetectiontime setting to 20000 milliseconds or greater.
Also, make sure you configure isolation addresses properly for the redundant service console network that you create. Follow the networking best practices when designating isolation addresses. A further optimization you can make (if you have already configured a VMotion network) is to add a secondary service console network to the VMotion vSwitch. A virtual switch can be shared between VMotion networks and a secondary service console network.
In this example, each host in the cluster is configured with two service consoles. Each of these service console networks is connected to a separate physical NIC. The two networks are also on different subnets.
Use the default gateway for the first network and specify das.isolationaddress2 = 192.168.1.103 as the additional isolation address for the second network. Increase the default timeout to 20 seconds (das.failuredetectiontime = 20000).
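
Because each service console network should have an isolation address it can reach, it can help to confirm that every subnet has at least one isolation address inside it before you apply the settings. The Python sketch below does this with the standard `ipaddress` module. The subnet masks (10.20.0.0/16 and 192.168.1.0/24) and the gateway 10.20.0.1 are assumptions made for illustration; only 192.168.1.103 comes from the example above. Subnet membership is a sanity check only and does not prove that the address actually responds on the network.

```python
import ipaddress


def check_isolation_addresses(console_networks, isolation_addresses):
    """Map each service console subnet to the isolation addresses that fall inside it."""
    coverage = {}
    for cidr in console_networks:
        network = ipaddress.ip_network(cidr)
        coverage[cidr] = [
            addr for addr in isolation_addresses
            if ipaddress.ip_address(addr) in network
        ]
    return coverage


# Assumed subnets for the two service console networks (hypothetical masks).
networks = ["10.20.0.0/16", "192.168.1.0/24"]
# Assumed default gateway for the first network, plus das.isolationaddress2.
addresses = ["10.20.0.1", "192.168.1.103"]

coverage = check_isolation_addresses(networks, addresses)
uncovered = [cidr for cidr, hits in coverage.items() if not hits]
print("Subnets without an isolation address:", uncovered)
```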

Other HA Cluster Considerations

Other considerations for optimizing the performance of your HA cluster include:
• Use larger groups of homogeneous servers to allow higher levels of utilization across an HA-enabled cluster (on average). More nodes per cluster can tolerate multiple host failures while still guaranteeing failover capacities.
Admission control heuristics are conservatively weighted so that large servers with many virtual machines can fail over to smaller servers.
• To define the sizing estimates used for admission control, set reasonable reservations for the minimum resources needed.
Admission control exceeds failover capacities when reservations are not set; otherwise, VMware HA uses the largest reservation specified for a virtual machine in the cluster when deciding failover capacity.
At a minimum, set reservations for a few virtual machines considered average.
Admission control may be too conservative when host and virtual machine sizes vary widely. You may choose to do your own capacity planning by choosing
Allow virtual machines to be powered on
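
If you do your own capacity planning, a rough estimate can be worked out by hand or in a few lines of code. The Python sketch below is a deliberately simplified model and not VMware's actual admission control algorithm: it assumes memory is the only constrained resource, treats the largest reservation in the cluster as the unit of capacity, and assumes the largest hosts are the ones that fail. All host sizes and reservations are hypothetical.

```python
def failover_slots(host_capacities_mb, vm_reservations_mb, host_failures_tolerated=1):
    """Estimate remaining capacity units after worst-case host failures (simplified model)."""
    if not vm_reservations_mb or max(vm_reservations_mb) <= 0:
        raise ValueError("at least one positive reservation is required")

    # The largest reservation in the cluster sets the size of one capacity unit.
    slot_size = max(vm_reservations_mb)

    # Whole units each host can hold, smallest host first.
    slots_per_host = sorted(cap // slot_size for cap in host_capacities_mb)

    # Worst case: the hosts with the most units are the ones that fail.
    keep = max(len(slots_per_host) - host_failures_tolerated, 0)
    return slot_size, sum(slots_per_host[:keep])


# Hypothetical cluster: two 32 GB hosts and one 16 GB host, three reservations in MB.
slot_size, remaining = failover_slots([32768, 32768, 16384], [1024, 2048, 512])
print("Unit size (MB):", slot_size)
print("Units left after one host failure:", remaining)
```

A single large reservation inflates the unit size for the whole cluster, which is one way to see why estimates become conservative when host and virtual machine sizes vary widely.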
