In larger organisations, the network department and the VMware group
are typically separate teams. So as a VMware administrator you need to
ask the network department to trunk VLANs to the physical switch ports
that your ESX host is connected to. Sometimes the network department
misses a port or a VLAN, which means that you can end up with a VM
losing its network connection after, e.g., a VMotion. Unfortunately,
the blame can land on the VMware administrator for putting a host into
production without testing VLAN connectivity. Unfair, but that's life.

But testing VLANs the manual way is rather time consuming, especially
if you have multiple hosts with multiple NICs and multiple VLANs. The
number of test cases quickly becomes unmanageable. If, for example, you
have five hosts, five VLANs and four NICs in each host, that means
5 x 5 x 4 = 100 test cases.
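
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the host, VLAN and NIC labels are made up for the example and do not come from a real environment.

```python
from itertools import product


def manual_test_cases(hosts, vlans, nics):
    """Return every (host, vlan, nic) combination a manual test has to cover."""
    return list(product(hosts, vlans, nics))


# Hypothetical labels matching the example: five hosts, five VLANs, four NICs per host.
hosts = [f"esx{n}" for n in range(1, 6)]
vlans = [10, 20, 30, 40, 50]
nics = [f"vmnic{n}" for n in range(4)]

cases = manual_test_cases(hosts, vlans, nics)
print(len(cases))  # 5 x 5 x 4 = 100
```

Adding a single host, VLAN or NIC multiplies the total rather than adding to it, which is why the manual approach gets out of hand so fast.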

The traditional way of testing is to create a vSwitch with only one
vmnic connected, then connect a VM on that vSwitch to one of the VLANs.
Configure an IP address in the address space of the VLAN and ping the
gateway. Do this for all the VLANs, then connect the next vmnic to the
vSwitch and start over.

The following method speeds up VLAN testing significantly (in this case
from 100 to 16 test cases). It is not fully automated, but I have found
it very useful nonetheless.

The basic idea is that you configure a port group to listen on all
available VLANs, then enable VLAN tagging inside the VM and do your
testing from there:
1. Create a port group on the vSwitch with VLAN ID 4095. This allows
the VM to connect to all VLANs available to the host.
2. Enable VLAN tagging from inside the VM. This only works with the
Intel E1000 driver, which only ships with 64-bit Windows. So if you
have a 32-bit Windows server, you first need to modify the .vmx file
and then download and install the Intel E1000 driver from within
Windows. (Update: even on 64-bit Windows you need to download and
install the E1000 driver manually. The advanced VLAN option is not
included in the default driver.) This link describes how this is done.
When modifying the .vmx, add the following line:
Ethernet0.virtualDev = "e1000"
Note that if you use the default Flexible NIC to begin with, there is
no existing entry for the NIC in the .vmx, so just add the new entry.
Under Edit Settings for the VM, attach the NIC to the port group with
VLAN ID 4095.
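
If you have several test VMs to prepare, the .vmx change can be scripted. The following is a minimal Python sketch that works on the file's text. The function name is hypothetical, and it assumes each .vmx setting sits on its own line in key = "value" form and that matching the key without regard to case is acceptable.

```python
def set_virtual_dev(vmx_text, device="Ethernet0", value="e1000"):
    """Return vmx_text with <device>.virtualDev set to value.

    An existing entry is replaced; if there is none (the Flexible NIC case),
    a new line is appended.
    """
    key = f"{device}.virtualDev".lower()
    new_line = f'{device}.virtualDev = "{value}"'
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        # Assumption: compare the key case-insensitively.
        if line.split("=", 1)[0].strip().lower() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical .vmx fragment without a virtualDev entry.
sample = 'displayName = "vlantest"\nethernet0.present = "TRUE"\n'
print(set_virtual_dev(sample))
```

Keep a copy of the original .vmx before writing the result back.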
3. Now you can add VLANs in the VM. Go to the Device Manager and then
Properties for the E1000 NIC. There's a tab that says VLANs (see
screenshot below). As you add VLANs, a separate NIC or "Local Area
Connection" is created for each VLAN. It is set for DHCP, so if there's
a DHCP server on that network it will receive an IP automatically. If
not, you will need to configure an IP for that interface manually (e.g.
by requesting a temporary IP from the network department). To configure
the IP quickly, you can run the following command from CMD or a batch
(.cmd) script:
netsh int ip set address "local area connection 1" static 192.168.1.100 255.255.255.0 192.168.1.254 1
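
With more than a couple of VLANs, typing one netsh line per interface gets tedious. Here is a small Python sketch that prints the commands for pasting into a .cmd file. The helper name, the second interface name and the 10.10.1.100 address are assumptions for illustration only; use the connection names Windows actually created and the addresses you were given.

```python
def netsh_static_ip(interface, ip, netmask, gateway, metric=1):
    """Build the netsh command that sets a static IP on one interface."""
    return (
        f'netsh int ip set address "{interface}" static '
        f"{ip} {netmask} {gateway} {metric}"
    )


# Hypothetical plan: connection name -> (IP, netmask, gateway).
plan = {
    "local area connection 1": ("192.168.1.100", "255.255.255.0", "192.168.1.254"),
    "local area connection 2": ("10.10.1.100", "255.255.255.0", "10.10.1.254"),
}

for interface, (ip, netmask, gateway) in plan.items():
    print(netsh_static_ip(interface, ip, netmask, gateway))
```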
4. Now we will use the Tracert (traceroute) command to test
connectivity. The reason we can't use Ping is this: if you have
multiple VLANs configured and you ping a gateway on a given VLAN, and
the VLANs happen to be routable, then you can receive a reply via one
of the other VLANs even though the one you are testing is not
necessarily working.
With Tracert, however, you can be sure that if the gateway is reached
in the first hop, the VLAN works. If the VLAN doesn't work, you will
see Tracert taking multiple hops (via one of the other VLANs) before
reaching the gateway (or it will fail if there's no connectivity at
all). You can create a simple .cmd file with a list of gateways and
execute it from the CMD prompt. Example file:
tracert 192.168.1.254
tracert 10.10.1.254
tracert 10.10.2.254
See below for an example screenshot.
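
The "gateway in the first hop" check can also be done by a script instead of by eye. The Python sketch below parses saved Tracert output. The function names are hypothetical, and the two sample outputs are hand-written to show the usual Windows layout; they are not captured from a real run. It assumes each hop line starts with the hop number and that you are testing IPv4 gateways.

```python
import re

# A hop line starts with the hop number, e.g. "  1    <1 ms    <1 ms    <1 ms  192.168.1.254"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def first_hop_address(tracert_output):
    """Return the IPv4 address shown on hop 1, or None if hop 1 is missing or timed out."""
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if match and match.group(1) == "1":
            found = IPV4.findall(match.group(2))
            return found[-1] if found else None
    return None


def vlan_ok(tracert_output, gateway):
    """The VLAN passes only if the gateway itself answers on the first hop."""
    return first_hop_address(tracert_output) == gateway


# Hand-written samples (illustrative only).
working = """
Tracing route to 192.168.1.254 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.1.254

Trace complete.
"""

routed_via_other_vlan = """
Tracing route to 10.10.1.254 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.1.254
  2     1 ms     1 ms     1 ms  10.10.1.254

Trace complete.
"""

print(vlan_ok(working, "192.168.1.254"))              # True
print(vlan_ok(routed_via_other_vlan, "10.10.1.254"))  # False
```

Running tracert with the -d switch skips name resolution, which keeps the output down to plain IP addresses and makes it easier to parse.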

Before running the batch script, make sure only one physical NIC is
connected to the vSwitch. You can do this in one of two ways: 1) create
a separate vSwitch and connect only one vmnic at a time, controlling it
from VC; or 2) unlink all vmnics but one from the service console (COS)
with the following commands:
ssh to the ESX host
esxcfg-vswitch -l (to see current configuration)
esxcfg-vswitch -U vmnic1 vSwitch0 (this unlinks vmnic1 from vSwitch0)
esxcfg-vswitch -L vmnic0 vSwitch0 (this links vmnic0 to vSwitch0)
These commands take effect immediately, so you don't have to restart
the network or anything. Then you run through the test on one vmnic at
a time. When done with a host, you VMotion the VM to the next host in
the cluster and continue the test from there.
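
To avoid typos when cycling through the uplinks, the esxcfg-vswitch lines for each round can be generated up front. This Python sketch only builds the command strings and does not run anything on the host. The function name and the list of four vmnics are assumptions for the example; only the -L and -U options shown above are used.

```python
def isolate_vmnic(active, all_vmnics, vswitch="vSwitch0"):
    """Return esxcfg-vswitch commands that leave only `active` linked to the vSwitch.

    The active uplink is linked first and the others are unlinked afterwards,
    so the vSwitch keeps one uplink throughout. That ordering is a choice made
    in this sketch, not something the original procedure prescribes.
    """
    commands = [f"esxcfg-vswitch -L {active} {vswitch}"]
    commands += [
        f"esxcfg-vswitch -U {nic} {vswitch}" for nic in all_vmnics if nic != active
    ]
    return commands


# Hypothetical host with four uplinks.
vmnics = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]

for nic in vmnics:
    print(f"--- test round for {nic} ---")
    for command in isolate_vmnic(nic, vmnics):
        print(command)
```

Run esxcfg-vswitch -l after each round to confirm that only the intended vmnic is linked before starting the Tracert batch.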