NIC not coming up after rebooting ESXi server

I ran across an odd issue this morning whilst doing some maintenance. I pushed out a new vendor patch to one of our ESXi 5.5U2 hosts, but after it came back up it would not reconnect to vSphere.

Luckily I have a DRAC on this machine, so I was able to pull up the console. First thing I tried was restarting the management network … no dice. Hmmm, OK, let’s restart the management agents, although that is odd since it literally just booted. When that didn’t work, I gave the machine another reboot as I was running out of options. Hmmm, still nothing … this isn’t looking good.

This almost sounds like some sort of link layer problem … if I were in the office I would just unplug the NIC and plug it back in … however I’m not in the office. Let’s go to the switch, find the port by MAC address (which I can pull from the DRAC) and toggle the switch port. I pulled up the switch’s interface and checked the MAC address table, but lo and behold the MACs that I was looking for weren’t in there … ugh, this is really starting to suck.
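For anyone who wants to script that lookup, here is a rough Python sketch. It is not what I ran that morning, and the table layout, MAC addresses and port names in it are made up for illustration; real switch output varies by vendor. The only real trick is normalizing the MACs first, since an out-of-band controller and a switch often print the same address with different separators.

```python
import re

# Matches aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff and aabb.ccdd.eeff style tokens.
MAC_TOKEN = re.compile(
    r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$"
    r"|^(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}$"
)


def normalize_mac(mac):
    """Drop separators and lowercase so differently formatted MACs compare equal."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("not a MAC address: %r" % mac)
    return digits


def find_ports(table_text, wanted):
    """Return {wanted MAC: port or None}, taking the port from the last column.

    Assumes one table entry per line with the port in the final column,
    which is a guess about the layout, not a rule.
    """
    targets = {normalize_mac(m): m for m in wanted}
    found = {m: None for m in wanted}
    for line in table_text.splitlines():
        fields = line.split()
        for field in fields:
            if MAC_TOKEN.match(field):
                key = normalize_mac(field)
                if key in targets:
                    found[targets[key]] = fields[-1]
    return found


# Hypothetical MAC address table dump (invented values).
SAMPLE_TABLE = """\
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
  10    0011.2233.4455    DYNAMIC     Gi1/0/7
  10    0011.2233.4466    DYNAMIC     Gi1/0/8
"""

if __name__ == "__main__":
    # Hypothetical NIC MACs as copied from the out-of-band console.
    nic_macs = ["00:11:22:33:44:55", "00:11:22:33:44:77"]
    for mac, port in find_ports(SAMPLE_TABLE, nic_macs).items():
        print(mac, "->", port if port else "not in table")
```

In the sample, the second address comes back as missing, which is the situation I was in: the switch had no entry at all for the host’s NICs.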

If I can’t disable the NIC at the port end, I wondered if I could disable the NIC via the DRAC, but that didn’t look like an option. Back at the ESXi console I tried removing the NICs from the vmnic config, but that didn’t help either.

Hmmm – short of driving into the office on a weekend, what’s my next step? Let’s power off the machine and fire it back up – once again, I was very appreciative of the DRAC.

After that the NICs seemed to respond again and showed up in the MAC address table on the switch. Really odd. This was on a Dell R730 with Broadcom NICs. I also rebooted an R720 and an R710 in the process without issue; one of them had a Broadcom NIC and the other had both a Broadcom and an Intel NIC.

Morals of the story:

  • Out-of-band management is awesome
  • vMotion saved the day – if all the VMs had been on the machine it would have been pretty painful
  • Just because something should work, doesn’t mean it will – get past that and find creative ways to solve the problem
  • Sometimes things just suck

