Tuesday, June 23, 2015

VCDX

My VCDX Journey - Fourth time's a charm!

I'm proud to be able to say I'm now VCDX #197. It seems somewhat of a tradition to write one of these, so despite the count of VCDXs reaching two hundred I'll have a crack. I started writing a blog entry back in 2012 when I originally embarked on this, but never finished it or hit publish.  Through the power and elephantine memory of Google, here it is:

What led me here?
I recently received my VCAP-DCA result, appropriately enough, right after I presented a session at VMworld, and unexpectedly passed - I hadn't even answered all the questions before running out of time and knew I'd not scored on a couple.  Little did I know that was normal.  I'd already passed DCD, having taken the beta November 2010, so felt with both, plus a lot of time invested in pre and post sales vSphere consulting, that going for it and trying to make the final VCDX 4 defense in Frankfurt is worthwhile.  These days I'm no longer a consultant, instead part of the VMware alliance team at F5 Networks, which is great, but means it's going to take a long time to develop the depth of both architectural and real world understanding that I think are required of the VCDX - so waiting for version 5 doesn't appeal.  VMware are also going great guns enhancing the products with every release, so there's more to cover with each too, it just gets harder.
Reminds me a lot of the CCIE program back in the day - I took it twice, in 2001 and 2002, right when it contained the kitchen sink of networking:  DLSW, IPX, Token-Ring, Appletalk, VoIP, ATM, in addition to all the IP protocols.  VCDX is very different and to some degree you can self-select your specialties in choosing what to include in your submitted design, but it still has that tendency to get broader and broader, whilst remaining just as deep.  Of course you'll get questioned in the defense on whatever you didn't include too.

Back to present:
I didn't get to Frankfurt, I was too slow at getting it all together, but did submit and defend in Burlington May 2012. I now know that I was very close to passing but didn't, so I defended again in Barcelona in October 2012 and again missed, I suspect by further than the first time.
Roll forward a couple of years: my wife and I had a daughter the following year, so VCDX was pushed off the stack for a while. When I joined the GCoE pre-sales team at VMware in February of '14 things changed; the team was half VCDXs already and our manager made it clear he'd like the whole team to attain it.
I'd been working on a design already as my old one was vSphere 4 so no longer eligible, and I wanted to incorporate NSX. I was probably two-thirds done with the design and had only outlined the operations and implementation guide. The suggestion was made to do a group submission, with each of us playing to our strengths in the sections we wrote, lots of review, and a straightforward division of labor, targeting the PEX 2015 defense round.
Working as a group wasn't perfect - we were all in different timezones for a start, and the group decision to go for the Cloud track was not ideal for me - but we got it done. We were accepted and February rolled around. I did a horrible job in the defense: I was not expert on the material and nerves ate me up, I barely touched the white board and generally did a poor job of demonstrating the skills of a VCDX. Two of us passed though, so I couldn't blame the panel or the submission.
I was determined to go again as soon as I received the result, and used the experience of the defense to improve the design - there were decisions I couldn't justify because I didn't think they were good and plenty of contradictions and typos that had escaped the ten or so reviews. I'd also labbed lots of vCD in the interim, for me no amount of reading can substitute for hands-on time and researching all the tough questions from the first defense had led me to lots of background I had been missing; expert means expert and I had some holes. The defense was much easier for it, though still nervous I could remember enough to not feel like an idiot, and white board a bunch of stuff too. Still finished feeling I'd failed again but after four times I think that's normal.
My advice after all of this? Go for it, it's worth doing for the knowledge you develop along the way if nothing else. Though I'm a bit of a certification collector (I have worked for channel partners so was paid to) I find vendor certs to be a useful training/development path. I didn't anticipate having to persevere quite so long on this, but the two year break and track change contributed. CCIE took me two attempts too; if something's worth doing it's not going to be easy.
As to the VCDX itself, you need to be an expert on the material, both on your own design and the process of getting to a logical design for a fictional customer. Every decision must be justified by the requirements and constraints! If you're anything like me you need to have enough knowledge of it all that when your performance is impaired by nerves you can still demonstrate enough of it to clear the bar.
Up next? Well, as a networking guy I'd rather like to get the VCDX-NV, and every two years after renewing my CCIE (with the Data Center exam last year), I toy with the idea of taking another lab...
With respect to Walmart my version of their motto would be 'Always be learning. Always'

Tuesday, May 12, 2015

Useful NSX CLI commands


I'm not going to repeat run of the mill install stuff, but just commands that I've found / people have pointed me to when I've hit issues.  I'll add some API stuff in another post at some point as there's a bunch of stuff not in the CLI at all yet too.

When controllers don't deploy (or deploy then get immediately deleted):
Check disk space on specified datastore
Check /var/log/netcpa.log on the ESX hosts for IP pool allocation issues on controllers
Frequently issues arise because of connectivity from NSX Manager to the Controllers, and the most common of all:  DNS and NTP issues.

In the Manager CLI there's a handy 'show running-config' these days, which doesn't show a whole lot but will show if you fat-fingered its own network settings.
'show manager log follow' tails the main log file, which aids with all kinds of deployment debugging as the errors can be more verbose than in the GUI.


To troubleshoot MTU issues:
'esxcli network ip interface list', 'esxcli network nic list' and 'ping ++netstack=vxlan x.x.x.x -d -s 1572', where x.x.x.x is the IP of another host's VTEP (1572 is the largest payload that fits a 1600-byte MTU with -d set).
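The -s value is the ICMP payload, not the packet size, so with -d (don't fragment) set the largest value that can pass is the MTU minus 28 bytes (20 for the IPv4 header, 8 for ICMP). A minimal sketch of that arithmetic; the function name is my own and not part of any VMware tooling:

```shell
#!/bin/sh
# Hypothetical helper: largest ICMP payload that fits in one unfragmented
# IPv4 packet for a given MTU (20-byte IPv4 header + 8-byte ICMP header).
max_icmp_payload() {
    mtu=$1
    echo $((mtu - 28))
}

# Print the test command for a 1600-byte VTEP MTU
# (x.x.x.x is another host's VTEP IP, as above).
echo "ping ++netstack=vxlan x.x.x.x -d -s $(max_icmp_payload 1600)"
```

If a ping at that size fails while a small one works, something in the path (vmknic, vDS or physical switch) likely has a smaller MTU.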

vCNS commands that may work:
esxcli network vswitch dvs vmware vxlan network mapping list --vds-name=myvds --vxlan-id=5001

esxcli network vswitch dvs vmware vxlan list
esxcli network vswitch dvs vmware vxlan config stats set --level 1
esxcli network vswitch dvs vmware vxlan stats list --vds-name=myvmware

esxcli network vswitch dvs vmware vxlan vmknic multicastgroup list --vds-name=myvds --vlan-id=100

esxcli network vswitch dvs vmware vxlan network stats list --vds-name=myvds --vxlan-id=5001

The dvfilter is the bit that sits between the VM's vNIC and the vSwitch and does the packet filtering (and presumably steering in the case of the PANW integration).

summarize-dvfilter - gives back a list of filters present on the host
pktcap-uw --capture PreDVFilter --dvfilter $filter_name (or --capture PostDVFilter) can then be used to sniff traffic before or after the filter, to help figure out if a rule is not doing what is expected.

pktcap-uw -A    Broader packet capture on ESXi

vsipioctl getfwrules -f $filter_name

ESXi - Controller is TCP/1234

Equivalent of a 'sh cam dy' or 'sh mac-addr':
net-vdr -b --mac default+edge-1

On Edge: debug packet display interface vNic_0 host_192.168.1.1

show log follow

show service ipsec site

Rene's huge page of links:
http://vcdx133.com/2014/10/05/nsx-link-o-rama/

Logging:
Logging is somewhat vital: especially when introducing micro-segmentation to a running environment, for debugging forever after, and for security auditing.
NSX has lots of logs all in different places, so redirecting them all to Log Insight / some central location is the way to go. To configure this, add the log host on NSX Manager and on the Edges.
The distributed firewall is distributed :)  So add on every ESXi host:
esxcli system syslog config set --loghost='udp://192.168.110.241:514'
esxcli system syslog reload
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli network firewall refresh
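With more than a handful of hosts it's easier to generate the four commands than to retype them. A rough sketch that only prints what would be run; the function name and the host names are placeholders of mine, and the log host IP is the one used above:

```shell
#!/bin/sh
# Hypothetical helper: print the syslog commands from above for a given
# log host, so they can be reviewed before being run on each ESXi host.
esxi_syslog_commands() {
    loghost=$1
    echo "esxcli system syslog config set --loghost='udp://${loghost}:514'"
    echo "esxcli system syslog reload"
    echo "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true"
    echo "esxcli network firewall refresh"
}

# Dry run: show what would be sent to each host (host names are placeholders).
for host in esx-01.example.com esx-02.example.com; do
    echo "# $host"
    esxi_syslog_commands 192.168.110.241
done
```

Assuming SSH is enabled on the hosts, the output of the function could be piped to something like 'ssh root@$host sh' once you're happy with it.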

Tuesday, January 20, 2015

Useful Linux / Virtual Appliance commands


If / when you need to recover the root password to a Linux box or VMware virtual appliance:
Edit the kernel boot line and add init=/bin/bash (no spaces around the =) in order to get a shell to reset the root password.

To disable IPv6 add these lines to the bottom of sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Then run sudo sysctl -p or reboot
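If this ends up in a build script, it's worth making the append idempotent so repeated runs don't stack duplicate lines. A minimal sketch, demonstrated against a scratch file rather than the real sysctl.conf; append_once is a name I've made up:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the script does not duplicate entries.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated against a scratch file; point conf at /etc/sysctl.conf
# (as root) to apply for real, then run "sysctl -p".
conf=$(mktemp)
for iface in all default lo; do
    append_once "$conf" "net.ipv6.conf.${iface}.disable_ipv6 = 1"
done
cat "$conf"
rm -f "$conf"
```

To check the result after reloading, 'cat /proc/sys/net/ipv6/conf/all/disable_ipv6' should return 1.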

Plus add startup options for BIND if running, otherwise it will keep trying to use IPv6 anyhow:
OPTIONS="-4 -u bind"
To /etc/default/bind9

Always install NTP.

If running in a VM then install open-vm-tools; it's so much easier than installing VMware Tools plus build-essential, building the tools, then having them break during a kernel upgrade sometime later.

LACP is provided by ifenslave:
In /etc/network/interfaces:

# on my boxes Ubuntu changes eth0/1 to em1/2 on boot
auto em1
iface em1 inet manual
bond-master bond0
auto em2
iface em2 inet manual
bond-master bond0

auto bond0
# dhcp here, or more likely static
iface bond0 inet dhcp
# mode 4 is 802.3ad / LACP
bond-mode 4
bond-miimon 100
bond-lacp-rate 1
bond-slaves em1 em2

To see if you're speaking LACP with the switch:
cat /proc/net/bonding/bond0
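For a quick scripted check rather than eyeballing the output, the mode line can be grepped out of that file. A small sketch; the function name is my own, and it only confirms the bond is in 802.3ad mode, not that the switch is answering:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given bonding status file reports
# 802.3ad (LACP) mode. Defaults to bond0 from the config above.
bond_is_lacp() {
    status_file=${1:-/proc/net/bonding/bond0}
    grep -q '^Bonding Mode: IEEE 802.3ad' "$status_file" 2>/dev/null
}

if bond_is_lacp; then
    echo "bond0 is in 802.3ad (LACP) mode"
else
    echo "bond0 is missing or not in 802.3ad mode"
fi
```

In my experience the thing to look at next in the same file is the Partner Mac Address; if it's all zeros the switch side is most likely not sending LACPDUs.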

LLDP was as easy as 'apt-get install lldpd' on my hosts, output on the newer HP switches is a little funky, but my older Procurve gives the hostnames which is perfect.

'lldpcli show neighbors' shows what switch ports you're plugged into from the host side.

For cu -l /dev/ttyUSB0 -s 9600 to work ensure uucp user and group have r/w permissions on the device.  Even as root cu drops to uucp, and permission denied gives the not useful error of 'device in use'

I had no idea how many of the commands I regularly use are deprecated; see "Deprecated Linux networking commands and their replacements". There's no way I'm going to stop typing the old ones yet, just as I still use wr mem on any Cisco device that will still accept it, but it's good to file the list of replacements, methinks.

Unattended updates are great - at least on systems where restarting services won't cause an outage (I had a lot of mysterious SQL issues until I realised the server was being restarted by this).  However, while it cleans up downloaded files, it does not autoremove installed packages, so /boot fills up with never-used kernel images.  To prevent this happening I added
0 0 * * 0 apt-get autoremove -y
to the root crontab.  If it has already happened to you too, I have fixed it many times with a combination of the following:


apt-get remove --purge 2.6.2x-xx-*
If /boot is full and apt-get remove or autoremove won't work, then 
rm -rf /boot/*-3.19.0-{25,56,58,59,61,65}-*
should create enough space to get apt working again.
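Before removing anything it helps to see which kernel images are installed other than the one currently running. A rough sketch that only lists candidates and removes nothing; old_kernel_images is a made-up name, and it assumes the usual linux-image-<version> package naming on Ubuntu/Debian:

```shell
#!/bin/sh
# Hypothetical helper: read kernel image package names on stdin and print
# those that do not belong to the given (running) kernel release.
old_kernel_images() {
    running=$1
    grep '^linux-image-[0-9]' | grep -vF -- "$running"
}

# List candidates only; nothing is removed here.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l 'linux-image-*' 2>/dev/null | awk '/^ii/ {print $2}' |
        old_kernel_images "$(uname -r)" || true
fi
```

Once the list looks right it can be fed to 'apt-get remove --purge'; I'd still keep at least one older kernel around as a fallback.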