Monday, January 14, 2019

Homelab refresh

Finally replacing my homelab, which consisted of three hosts from 2010, for two reasons: it was ancient, and I lost a drive and my vSAN blew up.

vSphere has finally pulled out the x86 instruction emulation code that allowed really old CPUs to work.  While 6.7U1 ran on my L5630 CPUs, I couldn't do a clean install (I would have had to install 6.5 and upgrade), and nested virtualization - which is kind of my killer app for a homelab - was becoming limited by the same thing.  Upgrading the hosts themselves was OK, but not being able to instantly stand up a >6.5 nested host was a pain.
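
For what it's worth, the per-VM switch that makes a nested ESXi host possible is exposing hardware-assisted virtualization to the guest.  A rough pyVmomi sketch - the vCenter address, credentials and VM name below are made up, and the VM has to be powered off - looks like this:

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim
    import ssl

    # Hypothetical lab vCenter and credentials - substitute your own.
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcsa.lab.local', user='administrator@vsphere.local',
                      pwd='changeme', sslContext=ctx)
    content = si.RetrieveContent()

    # Find the nested ESXi VM by name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == 'nested-esxi-01')
    view.DestroyView()

    # Expose VT-x/EPT to the guest so it can run 64-bit VMs of its own.
    vm.ReconfigVM_Task(vim.vm.ConfigSpec(nestedHVEnabled=True))
    Disconnect(si)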

I didn't understand vSAN :)  I'd been running it a long time on unsupported everything (controllers, drives, NICs, you name it), and my early mistakes couldn't be easily fixed: if I tried to reconfigure anything on the fly I got error messages rather than actions.  With money I could have fixed it - by replacing the controllers and buying enough disks for an additional disk group and migrating, or by doing something ugly like moving data onto a USB drive or a 2-bay NAS.  It didn't come to that anyhow, as I lost so much data there was little point in saving any.

The critical thing I hadn't understood was that erasure coding needs a minimum of four hosts (more if you want to do maintenance), so turning it on in my three-host cluster was not smart.  One of my SSDs failed and about half my VMs went with it, as they must have had blocks in that disk group that couldn't be rebuilt from elsewhere.  I daresay I could have recovered many of them, but in the lab nothing was critical enough to bother; my greatest pang is for my trusty Windows 7 admin VM.

I have been way too cowboy in my lab, which was fine a decade ago when it was a fraction of the size, local to me, and using NFS storage.  These days, when I blow it up with a pre-release build that I then find can't be upgraded, or by turning on features for fun before I understand the consequences, it's a huge effort to recover.  Nested labs make a lot of sense: where I used to almost enjoy the installation pains of the VMware suite (OK, enjoy is overstating it, but I did derive some masochistic pleasure from it and revel in being an expert), many of those pains have (finally) been reduced to the point there's no learning in that stage of things.
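
The arithmetic is worth writing down so I don't do this to myself again: RAID-1 mirroring with FTT=n needs 2n+1 hosts, RAID-5 erasure coding needs 4, RAID-6 needs 6, and you want a spare on top if you'd like to do maintenance without running degraded.  A throwaway Python sketch of those minimums (the +1 "comfortable" column is just my rule of thumb):

    # Minimum hosts for common vSAN storage policy choices.
    # RAID-1 with FTT=n needs 2n+1 hosts; RAID-5 needs 4; RAID-6 needs 6.
    POLICIES = {
        'RAID-1, FTT=1': 3,
        'RAID-1, FTT=2': 5,
        'RAID-1, FTT=3': 7,
        'RAID-5 (erasure coding), FTT=1': 4,
        'RAID-6 (erasure coding), FTT=2': 6,
    }

    for policy, minimum in POLICIES.items():
        # One spare host lets you evacuate a node and still rebuild to compliance.
        print(f'{policy}: minimum {minimum} hosts, comfortable {minimum + 1}')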

As with the old cluster, I got somewhat carried away building the new one.  I similarly received some cast-off gear for free and supplemented it from eBay and by stripping my old systems (only reusing the SSDs).  I wanted to grow to four nodes without taking up more space, so when I was gifted a 2015-vintage Supermicro Twin I was happy to purchase a second in order to end up with four identical hosts in 4U of space (replacing 3 x 2U boxes).  This particular model has a SAS controller onboard, so I can live with the two PCIe slots: I'm reusing my Intel Optane 900p NVMe drives* in one for the vSAN cache layer, and installing new Intel X710 10-gig NICs in the other.  (If I'd had a third slot I would've reused my X520s in order to have NICs to pass through to VMs when playing with NSX-T etc.)
The build took a long time, as I wanted the firmware on the BIOS, IPMI, SAS controller (now in HBA mode) and NICs to all be current - all of which takes a lot of power cycles and messing about.  I do see why people purchase vSAN ReadyNodes.  These boxes, 2028TP-DC0R, support current Xeons; I'm using E5-2630L v3 CPUs, which are not very recent but are cost and power effective, and importantly are Haswell series, so good for some time to come.

The X710s were the biggest time suck: I had two fail on me, and I'm not sure whether I was unlucky with static, or upgrading their firmware bricked them after a power loss, or something else.  I would've put the X520s in and been done with it, but I only had three and I really wanted the four nodes identical.
I also had second thoughts on RAM.  Having built out with 128GB per node, I decided longevity would be better served with 192GB per node: VMware's stack loves RAM, and once I have a pretty complete SDDC running plus a few third-party integrations I'd have been swapping.

I also turned Transparent Page Sharing back on, enabled nested virtualization, and enabled TRIM in vSAN, though I don't think any of my guest operating systems support it right now.
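
Doing that on four hosts is worth scripting rather than clicking through; here's a rough sketch that pushes the TPS salting setting to each host over SSH (host names and root password are made up, and SSH has to be enabled on the hosts):

    import paramiko

    # Hypothetical host names and credentials - substitute your own.
    HOSTS = ['esx01.lab.local', 'esx02.lab.local',
             'esx03.lab.local', 'esx04.lab.local']
    # Salting value 0 allows page sharing between all VMs on a host again.
    CMD = 'esxcli system settings advanced set -o /Mem/ShareForceSalting -i 0'

    for h in HOSTS:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(h, username='root', password='changeme')
        stdin, stdout, stderr = ssh.exec_command(CMD)
        print(h, stdout.read().decode(), stderr.read().decode())
        ssh.close()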

I'm now a happy camper, building out a nested lab.  The screenshot below shows the resources consumed by my management layer.
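
If you'd rather pull those numbers from the API than read them off a screenshot, the 'content' handle from the pyVmomi sketch above can be reused to total up what the running VMs are consuming (filter to your management VMs or resource pool as appropriate):

    # Reuses 'content' and 'vim' from the earlier pyVmomi connection sketch.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    cpu_mhz = mem_mb = 0
    for v in view.view:
        if v.runtime.powerState == 'poweredOn':
            qs = v.summary.quickStats
            cpu_mhz += qs.overallCpuUsage    # MHz currently demanded
            mem_mb += qs.hostMemoryUsage     # MB of host RAM backing the VM
    view.DestroyView()
    print(f'{cpu_mhz} MHz CPU and {mem_mb // 1024} GB RAM in use')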



* The Optanes are awesomely fast, and have ridiculous endurance for consumer drives too.  The 280GB model has 336GB inside, which supposedly isn't used for traditional over-provisioning, but they must use some of it to help deliver that longevity.  I figure that having the cache tier off the main controller also keeps that controller's queue free for destaging to my relatively slow consumer-grade SSDs.  (I had some enterprise SSDs at one point, but they also gave me my only SSD failures, out of warranty of course, whereas the Samsung Pro consumer drives have been issue free.)



Bill of materials:

2 X Supermicro 2028TP-DC0R Twin systems (four nodes; 3008 SAS controller onboard)
8 X Intel Xeon E5-2650L v3 1.8GHz 12 core
4 X Intel Optane 900P 280GB PCIe (cache drives, not on vSAN HCL)
4 X Supermicro 64GB SATA-DOM
4 X Intel X710-DA dual-port 10G SFP+ NICs
4 X Intel 1.6TB S3610 SAS SSD
4 X Samsung 850Pro SATA drives (not on vSAN HCL)
64 X 16GB ECC DDR4 DIMM
2 X HPE 5900 switches (48 x 1GbE, 4 x SFP+, 2 x QSFP+)

Total: about 20K, but bought over time and much of it on eBay, so very approximate.

P.S. Were I doing this again I'd get the 2028TP-DC0TR, which is exactly the same but with Intel X540 ten-gig NICs on board; the cost difference now is negligible.
