Reasonable Salary

The process of looking for a new job is stressful. If you already have one, you’re a bit like a secret agent, sneaking around town trying to complete the mission of getting someone new to agree to sign your paychecks, without the old boss finding out. If you don’t have a job, it’s even more stressful, as you wait around and watch your bank account dwindle, with nothing to replenish it.

I knew by March of this year that I was ready to move on from my now-previous employer. I’ve never really had a difficult time finding a job once I decided to commit to the process. I don’t think this time was any different in that respect, but it was interesting. I was fortunate, and excited, to accept the position that interested me most of all those I looked at during the entire process.

My process was around the same time that my friend @davemhenry was in the midst of his #HireDaveNow campaign on Twitter. It was fun to watch Dave advertise himself, while I was lurking in the shadows, although I’m sure it was super stressful for him at the time. It would have been refreshing to be able to shout “I’m available” to the world.

Someone eventually hired Dave.

Beginnings End

This morning I gave two weeks’ notice to my current employer, a Kansas City-based VAR, where I have been a senior data center engineer for the last six years.

I’ve enjoyed many aspects of my current role: becoming certified in new technologies, learning new skills, and solving problems for customers. I’ve had the pleasure of working with a lot of talented people within the organization and at our partners… and of course, with our customers.

Choosing Electives

I’m going to start this by saying something that might seem strange for a post like this, but is no surprise to my closest friends: The last two months, and especially the last two weeks, have been very stressful and mentally draining. Without getting into the details of it all, I will simply say that the biggest contributing factor, or at least the medium that has facilitated the stress, has been social media.

I decided to temporarily set my Twitter account private for a few days last week, something I’d never done in nine years on the service. The only thing I learned from that is that having a private Twitter account sucks. Over the last few months, I’ve unfollowed and set mute filters for topics that generated more noise than signal. I’ve tried to step back and get some perspective on what’s going on in the world right now.

Neowin Retrospective

Most of the people who know and interact with me professionally, or on social media know me as “vmstan” — and if you asked most of those people they’d tell you I only pay attention to two things when it comes to technology: VMware and Apple.

They’d be mostly right.

But there was a time before that, where I was “Marshalus” — and if you asked most of the people who knew what he paid attention to it was one thing: Microsoft. Specifically, covering Microsoft at Neowin.

That’d have been mostly right, too.

Enough Windows

I’ve not been a true “Windows user” on a daily basis since the glorious afternoon my first MacBook Pro arrived in 2011. That didn’t exactly mean I quit using Windows on that day, but over time I’ve continued to slim down my actual needs of the Windows desktop operating system to the point where now I keep a Windows VM around for “just enough” of the things I need from it.

Windows 10 is a huge advancement over Windows 7, which is where I left off as a PC user, and over these last six years Microsoft has learned a lot from Windows 8.x being such a mess. But Windows 10 is an OS intended for use on everything from 4” smartphones to watercooled gaming rigs with multiple 27” 4K displays.

In this guide I’ve focused on simple methods of stripping out a lot of the things that don’t apply to virtual machine usage, and some of the cruft that is really only useful for someone running it on a daily driver. Typically I can reduce the idle memory and disk footprint by about 25% without loss in necessary functionality.

These instructions are not all specific to VMware Fusion, but some are. This also isn’t designed to be the “ultimate guide” to Windows 10 performance, space savings, or anything else. It’s a quick and clean way to do most of those things, but it is not all-encompassing. I think it’s easy for some of those types of optimization guides to focus on getting Windows to the point where it’s so lacking it’s almost unusable or starts breaking core functions.

This is a “light” optimization for my usage. It could fit yours as well, if you have similar needs like running a small collection of utility type applications, such as a couple of EMC product deployment tools, or the old VMware client.

Leather Wrapped

I have an on-again, off-again relationship with iPhone cases. I put them on. I take them off. I generally don’t like cases. I’ve only broken my iPhone one time and that was when my 6 Plus came out of my pocket attached to my hand, unintentionally, on a sticky day. My iPhone 5 and 6 were rarely in cases, and had minimal wear and tear. I’m usually pretty careful. I also buy AppleCare+ on them, even though I’m lucky enough to rarely need it.

Pant Lover

I have some strict requirements around work pants. My wife hates the “I can see your socks while you’re standing up” hipster look, so they have to be full length. Honestly it’d be a great look since I’m 6’4” but as a result I’m at a 36” inseam. I’m also currently 220lbs, which results in a 36” waist. I could probably lose some weight, but it’s not happening today.

I also have a job that requires me to dress nicely to meet a customer in the morning, but be willing to crawl under raised floors and chuck 50 lb boxes around later that afternoon, without a change of clothes. Expensive slacks will get destroyed. Wearing jeans every day is frowned upon. I also don’t want to deal with getting pants tailored.

Between size, cost, looks and durability, I’ve found one pair of pants that consistently meets all my requirements.

Securely Obscure

A couple of years ago, one of our network security architects at work told me that I was in the wrong business. Storage, virtualization, data centers, it’s all going to the cloud. I’d soon be out of a job. 

I barely knew the guy. At first I politely laughed when he said it, but then realized he was serious. Not really a great way to make new friends at work. The irony of the situation was that he tracked me down on one of the few times I was in the office, and approached me to help him lay out some of the VMware requirements for a Trend Micro Deep Security implementation. 

It wasn’t more than a few months later that he didn’t work for my employer anymore … not by his choice … and I’m still there, two years later, still billable most of the week.

I don’t even remember his name. 

But he wasn’t wrong, just a jerk. It’s not as if he was delivering some sort of life-changing message that I’d never heard before. It’s one I hear repeated very often on social media, in conference presentations, etc., and in the wake of the Amazon re:Invent conference last week, I’m hearing it a lot.

It’s undeniable that a big part of my job is chucking boxes of rust and silicon into racks, stringing copper and fiber optics around, and making it all sing together in unison. I kind of enjoy it.

It’s also undeniable that things are changing.

VCSA Migration

Last night I did my first customer migration from a Windows based vCenter to the VMware vCenter Server Appliance (VCSA) using the new 6.0 U2M utility.

The customer was previously running vCenter 5.1 GA on a Windows Server 2008 R2 based physical HP host. In order to migrate to the VCSA, we first had to do two in-place upgrades of vCenter, from 5.1 GA to 5.1 U3, then again from 5.1 U3 to 5.5 U3d. After that, onto the VCSA migration.

Given the length of time the system was running on 5.1 GA code (ouch) and the number of step upgrades required just to get things cleaned up, there was some cause for nervousness.

I admit, even though I’d read up on it, tested it in a lab, and heard other success stories … I still expected my first try to be kind of a mess.

But, it was not. The entire migration process took around 30 minutes, and was nearly flawless.

I had more issues with the upgrade from 5.1 to 5.5 than anything else during this process. Somewhere during that 5.5 upgrade the main vCenter component quit communicating with the SSO and inventory service. There were no errors presented during the upgrade, but it resulted in not being able to log in at all through the C# client, and numerous errors after eventually logging in as [email protected] to the Web Client.

I tried to run through the KB2093876 workarounds, but was not successful. I ended up needing to uninstall the vCenter Server component, remove the Microsoft ADAM feature from the server, and then reinstall vCenter connected to the previous SQL database. Success.

Given those issues, I was nervous about the migration running into further issues, mostly from the old vCenter.

But again, it worked as advertised.

After the migration I did notice the customer’s domain authentication wasn’t working using the integrated Active Directory computer account. After adjusting the identity provider to use LDAP, it worked fine. I’ve had this happen randomly enough on fresh VCSA installs to think it’s something to do with the customer environment, but I was under the wire to get things back up and felt there was no shame in LDAP.

I’ve done too many new deployments of the VCSA since 5.x to count, and at this point was already pretty well convinced there was no reason for most of my customers to deploy new Windows based vCenters. I’d also done a fair number of forklift upgrades with old vCenters where we ditch everything to deploy a new VCSA, which isn’t elegant, but it works for my smaller customers that don’t yet have anything like View, vRA, SRM, integrated backups/replication, etc.

Now I’m confident that any existing vCenter can be successfully migrated.

Windows vCenters, physical and virtual: I’m coming for you.

RAID Crash

Recently I had two VMware Horizon View proof of concept setups for work, where we designed an all in one Cisco UCS C240 M4 box, full of local SSD and spindles, in various RAID sets. This lets the customer kick the tires on View in a small setup to see if it’s a good fit for their environment, but on something more substantial than cribbing resources from the production environment.

  • 5x 300GB 10K SAS RAID 5 for Infrastructure VMs (vCenter, View Broker/Composer, etc)
  • 10x 300GB 10K SAS RAID 10 for VM View Linked Clones
  • 6x 240GB SSD RAID 5 for View Replicas
  • 1x hot spare for each drive type
  • VMware ESXi 6.0 U2 is installed on a FlexFlash SD pair
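As a rough check on what that layout provides, the usable space per RAID set can be worked out with the standard formulas: RAID 5 gives up one drive's worth of capacity to parity, and RAID 10 gives up half to mirroring. This is only a sketch using the nominal drive sizes from the list above. Hot spares are excluded, the short labels are mine, and formatted VMFS capacity will come in somewhat lower.

```python
# Nominal usable capacity for the RAID sets listed above (hot spares excluded).
# Sizes are the drives' marketing (decimal) GB, not formatted VMFS capacity.

def usable_gb(level: str, drives: int, size_gb: int) -> int:
    """Nominal usable capacity of a single RAID set, in GB."""
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_gb   # one drive's worth of parity
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (drives // 2) * size_gb  # every drive is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


# (label, RAID level, drive count, drive size in GB), taken from the list above.
LAYOUT = [
    ("Infrastructure VMs", "RAID5", 5, 300),
    ("Linked clones", "RAID10", 10, 300),
    ("Replicas", "RAID5", 6, 240),
]

for name, level, drives, size_gb in LAYOUT:
    print(f"{name}: {usable_gb(level, drives, size_gb)} GB usable")
```

That works out to 1200 GB for infrastructure, 1500 GB for linked clones, and 1200 GB of SSD for replicas.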

After getting all the basics configured, we had a single View connection broker, with another View Composer VM on a local SQL Express 2012 instance for the database. Both were version 7.0.2. At the first site the VM base image we attempted to deploy was an optimized Windows 7 x64 instance.

But under any sort of load during a deployment of more than a handful of desktops, the entire box would come to a total stop. In some cases the only way to restore any functionality was to pull the power and restart the infrastructure VMs, one by one. Of course, once the broker and composer instances were connected, they’d attempt to create more desktops and the cycle would continue. In an attempt to isolate the issue, we tried various versions of the VMware Tools, a new Windows 7 x86 image, and I even duplicated the behavior by building a nearly identical View 6.2.3 environment, within the same box.

After digging through the esxtop data as clones were being created, I could see KAVG/Latency across all RAID sets jump to as high as 6000ms right before all disk activity on the system eventually stopped.

It didn’t matter what configuration I tried: it was present with a fresh install of ESXi 6.0 U2, and after applying the latest host patches. It was present on the out of box UCS firmware of 2.0(10), and with the stock RAID drivers from the Cisco ISO. It was present after updating the firmware, and the drivers. It also happened regardless of whether the RAID controller write back cache was enabled or disabled for the various groups.
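If you would rather capture this than watch it live, esxtop can write its counters to a CSV file in batch mode, and the kernel latency columns can then be scanned for spikes. The sketch below rests on a few assumptions of mine, none of which come from the original troubleshooting: that the KAVG columns end in "Average Kernel MilliSec/Command" (check the header row of your own export), that 2 ms is a reasonable warning threshold, and a made-up two-sample export with a hypothetical host name.

```python
import csv
import io

# Assumed counter name for KAVG in esxtop batch-mode output; verify against
# the header row of your own CSV before relying on it.
KAVG_SUFFIX = "Average Kernel MilliSec/Command"


def kavg_spikes(csv_text: str, threshold_ms: float = 2.0):
    """Return (timestamp, column, value) for each KAVG sample above threshold_ms."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    kavg_cols = [i for i, name in enumerate(header) if name.endswith(KAVG_SUFFIX)]
    spikes = []
    for row in reader:
        for i in kavg_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or truncated samples
            if value > threshold_ms:
                spikes.append((row[0], header[i], value))
    return spikes


# Made-up sample for illustration only: the host name, timestamps and the
# 0.02 reading are invented; 6000 ms is the peak mentioned above.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Disk Adapter(vmhba0)\Average Kernel MilliSec/Command"',
    '"09/01/2016 10:00:00","0.02"',
    '"09/01/2016 10:00:05","6000.00"',
])

for timestamp, column, value in kavg_spikes(SAMPLE):
    print(f"{timestamp}  {value:.0f} ms  {column}")
```

Kernel latency on a healthy host normally sits close to zero, so anything sustained above a few milliseconds is worth a look, never mind thousands.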

Cisco is very particular about making ESXi drivers for their components match their UCS compatibility matrix, so before I decided to give TAC a call, I made sure (again) that everything matched exactly. TAC ended up reviewing the same logs, to determine if this was a hardware issue, and while they made a couple of suggestions for adjustments, they were not successful in diagnosing a root cause. Yet, they insisted based on what they were seeing that it was not a hardware issue.

With this particular customer, we were also impacted by a variety of issues relating to the health of the DNS and Active Directory environment. With that in mind, we decided to focus on fixing the other environmental issues and in the meantime, not overload the UCS box until a deeper analysis could be done.

Try Try Again

A day or so into the second setup at another customer, I encountered the exact same issues. This time with a Windows 10 x64 image, and View 7.0.2. The same crazy latency numbers under any amount of significant load, until the entire box stopped responding.

The physical configuration differed slightly in that we were integrating the C-Series UCS into the customer’s fabric interconnects, so the firmware and driver versions were different again from the first host, which was a standalone configuration connected to the customer’s network. After digging into it again with a fresh brain, and more perspective, I found the cause.

I started looking through the RAID controller driver details again. In both cases, VMware uses the LSI_MR3 driver as the default driver for the Cisco 12G RAID (Avago) controller in ESXi 6.0 U2. In both environments I verified that we were running the suggested driver versions based on the Cisco UCS compatibility matrix, and we were. So I started digging at this controller and wondered what VMware suggests for VSAN (keeping in mind we aren’t running VSAN at either site) and sure enough, they DO NOT suggest using the LSI_MR3 driver, but instead list the “legacy” MEGARAID_SAS driver as their recommendation, for the exact same controller.

After applying the alternative driver, I’ve not been able to break the systems.

What is odd is that this appears to be related specifically to Cisco’s version of the controllers.

This week I did a similar host setup (although not for View) using a bunch of local SSD/SAS drives in a Dell PowerEdge 730xd, with their 12G PERC H730 RAID cards (which from what I can see appear to be rebranded versions of the same controller) and VMware’s compatibility matrix has the LSI_MR3 drivers listed.

I left those drivers enabled, and the customer ran a series of aggressive PostgreSQL benchmarks against the SSD sets, with impressive results, and no issues from the host.

So, long story short, if you’re using local RAID sets for anything other than some basic boot volumes that don’t need any serious I/O, with the Cisco 12G RAID controller, you don’t want to use the Cisco recommended drivers.

Installation instructions

  • Download the new driver (for ESXi 6.0 U2)
  • Extract the .vib file from the driver bundle and copy it to a datastore on the host
  • Enable SSH on the host and connect to it via your terminal application of choice
  • Apply the driver from the SSH session and disable the old one
  • Reboot the host
  • Reconnect via SSH, and run the core adapter list command to verify it’s active

This should verify that your RAID controller (typically either vmhba0 or vmhba1) is now using the megaraid_sas driver. If the “UID” is listed as “Unknown” in this readout, it’s normal.
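For reference, the SSH steps above map onto a handful of esxcli commands. The sketch below only prints them so they can be reviewed before being pasted into the session. The datastore path and .vib file name are hypothetical placeholders for whatever you extracted from the driver bundle, and lsi_mr3 is assumed to be the module name of the driver being disabled.

```python
def driver_swap_commands(vib_path: str, old_module: str = "lsi_mr3") -> list:
    """Build the ESXi shell commands for the driver swap, in the order to run them."""
    if not vib_path.startswith("/"):
        # esxcli expects a full path to the .vib file, not a relative one.
        raise ValueError("vib_path must be an absolute path on the host")
    return [
        # Install the replacement driver from the datastore copy.
        f"esxcli software vib install -v {vib_path}",
        # Stop the old driver module from loading at the next boot.
        f"esxcli system module set --enabled=false --module={old_module}",
        # Reboot so the controller is claimed by the new driver.
        "reboot",
        # After reconnecting: list the adapters and the driver each one uses.
        "esxcli storage core adapter list",
    ]


# Hypothetical path: substitute your own datastore and .vib file name.
for command in driver_swap_commands("/vmfs/volumes/datastore1/megaraid_sas.vib"):
    print(command)
```

The last command has to wait until the host is back up and SSH is reachable again. It is the one that produces the readout described above.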