Enough Windows

I’ve not been a true “Windows user” on a daily basis since the glorious afternoon my first MacBook Pro arrived in 2011. That didn’t mean I quit using Windows that day, but over time I’ve pared down what I actually need from the Windows desktop operating system to the point where I now keep a Windows VM around for “just enough” of the things I need from it.

Windows 10 is a huge advancement over Windows 7, which is where I left off as a PC user, and over these last six years Microsoft has clearly learned a lot from the mess that was Windows 8.x. But Windows 10 is an OS intended for use on everything from 4” smartphones to water-cooled gaming rigs with multiple 27” 4K displays.

In this guide I’ve focused on simple methods of stripping out a lot of the things that don’t apply to virtual machine usage, and some of the cruft that is really only useful for someone running it on a daily driver. Typically I can reduce the idle memory and disk footprint by about 25% without loss in necessary functionality.
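
To give a flavor of the kind of trimming I mean, here are a couple of example tweaks run from an elevated PowerShell prompt inside the VM. These are illustrative picks of mine, not the guide’s exact steps:

  # Superfetch/SysMain offers little benefit inside a VM (illustrative tweak, not necessarily from the guide)
  Stop-Service -Name SysMain -ErrorAction SilentlyContinue
  Set-Service -Name SysMain -StartupType Disabled

  # Hibernation is pointless in a VM, and turning it off removes the multi-GB hiberfil.sys
  powercfg /hibernate off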

These instructions are not all specific to VMware Fusion, but some are. This also isn’t meant to be the “ultimate guide” to Windows 10 performance, space savings, or anything else. It’s a quick and clean way to do most of those things, but it’s not all-encompassing. Guides of that sort too often strip Windows down to the point where it’s so lacking it’s almost unusable, or starts breaking core functions.

This is a “light” optimization for my usage. It could fit yours as well, if you have similar needs, like running a small collection of utility-type applications: a couple of EMC product deployment tools, or the old VMware client.


VCSA Migration

Last night I did my first customer migration from a Windows based vCenter to the VMware vCenter Server Appliance (VCSA) using the new 6.0 U2M utility.

The customer was previously running vCenter 5.1 GA on a Windows Server 2008 R2-based physical HP host. In order to migrate to the VCSA, we first had to do two in-place upgrades of vCenter: from 5.1 GA to 5.1 U3, then from 5.1 U3 to 5.5 U3d. After that, on to the VCSA migration.

Given the length of time the system had been running on 5.1 GA code (ouch) and the number of step upgrades required just to get things cleaned up, there was some cause for nervousness.

I admit, even though I’d read up on it, tested it in a lab, and heard other success stories … I still expected my first try to be kind of a mess.

But, it was not. The entire migration process took around 30 minutes, and was nearly flawless.

I had more issues with the upgrade from 5.1 to 5.5 than with anything else during this process. Somewhere during that 5.5 upgrade the main vCenter component stopped communicating with the SSO and Inventory Service. There were no errors presented during the upgrade, but it resulted in not being able to log in at all through the C# client, and numerous errors in the Web Client after eventually logging in as administrator@vsphere.local.

I tried to run through the KB2093876 workarounds, but was not successful. I ended up needing to uninstall the vCenter Server component, remove the Microsoft ADAM feature from the server, and then reinstall vCenter connected to the previous SQL database. Success.

Given those issues, I was nervous about the migration running into further issues, mostly from the old vCenter.

But again, it worked as advertised.

After the migration I did notice the customer’s domain authentication wasn’t working using the integrated Active Directory computer account. After adjusting the identity source to use LDAP, it worked fine. I’ve had this happen randomly enough on fresh VCSA installs to think it’s something to do with the customer environment, but I was up against the clock to get things back up and felt there was no shame in LDAP.

I’ve done too many new deployments of the VCSA since 5.x to count, and at this point I was already pretty well convinced there was no reason for most of my customers to deploy new Windows-based vCenters. I’d also done a fair number of forklift upgrades with old vCenters where we ditch everything to deploy a new VCSA, which isn’t elegant, but it works for my smaller customers that don’t yet have anything like View, vRA, SRM, integrated backups/replication, etc.

Now I’m confident that any existing vCenter can be successfully migrated.

Windows vCenters, physical and virtual: I’m coming for you.

RAID Crash

Recently I had two VMware Horizon View proof-of-concept setups for work, where we designed an all-in-one Cisco UCS C240 M4 box, full of local SSDs and spindles, in various RAID sets. This lets the customer kick the tires on View in a small setup to see if it’s a good fit for their environment, but on something more substantial than cribbing resources from the production environment.

  • 5x 300GB 10K SAS RAID 5 for Infrastructure VMs (vCenter, View Broker/Composer, etc)
  • 10x 300GB 10K SAS RAID 10 for VM View Linked Clones
  • 6x 240GB SSD RAID 5 for View Replicas
  • 1x hot spare for each drive type
  • VMware ESXi 6.0 U2 is installed on a FlexFlash SD pair

After getting all the basics configured, we had a single View connection broker, with another View Composer VM on a local SQL Express 2012 instance for the database. Both were version 7.0.2. At the first site the VM base image we attempted to deploy was an optimized Windows 7 x64 instance.

But under any sort of load during a deployment of more than a handful of desktops, the entire box would come to a total stop. In some cases the only way to restore any functionality was to pull the power and restart the infrastructure VMs, one by one. Of course, once the broker and composer instances were connected, they’d attempt to create more desktops and the cycle would continue. In an attempt to isolate the issue, we tried various versions of the VMware Tools, a new Windows 7 x86 image, and I even duplicated the behavior by building a nearly identical View 6.2.3 environment, within the same box.

After digging through the esxtop data as clones were being created, I could see KAVG latency across all RAID sets jump to as high as 6,000ms right before all disk activity on the system eventually stopped.
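
For reference, one way to capture that data for later review is esxtop’s batch mode from an SSH session on the host (the sampling interval, count, and datastore path below are just examples I’d pick, nothing prescribed):

  # 5-second samples for 10 minutes, written out to CSV for later analysis
  esxtop -b -d 5 -n 120 > /vmfs/volumes/datastore1/esxtop-clone-test.csv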

It didn’t matter what configuration I tried: it was present with a fresh install of ESXi 6.0 U2, and after applying the latest host patches. It was present on the out-of-the-box UCS firmware of 2.0(10), and with the stock RAID drivers from the Cisco ISO. It was present after updating the firmware and the drivers. It also happened regardless of whether the RAID controller write-back cache was enabled or disabled for the various groups.

Cisco is very particular about making the ESXi drivers for their components match their UCS compatibility matrix, so before I decided to give TAC a call, I made sure (again) that everything matched exactly. TAC ended up reviewing the same logs to determine whether this was a hardware issue, and while they made a couple of suggestions for adjustments, they were not able to diagnose a root cause. Yet, based on what they were seeing, they insisted it was not a hardware issue.

With this particular customer, we were also impacted by a variety of issues relating to the health of the DNS and Active Directory environment. With that in mind, we decided to focus on fixing the other environmental issues and, in the meantime, not overload the UCS box until a deeper analysis could be done.

Try Try Again

A day or so into the second setup at another customer, I encountered the exact same issues, this time with a Windows 10 x64 image and View 7.0.2: the same crazy latency numbers under any significant load, until the entire box stopped responding.

The physical configuration differed slightly in that we were integrating the C-Series UCS into the customer’s fabric interconnects, so the firmware and driver versions differed even more from the first host, which was a standalone configuration connected directly to the customer’s network. After digging into it again with a fresh brain and more perspective, I found the cause.

I started looking through the RAID controller driver details again. In both cases, VMware uses the LSI_MR3 driver as the default for the Cisco 12G (Avago) RAID controller in ESXi 6.0 U2. In both environments I verified that we were running the suggested driver versions based on the Cisco UCS compatibility matrix, and we were. So I kept digging at this controller and wondered what VMware suggests for VSAN (keeping in mind we aren’t running VSAN at either site), and sure enough, they DO NOT suggest using the LSI_MR3 driver; instead they list the “legacy” MEGARAID_SAS driver as their recommendation for the exact same controller.

After applying the alternative driver, I’ve not been able to break the systems.

What is odd is that this appears to be related specifically to Cisco’s version of the controller.

This week I did a similar host setup (although not for View) using a bunch of local SSD/SAS drives in a Dell PowerEdge R730xd with its 12G PERC H730 RAID card (which, from what I can see, appears to be a rebranded version of the same controller), and VMware’s compatibility matrix lists the LSI_MR3 driver for it.

I left those drivers enabled, and the customer ran a series of aggressive PostgreSQL benchmarks against the SSD sets, with impressive results and no issues from the host.

So, long story short: if you’re using the Cisco 12G RAID controller for local RAID sets that do anything more than host basic boot volumes with no serious I/O, you don’t want to use the Cisco-recommended driver.

Installation instructions

  • Download the new driver (for ESXi 6.0 U2)
  • Extract the .vib file from the driver bundle and copy it to a datastore on the host
  • Enable SSH on the host and connect to it via your terminal application of choice
  • Apply the new driver from the SSH session and disable the old one (example commands below)
  • Reboot the host
  • Reconnect via SSH and run the esxcli storage core adapter list command to verify the new driver is active

This should confirm that your RAID controller (typically either vmhba0 or vmhba1) is now using the megaraid_sas driver. If the “UID” is listed as “Unknown” in this readout, that’s normal.
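
For anyone who wants the actual commands, here’s roughly what that SSH session looks like. The VIB filename and datastore path are placeholders; use the file you extracted from the driver bundle:

  # Install the megaraid_sas driver VIB copied to a local datastore (placeholder path/filename)
  esxcli software vib install -v /vmfs/volumes/datastore1/scsi-megaraid-sas-<version>.vib

  # Disable the lsi_mr3 module so it no longer claims the controller after the reboot
  esxcli system module set --module=lsi_mr3 --enabled=false

  # After the reboot, confirm which driver the controller is using
  esxcli storage core adapter list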

Factory Reset

I ran into a situation recently where the need arose to effectively “factory reset” a Generation 5 EMC RecoverPoint Appliance (Gen 5 RPA). In my case, I had one RPA where the local copy of the password database had become corrupted, but the other three systems in the environment were fine. There was nothing physically wrong with the box; I just wanted to revert it back to new, treat it like a replacement unit from EMC, and rejoin it to the local cluster.

From what I could find, EMC had no documented procedure for how to do this. So, after piecing together a blog entry and an EMC Communities post (neither of which helped on its own), here it is:

  • Attach a KVM to the failed appliance and reboot.
  • Hit F2 to boot into the system BIOS (the password is emcbios).
  • Under USB settings, Enable Port 60/64 Emulation.
  • Save your settings and reboot the appliance.
  • This time hit Ctrl + G to enter the RAID BIOS.
  • Select the RAID 1 virtual drive and start a Fast Init.
  • Reboot the appliance.
  • Hit F2 to boot back into the system BIOS.
  • Under USB settings, Disable Port 60/64 Emulation.
  • Reboot the appliance and verify that no local OS is installed.
  • Insert the RecoverPoint install CD (the one you burned from the ISO downloaded from EMC Support) and press Enter to start the install.
  • The installation does not require any user interaction; the appliance will reboot into a “like new” state when it’s completed.
  • Rejoin the appliance to the cluster using procedures generated from SolVe Desktop. (You can ignore instructions about rezoning Fibre Channel connections or spoofing WWPNs, since none of this will have changed.)

The key points here are the bits about Port 60/64 Emulation. If you don’t enable it, the RAID BIOS will load to a black screen and take you nowhere. Likewise, if you leave it enabled, your RecoverPoint OS may not install correctly.

Cloned Away

Have you ever wanted to easily clone a virtual machine from a snapshot, and have the clone reflect the source as it existed at that point in time, as opposed to the current state of the source? Jonathan Medd (@jonathanmedd) has a great PowerCLI script, which I found yesterday, that does exactly this.

Copy the contents of his script into a new .ps1 file, save it, and then execute the script within a PowerCLI window to add the function to your session. Then run the new function to create your clones. By default it uses the last snapshot in the chain, but you can request a snapshot by name as explained on his site.

New-VMFromSnapshot -SourceVM VM01 -CloneName "Clone01" -Cluster "Test Cluster" -Datastore "Datastore01"
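
If you’re curious what’s happening under the hood, here’s a minimal sketch of the general approach (my own simplified illustration, not Jonathan’s script): the vSphere CloneVM API accepts a clone spec whose Snapshot property tells vCenter to clone from that point in time.

  # A simplified sketch, not Jonathan's script: clone from the last snapshot via the CloneVM API
  function New-VMFromSnapshotSketch {
      param(
          [string]$SourceVM,
          [string]$CloneName,
          [string]$Cluster,
          [string]$Datastore
      )
      $vm       = Get-VM -Name $SourceVM
      $snapshot = Get-Snapshot -VM $vm | Select-Object -Last 1     # last snapshot in the chain

      $spec = New-Object VMware.Vim.VirtualMachineCloneSpec
      $spec.Snapshot = $snapshot.ExtensionData.MoRef               # clone from this point in time
      $spec.Location = New-Object VMware.Vim.VirtualMachineRelocateSpec
      $spec.Location.Datastore = (Get-Datastore -Name $Datastore).ExtensionData.MoRef
      $spec.Location.Pool      = (Get-Cluster -Name $Cluster).ExtensionData.ResourcePool

      # Place the clone alongside the source VM and kick off the clone task
      $vm.ExtensionData.CloneVM_Task($vm.ExtensionData.Parent, $CloneName, $spec)
  }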

Still Migrating

My second day transferring my iPhoto library to iCloud Photo Library seems to be going very well. The “optimize storage” feature on the iOS devices is going to save users a ton of space.

Yesterday when I posted my last entry I had a 16GB iPad completely full (roughly 7GB of which was photos). When I returned, all the photos had been uploaded to iCloud, freeing up about 5GB of space. No matter what I throw at this (and I have about 19GB of images in iCloud now), the devices sit at around 2GB used for photo storage.

When photos further back in the catalog that are not currently on the device are accessed, they’re retrieved from the cloud in full resolution.

I’m only about a fifth of the way through my library. I’ve been doing it in chunks as I have time, because during the upload process I tend to fully saturate my 5Mbps upstream home connection.

If you’ve not turned on iCloud Photo Library yet, even if you don’t intend to do as I’m doing and dump everything into it, you’re really missing out.

Migrating Photos

When I saw the new iCloud Photo Sync demo at WWDC, I was in love.

Photo storage and syncing has been a struggle of mine for a while. I’ve bounced between external drives (which make accessibility difficult when I’m not at home) and local storage (which wastes expensive MacBook SSD space) … but I’ve never been happy. I’ve switched between Lightroom and Aperture for my “professional” images (AKA those taken with my Nikon DSLR) and mostly used iPhoto for my iPhone-captured images.

The other issue was that 16GB iOS devices fill up quickly these days. So to save space, I would regularly sync my devices back to iPhoto and then delete the photos from my phone, but again, this made accessing older photos difficult when on the go.

With iPhone cameras getting good enough to rival my 8-year-old Nikon D200, and with me getting tired of paying for Adobe software updates, I eventually merged everything into iPhoto.

Now, with iOS 8.1, the iCloud Photo Sync beta rollout has begun, but only on iOS devices and via the iCloud website. The previously announced Mac app is slated for early 2015. But I want all my stuff in Apple’s cloud now, accessible on every device.

I figured out how:

  • Make sure you have iCloud Photo Sync enabled on your iOS devices.
  • Open iPhoto, open Finder > AirDrop on your Mac.
  • Open Photos on your iOS device.
  • Drag and drop photos from iPhoto to your iOS device of choice via AirDrop.
  • This triggers the automatic sync to iCloud, which starts dropping optimized versions all over the place.

I’m currently chugging back through May 1 of this year. I only stopped there because that filled up my iPad with photos, and I want to see how it smashes the used space back down once everything uploads. I could keep going with my iPhone 6, which has another 40GB free, but this is enough experimentation for now.

I’ll also probably have to increase my 20GB iCloud plan to keep going beyond what’s in there now. Once I’ve got things moved off, I’ll be able to move my local copies back to external storage, and then, once the Mac Photos app is released, figure out how I want to deal with my local copies again.

I think my iPad will become central to my future editing workflow. I’ve long owned the camera connection kit, but never used it. Now it’s going to become the primary injection point for new images taken with the DSLR, and for editing ones taken with the iPhone. (Especially now that Pixelmator for iPad is here!)

Guidance Change

A few months ago I wrote about the VMware View optimization script breaking Internet Explorer and Adobe Acrobat through the addition of a registry entry that disabled Address Space Layout Randomization (ASLR):

ASLR was a feature added to Windows starting with Vista. It’s present in Linux and Mac OS X as well. For reasons unknown, the VMware scripts disable ASLR.
Internet Explorer will not run with ASLR turned off. After further testing, neither will Adobe Reader. Two programs that are major targets for security exploits refuse to run with ASLR turned off.
The “problem” with ASLR in a virtual environment is that it makes transparent memory page sharing less efficient. How much less? That’s debatable and dependent on workload. It might gain you a handful of extra virtual machines on a host, at the expense of a valuable security feature of the operating system.
For some reason, those who created the script at VMware have decided that disabling it is best practice.

At the VMware Partner Technical Advisory Board on EUC last month, I pointed this out to some VMware people and sent a link to the blog entry.

Over the weekend I got a tip from Thomas Brown over at Varrow that the scripts had been updated.

Today I had an opportunity to download the updated scripts (available here) and was very pleased to see:

 rem *** Removed due to issues with IE10, IE11 and Adobe Acrobat 03Jun2014
 rem Disable Address space layout randomization
 rem reg ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" /v MoveImages /t REG_DWORD /d 0x0 /f

Success!

As always, please review the rest of the contents to make sure the changes that the script makes are appropriate for your environment.

Jabber Persona

I just finished up an issue for a customer who had deployed Cisco Jabber along with VMware View, using Persona Management and floating desktops set to refresh at logoff. Much to their annoyance, users had to reconfigure their Cisco Jabber client with the server connection settings, and any client customizations were lost, every time they logged back in to their desktops.

After looking into this, it appeared that the Jabber configuration XML files were not being synced down to the local desktop before the Jabber client launched, which caused the settings to revert to an unconfigured state. Even though the configuration data stored in jabberLocalConfig.xml was saved to the Persona Management share, it never had a chance to load before being overwritten.

The issue was resolved by adjusting Persona Management group policies to precache the settings stored on the persona share to the virtual desktop before completing login.

Modify the Persona Management GPO setting “Files and folders to preload” to include the following directory:

AppData\Roaming\Cisco\Unified Communications\Jabber\CSF

Server settings and custom client adjustments are now maintained across desktop sessions. WIN!

Guided Powers

Power… it’s the only thing you’ll find more prevalent in a datacenter than racks, yet when discussing upgrades and new installations it’s often the part that no one ever mentions. Usually that’s because:

  • the IT team isn’t in charge of the power design (leased building, union, or separate electrical department)
  • they’ve always just used 120V “normal” stuff under 1800 watts
  • they aren’t electrical engineers and don’t understand what amps, volts, and watts are
  • they don’t understand all of the options for connectors and cords

I’m guilty of these things, especially from when I was just an administrator. Since becoming a consultant I’ve had to take a crash course (heh) in things like the differences between a C13 and a NEMA 5-15, 120V vs. 208V, etc.
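
A quick example of why those terms matter: watts = volts × amps. A standard NEMA 5-15 wall circuit is 120V × 15A ≈ 1800W (that’s where the familiar 1800-watt ceiling above comes from), and you only want to plan on about 80% of that for continuous loads. The same 15A at 208V gets you roughly 3100W, which is why denser gear keeps pushing you toward 208V circuits.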

Power always seems to be a major issue on projects these days, especially as more and more customers adopt blade systems like Cisco UCS. What has really been difficult is that the latest generation of EMC VNX now requires 208V power on the Disk Processor Enclosure (there is a block-only 5200 model that can run on 120V, but you have to order it that way ahead of time; it doesn’t auto-switch by default anymore).

Better understanding by customers is essential.