Partner Board

I just flew back from Palo Alto, where I’ve been attending the VMware Partner Technical Advisory Board on End User Computing. Until a couple of months ago, I’d never heard of a PTAB; then I got invited to one.

The purpose of the PTAB is for VMware to invite services partners out to meet with VMware leadership, discuss the future of their products, and provide very candid feedback. There is also a chance for training: I was able to attend a free two-day ICM class on Horizon Mirage.

While the actual content of the PTAB is under NDA, I will say that VMware has some really exciting things happening in the EUC space. I had a great time; VMware puts on a nice show. The moment I realized I was sitting in the VMware headquarters eating bacon, I felt like royalty.

It was also a great chance to do some networking and meet people I’d previously only known on Twitter… including the vExpert Godfather, Mr. @jtroyer.

Last but not least, thanks to people like @BChristian21 @DavesRant @rockygiglio @jaslanger @keithnorbie @thombrown @earlg3 and the others who let me tag along in the evenings and share sushi, milkshakes, chowder and beer. Every night I felt drenched in knowledge just from being able to listen and share feedback within the group.

If I ever needed inspiration to start evaluating my professional future and go for my VCDX, this was the group to provide it.

I’m looking forward to another opportunity to attend in the future.

Guided Powers

Power… it is the only thing you will find more prevalent in a datacenter than racks, yet when discussing upgrades and new installations it’s often the part no one ever mentions. In my experience, that’s usually because:

  • the IT team isn’t in charge of the power design (leased building, union, or separate electrical department)
  • the team has always just used “normal” 120v equipment drawing under 1800 watts
  • nobody on the team is an electrical engineer or really understands what amps, volts, and watts are
  • nobody understands all of the options for connectors and cords

I’m guilty of these things, especially back when I was just an administrator. Since becoming a consultant I’ve had to take a crash course (heh) in things like the difference between a C13 and a NEMA 5-15, 120v vs. 208v, etc.
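The arithmetic behind that “under 1800 watts” habit is just P = V × I. Here’s a minimal Python sketch of it. It assumes a single-phase load with a power factor of 1.0; real server power supplies come close to that but not exactly, and the sketch ignores three-phase details entirely. It shows why a 15 A circuit tops out at 1800 W at 120v, and how the same load draws noticeably less current at 208v:

```python
# Illustrative arithmetic only: P = V * I for a single-phase load,
# assuming a power factor of 1.0 (real server power supplies are close, not exact).

def power_watts(current_a: float, voltage_v: float) -> float:
    """Power in watts delivered by current_a amps at voltage_v volts."""
    return current_a * voltage_v

def current_amps(power_w: float, voltage_v: float) -> float:
    """Current in amps drawn by a power_w watt load at voltage_v volts."""
    return power_w / voltage_v

if __name__ == "__main__":
    print(power_watts(15, 120))               # 1800: a standard 15 A circuit at 120v
    print(round(current_amps(1800, 120), 2))  # 15.0 A for an 1800 W load at 120v
    print(round(current_amps(1800, 208), 2))  # 8.65 A for the same load at 208v
```

At 120v, the same 1800 W of equipment needs the whole 15 A. At 208v it needs only about 8.65 A, which is part of the appeal of 208v for dense gear.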

Power always seems to be a major issue on projects these days, especially as more and more customers adopt blade systems like Cisco UCS. What has been really difficult is that the latest generation of EMC VNX now requires 208v power on the Disk Processor Enclosure. (There is a block-only 5200 model that can run on 120v, but you have to order it that way ahead of time; it no longer autoswitches by default.)

A better understanding of power requirements on the customer side is essential.

VNXe Thoughts

I had some thoughts after reading Chad Sakac’s blog entry about the new VNXe 3200.

  • The original VNXe (3100/3150/3300) was not my favorite product. It was fine as far as entry-level storage goes, but it had a good number of restrictions, both technical and artificial, compared to its “big brother” VNX.
  • By contrast, I’m much more excited about deploying the VNXe 3200. I’ll let you read Chad’s blog for a more complete list of features, but being able to do FAST Cache and FAST VP makes it a much more compelling product.
  • I get the impression from Chad’s post that the VNXe platform will eventually be as feature-complete as the VNX and, being built on the same hardware platform, will eventually perform as well as the VNX.
  • At some point, I would expect the “next-next-generation VNX” to look more like a VNXe than the CLARiiON/Celerra mashup that exists today: no Windows code anywhere to be found, and a truly unified block and file setup.
  • If all they did was get rid of Java in the full VNX Unisphere management interface, I’d be so happy.
  • I suspect a lot of customers where a block-only VNX 5200/5300 made sense are going to be “moving down” to the VNXe.

Looking forward to getting my hands on one.

Broken Security

I’ve been using the Windows Optimization Guide for View Desktops on the VMware website for a long time. Hidden inside the PDF are some text file attachments that, when converted to .bat files, run through and disable most of the functions that bloat virtual desktop linked clones or are totally unnecessary when accessed from a thin client or mobile device. However, around October of last year, during a customer engagement, I noticed the PDF had been updated with a revised version. That version has caused me a lot of headaches.

After running the revised scripts, I was basically left with broken templates. Internet Explorer would no longer load. Breaking Internet Explorer sort of makes me look like an idiot after I deploy entire pools of desktops and companies can’t use them to run their corporate webapps.

I’d never gotten around to figuring out exactly what caused this issue, so I’d been using a modified version of an older script during my engagements. However, during a View implementation this week I couldn’t find that older copy, so I decided to figure out what made the new script such a pain.

ASLR

Address space layout randomization (ASLR) is a computer security technique involved in protection from buffer overflow attacks. In order to prevent an attacker from reliably jumping to a particular exploited function in memory (for example), ASLR involves randomly arranging the positions of key data areas of a program, including the base of the executable and the positions of the stack, heap, and libraries, in a process’s address space. (Wikipedia)

ASLR was added to Windows starting with Vista, and it’s present in Linux and Mac OS X as well. For reasons unknown, the revised VMware script disables ASLR. Specifically, it does so with this registry command:

reg ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" /v MoveImages /t REG_DWORD /d 0x0 /f

Internet Explorer will not run with ASLR turned off. After further testing, neither will Adobe Reader. Two programs that are major targets for security exploits refuse to run with ASLR turned off.
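If you want to check whether a template has already been hit by this, the value lives under the Memory Management key shown above. Below is a minimal Python sketch that uses the standard-library winreg module to read it.

How the values are interpreted is an assumption, based on how MoveImages is commonly described rather than on anything in VMware’s guide:

- 0 turns off image randomization.
- 0xFFFFFFFF forces randomization for every image.
- Any other value, or no value at all, leaves the default behavior, where only images built as ASLR-compatible are randomized.

```python
# Sketch: read the MoveImages value and describe its effect on ASLR for images.
# Value meanings are an assumption based on common descriptions, not VMware's guide:
#   0                 -> image bases never randomized (ASLR off for images)
#   0xFFFFFFFF (-1)   -> all images randomized, even ones not built for ASLR
#   absent / other    -> default: only ASLR-compatible images are randomized
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

def describe_move_images(value):
    """Turn a MoveImages value (or None if absent) into a short description."""
    if value == 0:
        return "disabled (images never relocated)"
    if value in (0xFFFFFFFF, -1):
        return "forced (all images relocated)"
    return "default (ASLR for images that opt in)"

def read_move_images():
    """Return the MoveImages DWORD, or None if it is absent or we are not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, "MoveImages")
            return value
    except FileNotFoundError:
        return None

if __name__ == "__main__":
    print(describe_move_images(read_move_images()))
```

On a template that has had the revised script run against it, you’d expect to see the “disabled” result.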

The “problem” with ASLR in a virtual environment is that it makes transparent memory page sharing less efficient. How much less? That’s debatable and dependent on workload. It might let you fit a handful of extra virtual machines on a host, but at the expense of a valuable security feature of the operating system.
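To make the trade-off concrete, here’s a toy Python model. This is not how ESXi actually implements transparent page sharing. It assumes a hypothetical library whose pages each embed an absolute address:

- With ASLR off, every VM loads the library at the same base, so identical pages collapse to one shared copy.
- With ASLR on, each VM gets a random base, the embedded addresses differ, and the pages can no longer be shared.

```python
# Toy model only (not ESXi's TPS implementation): count distinct page contents across VMs.
# Hypothetical library: each 4 KB page embeds one absolute address derived from its load base.
import hashlib
import random

PAGE = 4096

def library_pages(base: int, n_pages: int = 8) -> list[bytes]:
    """Build the library's pages as loaded at `base`."""
    pages = []
    for i in range(n_pages):
        pointer = (base + i * PAGE).to_bytes(8, "little")  # address patched in at load time
        pages.append(pointer + bytes(PAGE - 8))
    return pages

def unique_pages(vms: list[list[bytes]]) -> int:
    """Identical pages can be shared as one copy, so count distinct contents by hash."""
    return len({hashlib.sha1(p).digest() for vm in vms for p in vm})

def simulate(n_vms: int, aslr: bool, seed: int = 1) -> int:
    rng = random.Random(seed)
    fixed_base = 0x10000000
    vms = []
    for _ in range(n_vms):
        # With ASLR, each VM's loader picks a random 64 KB-aligned base.
        base = fixed_base + rng.randrange(1, 256) * 0x10000 if aslr else fixed_base
        vms.append(library_pages(base))
    return unique_pages(vms)

if __name__ == "__main__":
    print("ASLR off:", simulate(50, aslr=False))  # 8: one shared copy of each page
    print("ASLR on: ", simulate(50, aslr=True))   # up to 8 * 50 distinct pages
```

Real savings depend heavily on workload and page size, so don’t read anything into the absolute numbers.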

For some reason, whoever created the script at VMware apparently considers disabling it a best practice.

Or do they?

I actually can’t find anywhere else in the document that says ASLR should be disabled. It isn’t even in the table that lists all the changes the script makes, yet the command referenced above appears under “changes since last version.” I also can’t find anything else on VMware’s site that says it should be disabled. In fact, I found information to the contrary.

Back in 2011, a VMware blog entry by Eric Horschman specifically called out this issue and clarified that disabling ASLR is not recommended in general.

André Leibovici (previously an Architect in the Office of the CTO End User Computing at VMware, now with Nutanix, and someone I consider to be a virtual desktop expert) said much the same thing on his site myvirtualcloud.net back in 2011, specifically about ASLR in VDI:

Is it a good practice to disable ASLR? The short answer is No. Unless you are pushing very high levels of memory overcommit in a 32-bit desktop VDI environment, you have a lot more to lose than to gain from disabling ASLR. On 64-bit platforms the loss of opportunities to share pages is much less due to the large memory page nature.

So how did this get into the standard optimization script? Given that VMware’s public position runs contrary to it, I assume it’s there by mistake. I actually notified VMware back in October that the script was breaking Internet Explorer, but apparently the cause was never isolated, or possibly never investigated.

(The revised scripts also previously contained a number of incorrect curly ‘ and “ quote characters, which caused most of the commands in them to fail. That has since been corrected.)

Sadly, the reason Eric and André even brought this up in 2011 was Microsoft. In a couple of Microsoft blog entries (1/2), Microsoft started spreading some FUD by claiming that VMware was suggesting customers disable ASLR.

In reality, the topic was addressed to say that yes, you can increase consolidation ratios by turning off ASLR, but at the expense of security. There was a bit of back and forth from some of the VMware folks suggesting that Microsoft’s implementation of ASLR isn’t even all that effective at mitigating malware infections. I won’t get into that.

Regardless, it’s a security feature of the operating system, and in the case of the applications referenced above, disabling it totally breaks functionality. Hopefully, VMware will correct this soon. In the meantime, I’ll be commenting out this line on all future engagements.
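If you’d rather not comment that line out by hand every time, here’s a small Python sketch that does it. It prefixes any `reg add` line that touches MoveImages with REM, the batch file comment keyword, so the rest of the script runs unchanged.

The matching pattern is my own assumption about how the line is written, so check the output before running the script against a template. Reading and writing the actual .bat file around this function is left to you.

```python
# Sketch: comment out (REM) any "reg add ... MoveImages" line in a batch script.
# The regex is an assumption about how the line is written; review the result.
import re

MOVE_IMAGES = re.compile(r"^\s*reg(\.exe)?\s+add\b.*\bMoveImages\b", re.IGNORECASE)

def comment_out_move_images(script_text: str) -> tuple[str, int]:
    """Return the script with MoveImages lines prefixed by REM, plus how many changed."""
    out, changed = [], 0
    for line in script_text.splitlines(keepends=True):
        if MOVE_IMAGES.search(line):
            out.append("REM " + line)  # already-REM'd lines don't match, so this is idempotent
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed

if __name__ == "__main__":
    sample = (
        'reg ADD "HKLM\\System\\CurrentControlSet\\Control\\Session Manager\\Memory Management"'
        " /v MoveImages /t REG_DWORD /d 0x0 /f\n"
        "echo done\n"
    )
    fixed, n = comment_out_move_images(sample)
    print(fixed, end="")
    print(f"{n} line(s) commented out")
```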

Great Success

There is a bug in the Windows Server 2012 R2 volume license activation wizard. If you don’t change the Key Management Service port setting when applying the configuration (from “0” to whatever you want it to be, such as the default of 1688), you get this absolutely unhelpful success/error message:

The following error has occurred. Please resolve the error and try again. Description: STATUS_SUCCESS

NIC Life

Life is rough for an ESXi network card these days, both pNIC and vNIC. It’s especially bad if you’re using E1000/E1000e adapters in your VMs, Broadcom network cards in your hosts, or a combination of both.

Broadcom cards are the built-in pNIC adapters on nearly every piece of server hardware, and the E1000 is the default vNIC adapter for Windows Server in VMware. These are two incredibly common things to have, so what environment isn’t using a combination of both?

On the physical NIC side, VMware has identified an issue with the tg3 drivers in use since ESX 3.5 that can cause data corruption.

The options for resolution there are to upgrade the Broadcom driver on your hosts or to disable TCP Segmentation Offload on your cards.

On the virtual NIC side, VMware has identified an issue with the E1000 adapter that can cause a purple screen of death on hosts running ESXi 5.0, 5.1, or 5.5 that have virtual machines using this adapter.

Options for resolution are to switch virtual machines to another adapter type, such as VMXNET3, or to disable Receive Side Scaling inside the guest operating system.

For ESXi 5.1 hosts, Update 2 has been identified as containing a fix for this issue, but installing it may introduce a second issue of its own.

Again, the workaround is to use something like the VMXNET3 adapter in your virtual machines. You can also install patch ESXi510-201402001 after installing Update 2 to fix the memory leak behind that second issue.

Unless an incompatibility prevents it, I would suggest making VMXNET3 your default vNIC adapter as a best practice. If you can isolate E1000 virtual machines to a host or subset of hosts within your cluster, so that a crash doesn’t affect other systems, I would do that too.
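If you’re trying to figure out which VMs to convert or isolate first, the check boils down to vNIC type plus host version. Here’s a minimal Python sketch of that filter. The record layout and field names are hypothetical; in practice you’d populate them from whatever export of your vCenter inventory you already use:

```python
# Sketch: flag VMs using an E1000-family vNIC on the ESXi releases named above.
# VmNic and its fields are hypothetical; fill them from your own inventory export.
from dataclasses import dataclass

AFFECTED_ESXI = ("5.0", "5.1", "5.5")   # releases listed in the post
AFFECTED_NICS = {"e1000", "e1000e"}

@dataclass
class VmNic:
    vm: str
    host: str
    esxi_version: str   # e.g. "5.1.0"
    adapter: str        # e.g. "e1000", "vmxnet3"

def at_risk(nics: list[VmNic]) -> list[str]:
    """Return sorted names of VMs with an E1000/E1000e vNIC on an affected ESXi release."""
    return sorted({
        n.vm for n in nics
        if n.adapter.lower() in AFFECTED_NICS
        and n.esxi_version.startswith(AFFECTED_ESXI)
    })

if __name__ == "__main__":
    inventory = [
        VmNic("web01", "esx01", "5.1.0", "e1000"),
        VmNic("sql01", "esx01", "5.1.0", "vmxnet3"),
        VmNic("app01", "esx02", "5.5.0", "E1000e"),
        VmNic("old01", "esx03", "4.1.0", "e1000"),
    ]
    print(at_risk(inventory))  # ['app01', 'web01']
```

The same list tells you which hosts currently carry E1000 VMs, if you want to start corralling them onto a subset of the cluster.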

FQDN’ed Up

I ran across an interesting tidbit in an EMC Support article that I wasn’t previously aware of. Using the fully qualified domain name of the EMC Isilon SMB server for file sharing is necessary for proper load balancing and access:

Always use the fully qualified domain name (FQDN) of a SmartConnect zone when accessing the cluster. If you attempt to use the short name, Windows hosts will attempt to use the NetBios name service (NBNS) to resolve the connection. Because NBNS uses broadcast pings on the network to determine what IP a host is located at, the Windows client will connect to the first node to respond, which might result in client connections not being evenly distributed across the cluster. Additionally, by using the NBNS services, you do not utilize Kerberos for authentication and authorization, and are required to use NTLM (NT LAN Manager) based services, which can lead to permission denied errors.

For the non-Isilon initiated, a SmartConnect zone is how Isilon load balances across the various nodes in the cluster. It’s configured as a delegated zone in your DNS that replies with a different IP address, each corresponding to a physical NIC on an Isilon node. Depending on licensing, it can be configured to reply based on basic round robin, or on connection count, CPU, or network utilization metrics. It’s important that it functions correctly so that no individual network port, and therefore no individual node, gets overloaded as the entry point into the cluster when accessing data.
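To illustrate the difference between basic round robin and connection-count answers, here’s a toy Python model of a zone handing out node IPs. It’s an illustration of the idea only, and the IPs are documentation addresses. The real SmartConnect service runs on the cluster and answers delegated DNS queries, which is exactly why a client resolving a short name through NBNS never benefits from it:

```python
# Toy model of two SmartConnect-style answer policies (not Isilon's implementation).
from itertools import cycle

class ZoneResolver:
    def __init__(self, node_ips):
        self.node_ips = list(node_ips)
        self._rr = cycle(self.node_ips)
        self.connections = {ip: 0 for ip in self.node_ips}

    def answer_round_robin(self):
        """Each DNS query gets the next node IP in turn."""
        return next(self._rr)

    def answer_connection_count(self):
        """Each DNS query gets the node IP with the fewest active client connections."""
        return min(self.node_ips, key=lambda ip: self.connections[ip])

    def connect(self, ip):
        """Record that a client connected to this node IP."""
        self.connections[ip] += 1

if __name__ == "__main__":
    zone = ZoneResolver(["192.0.2.11", "192.0.2.12", "192.0.2.13"])
    for _ in range(6):
        zone.connect(zone.answer_connection_count())
    print(zone.connections)  # {'192.0.2.11': 2, '192.0.2.12': 2, '192.0.2.13': 2}
```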

The EMC Support article where it’s referenced (emc14003900) is centered around integrating SMB on Isilon with DFS, but I would think the principles are the same for normal user/server UNC addressing.

Even if it’s not, I’d still consider it best practice to use the FQDN.
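If you’re cleaning up mapped drives or login scripts, rewriting short-name UNC paths to the FQDN is straightforward. Here’s a minimal Python sketch. The names isilon and corp.example.com are placeholders, and treating “no dot in the host” as a short name is a simplifying assumption:

```python
# Sketch: rewrite a short-name UNC path to use the SmartConnect zone FQDN.
# "isilon" and "corp.example.com" are placeholders, not values from the EMC article.
def to_fqdn_unc(unc: str, domain: str) -> str:
    """Return the UNC path with a short host name expanded to host.domain."""
    if not unc.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc!r}")
    host, sep, rest = unc[2:].partition("\\")
    if "." not in host:  # assumed short name, which per the EMC note may go to NBNS
        host = f"{host}.{domain}"
    return "\\\\" + host + sep + rest

if __name__ == "__main__":
    print(to_fqdn_unc(r"\\isilon\home\jdoe", "corp.example.com"))
    # \\isilon.corp.example.com\home\jdoe
```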

Advanced Administrator

On Friday, February 7, I sat for the VMware Certified Advanced Professional, Data Center Administration (VCAP5-DCA) exam. Thinking about how I performed has consumed most of my idle hours, so after some reflection over the last week I’ve decided to document a bit of my perspective. I’ll say as much as I can without breaking NDA; I don’t believe anything here goes beyond what’s covered in the official exam blueprint or the numerous articles and training for the exam.

I actually thought the test was a lot of fun. For the uninitiated, the test is unlike any other exam in the VMware portfolio, and unlike any other exam I’ve taken for any other certification. It is 100% lab based. You have remote access to a VMware vSphere 5.0 environment with a vCenter, two hosts, a collection of virtual machines, and pre-provisioned EMC storage.

In other VMware exams, you’re given 60–70 multiple choice questions to regurgitate answers to. In the VCAP, you are given 26 different “projects” you have to work your way through. I say projects because each of the 26 varies in length and has multiple component problems to solve. Some may be straightforward, some far less so. For example, one question might be something to the effect of:

Create a Distributed Switch called LabDSwitch and a port group called LabPortGroup that has two uplinks, then assign hosts 1 and 2 to this Distributed Switch.

There would generally be more to it than that, but basically you’re given a roadmap of what to do. What the examiners are looking for is that you know where to go and what process to follow to complete the task without losing all network connectivity to your environment. More on that later.

Something less intuitive might be a problem stating that a specific virtual machine is not performing as expected and directing you to investigate why. You’re given a target but very little direction as to what to look for or change. You’re expected to draw on your own knowledge of VMware best practices and real-world experience to correct the issue.

In many cases, the questions are a mix of both: a series of complex and interconnected word problems. You’re told to do something direct, but with an occasional hint dropped that you may need to read more into what they’re saying to be successful and earn all the points for that project.

My advice for future candidates would be to do as much of each section as you need to move on to the next piece, note what you may have missed, and then come back when you have more time (or when you must complete it to finish other sections of the exam).

There were a couple of sections where I did struggle, especially things like Auto Deploy, which I’ve never used in a production setting, so I had very little to draw from. Everything on the blueprint is fair game, though, and I think nearly all of vSphere got touched in some way during my exam.

The exam itself is 3.5 hours. Normally I test at the Pearson VUE testing center at Johnson County Community College because it’s close to my house. I’ve done enough certification exams in my three years as a consultant that I’ve come to know the ladies who proctor the exams at this site pretty well. (After my CCNA-DC exam last month, I actually stood around for over an hour chatting with one of them about her son’s upcoming driving test and then a bit about lawn care.)

However, the VCAP is what Pearson considers a “Professional” exam, so it must be done at one of their more low-key, higher-security sites. Scheduling the exam gives you far fewer options than normal tests do. Available days and timeslots are few and far between compared to the relative free-for-all of 15-minute increments on the normal exams. At the testing center, the people are friendly but it’s all business. While sitting in the waiting room before I was even checked in, I was chastised for checking my iPhone for just a few seconds. Once you enter their facility, just pretend you’re waiting to be interviewed by members of the FBI and be on your best behavior.

After running through the check-in process, it was time to test. As soon as you start, you’re given a quick survey from VMware about your perceived knowledge of their technologies. I’m not sure whether it has any bearing on the difficulty of the questions you receive, but I doubt it. The survey is off the clock, but as soon as you submit it, the 3.5-hour timer starts. There is some information about the test that you could waste time reading, and I started to until I realized it was all pretty much knowledge gained through training. I looked at the clock and I’d already lost three minutes. Time to get cracking.
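Those three minutes matter more than they sound. Here’s a quick back-of-the-envelope calculation in Python. It assumes the whole 3.5 hours is spread evenly across the 26 projects, which isn’t how it works in practice since they vary in length. Even so, it shows roughly how much time each project gets:

```python
# Back-of-the-envelope time budget: 3.5 hours spread evenly over 26 projects
# (an oversimplification, since projects vary in length).
TOTAL_MIN = 3.5 * 60   # 210 minutes
PROJECTS = 26

def minutes_per_project(elapsed_min: float = 0.0, done: int = 0) -> float:
    """Average minutes available for each remaining project."""
    return (TOTAL_MIN - elapsed_min) / (PROJECTS - done)

if __name__ == "__main__":
    print(round(minutes_per_project(), 1))               # 8.1 minutes per project
    print(round(minutes_per_project(elapsed_min=3), 1))  # 8.0 after losing 3 minutes up front
```

Roughly eight minutes per project is why skipping ahead and coming back, rather than getting stuck, pays off.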

You will alternate between windows that show your task lists for each section and an RDP session that gives you access to your lab environment. You have a vSphere Client, access to the Microsoft RDP application, PuTTY and Adobe Reader. Opening Adobe Reader gets you access to any of the VMware documentation PDFs you’d want to reference during the exam. You also have access to command line utilities like PowerCLI and the vMA running within various virtual machines.

I would limit your time looking through the PDF files, unless you’re looking for a specific command or advanced option. They are there as a reference, and you really have to know what you’re looking for to get anything from them. There is simply no time to waste browsing.

Now, I’m an animated person. If I’m engaged in a project or a complex troubleshooting session, I’m usually moving around a lot. I might be hitting the whiteboards, walking around the room or down the hall, thinking, grabbing a drink, even talking to myself to walk through the steps I’d take to implement a solution. Doing any of that here will get you disqualified and kicked out. This was perhaps the hardest thing for me to do for nearly 4 hours: sit still, be quiet.

In hindsight, because of that, I’d wear more comfortable clothes if I had to do this again. Not that my work clothes aren’t generally comfortable, but they’re not the most comfortable things I own.

Depending on the network connection from the testing center back to the environment of the lab, you may experience some latency. It was not a factor in my ability to complete the exam, but it was frustrating at times waiting for the screen to redraw if I asked too much of it at once. However, I’ve heard stories from others who have taken this exam outside of the United States where the experience was unbearable. The less time you spend trying to flip back and forth between the questions and the remote session the better off you’ll be.

Also remember that everything you do in the lab can potentially impact your ability to complete later problems. If you reboot your vCenter VM, detach its network card, or do something that causes your hosts to become unresponsive, you either have to fix it or possibly end the exam right there.

I did have an issue where the function keys on the keyboard wouldn’t pass through the RDP session into the VMware console, making it impossible to use, say, F6. If it turns out I failed by one point, I’m going to argue this one, but for now I’m not worried.

In terms of training for the exam, I relied heavily on Jason Nash’s video training at Pluralsight (previously Trainsignal.) Being a vExpert has some perks, and one of them is a free year of access to their video library. They have a lot of great virtualization and data center related topics and it’s well worth the cost, even if you subscribe for just a month, if you can’t get access for free. I also reviewed the “Unofficial VCAP-DCA Guide” by Jason Langer and Josh Coen. It’s available for free through a sponsorship by Veeam.

Overall, if you’re a VMware consultant who gets to work with vSphere on a regular basis for implementation and troubleshooting, there shouldn’t be too much that is so difficult you want to cry. However, I could see your regular everyday system administrator struggling unless they work in an environment with Enterprise Plus licensing and take advantage of all the features they can. Even then it would be tough. That said, anyone considering the VCAP level is probably one of those two things already. SMB administrators probably have a hard enough time getting the required VCP training paid for, and are probably well served by the knowledge it provides, if they get it at all.

Unlike almost every other certification exam, when you hit submit on the final problem of the VCAP you don’t get the familiar “Congratulations” or “Sorry.” Instead, you’re told to wait up to 15 business days for your results while VMware tabulates them manually. My thinking is that it will probably be at the far end of that timeline, or possibly longer, considering VMware Partner Exchange is going on and a lot of people are testing this week. Then again, that could mean more resources are dedicated to grading, and I’d be at the front of the line.

Either way, it’s now just a matter of waiting to see how I did. Out of 500 points, a passing score requires at least 300. I went in expecting to run through the test once for practice and then take it again to pass. I won’t be disappointed if I don’t pass this time, but I feel confident enough that I won’t be surprised if I do. If my results aren’t positive, I’ll be back on the Pearson website scheduling my next exam date the day I get them.

Update: I passed!

Upgrade Lottery

Over the weekend I facilitated a customer upgrade that involved:

  • In-place upgrade of Windows Server 2008 to Windows Server 2008 R2 on a vCenter Server.
  • Direct upgrade from View Composer 2.6 to View Composer 5.3.
  • Direct upgrade from VMware View 4.6 to Horizon View 5.3 on two connection brokers.
  • Direct upgrade from vCenter 4.1 to vCenter 5.5.
  • Direct upgrade from ESXi 4.1 to ESXi 5.5 on multiple systems.

All of these, on a Saturday, with no issues. No calls to VMware support. No reviewing error logs. Very little hand-wringing. For the most part, everything went according to plan.

I feel like I should buy a Powerball ticket this week, or maybe make a trip to the casino.

Objective Complete

The Cisco Data Center track has been around since November 2012, and when they announced it I knew that I’d have to get it at some point. I’m pleased to say that it’s now done, and I can start making my way to other things… like a CCNP Data Center.

And my VCAP-DCA.

My goal (and my employer’s) for 2013 was to finally get my Cisco Certified Network Associate (CCNA) done. I completed the first exam (ICND1) and received my Cisco Certified Entry Networking Technician (CCENT) back in July. The CCNA was something I’d wanted to do since I got into IT. However, I got sidetracked by other things and never completed the second exam.

I did, however, complete the EMC Implementation Engineer certification for Isilon, and I passed the VMware Certified Associate in Cloud exam. So 2013 wasn’t a total loss.

Sometime in late December, after evaluating my status on the R/S CCNA exam, I decided to bypass it and go straight to the Data Center version. Over my two-week winter vacation I crammed for both exams, and I tested for both of them this week.

A few thoughts about each exam:

  • 640-911: This exam was very similar to the CCENT exam, covering the basics of networking but with less emphasis on subnetting (maybe one or two easy questions vs. a half dozen brain crunchers on the R/S version). You are expected to convert hex to binary to decimal and back, but that’s about it. There is a definite Nexus flavor to it, but nothing too heavy.
  • 640-916: I stressed over this exam, but in the end I found it easier than the first. It’s basically a knowledge test of the Nexus, MDS, and UCS product lines. It’s not deeply technical, but you do have to know the products. The simulator portion was almost too easy compared to what I’d have expected from a Cisco exam.
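For anyone brushing up on the 640-911 conversions, here’s a short Python sketch for checking your work. It uses the built-ins int() and format(), plus the four-bits-per-hex-digit shortcut that makes doing it by hand manageable. The function names are my own:

```python
# Checking hex/binary/decimal conversions with Python built-ins.
def convert(value: str, base: int) -> dict:
    """Parse value in the given base and show it in decimal, hex, and binary."""
    n = int(value, base)
    return {"dec": str(n), "hex": format(n, "X"), "bin": format(n, "b")}

def hex_to_bin_by_nibble(hex_digits: str) -> str:
    """The by-hand trick: each hex digit maps to exactly four bits."""
    return "".join(format(int(d, 16), "04b") for d in hex_digits)

if __name__ == "__main__":
    print(convert("AC", 16))            # {'dec': '172', 'hex': 'AC', 'bin': '10101100'}
    print(convert("11000000", 2))       # {'dec': '192', 'hex': 'C0', 'bin': '11000000'}
    print(hex_to_bin_by_nibble("AC"))   # 10101100
```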

Either way, it’s done!

So, for 2014, the goal is VCAP-DCA. No excuses. I’m also thinking a lot about exploring the Cisco Data Center track and going for my CCNP. I need to get more hands-on expertise and a few UCS B-series deployments under my belt first. Between these two I will probably be very busy, and I’m sure work will require something else on top of them. It seems like there is always another EMC product I’m having to catch up on.

Never stop learning.