VMworld 2018 EU – How many vExperts can you fit in a photobooth?

It’s been a while since I last posted, for a number of reasons. The main one is that I haven’t been doing much with VMware in my job over the last few months, and I didn’t want to post anything that I didn’t feel was relevant. I’ve also been spending time with family, which is the most important thing.

I’m back at VMworld 2018 EU this year after entering a competition on Twitter and winning. It was completely unexpected and a bit last minute. I love the power of the community, although it was a bit of a mad rush to get everything locked in.

Last night I had some great fun at the Rubrik party, met many old friends and some new. A highlight of the evening was “how many of the London VMUG could we fit inside the photobooth”. It turns out not too many, but we managed around 5 or 6.

Disclaimer: The quality of the pictures is down to poor light and a bad front-facing camera. They by no means represent or imply the inebriation levels of participants.

Feeling fresh and ready for the keynote this morning, we headed down and I was lucky enough to sit with some good friends and in fact housemates for the week @GarethEdwards86 @AndyNash99 @Dark_KnightUK in the blogger area with great views for the keynote.

I’m not going into all the detail of the announcements and VMware’s progress because others will certainly cover it (I did make extensive notes, I promise, but on a whim I have gone a bit rogue with this post). However, some top things:

– VMware 20th Birthday!
– VMware green initiative. Saving 540 million tonnes of CO2 emissions, which equals the combined output of Italy, Spain, Switzerland and the UK!
– VMware is now a carbon neutral company. Smashing the targets of 2020.
– “Possible begins with you”: using tech as a force for good.
– Daymn that NSX HTML 5 interface looks swanky!
– Heptio acquisition. Sharing a common mission of delivering “dial tone” Kubernetes.

The things that resonated most with me today were Pat talking about green initiatives and using tech for good. One case study was Mercy Ships, an organisation that is trying to tackle a global surgery crisis by operating the world’s largest floating hospital, which provides free care to countries in desperate need around Africa. Having stable remote infrastructure, in the form of VxRail, means they can provide services without having to worry about managing tech and can focus on providing healthcare.

Today has been a great day and in fact I’m confident it’s going to be a fantastic week. I’ve met lots of old friends and fellow @LonVMUG members. The spirit of community is strong and it continues to grow every VMworld that I attend.

A great example for me today was stepping up to assist @Ericnipro, who came to the community area to ask for help for a French chap who was struggling with a VMware/Veeam issue. With the invaluable aid of @mwpreston, we helped the guy as best we could with a workaround script that we ended up stealing from Reddit. Then I chatted and caught up with Mike, after which he gave me a jar of his homemade maple syrup!! Totally made my day, thanks mate!

It hit me this week that I’ve actually made some impact in the community, albeit small (although it doesn’t always feel that way), just by meeting people and spreading the word about the programme. A few people have credited me with them becoming vExperts, which makes me feel good about myself and the community.

Too many people to mention them all, you know who you are! Too many people I haven’t yet properly met, let’s do this!

vCSA Automated Backup Failure

Recently we went through the process of upgrading our Windows vCenter Server 6.0 with external SQL to vCSA 6.5. I must say how good the entire process was from start to finish; VMware have really done themselves proud with that tool. Our environment isn’t huge but it is big enough that we thought we might see problems – but no!

Part of the migration work was to get backups up and running as they were with our Windows vCenter (if not slightly different/better). My understanding is that the supported method for backup is to use the VAMI and run a full “file dump” backup of the vCSA, which you can restore into any blank deployed vCSA to get back in the game. We have a Rubrik for snapshotting but using the VMware method is of course supported and preferred.

The Issue

When we ran the VMware-provided Bash script, we encountered the following error in the backup.log file that it produces:

{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}
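If you wrap the backup in your own tooling, it can be handy to pick this failure out of the response rather than reading the raw JSON. Here is a minimal sketch, assuming the body is shaped like the error above. The function name `describe_vapi_error` is my own and is not part of the VMware script.

```python
import json


def describe_vapi_error(body: str) -> str:
    """Summarise a vAPI-style error body as 'type: message; message'."""
    payload = json.loads(body)
    error_type = payload.get("type", "unknown")
    messages = payload.get("value", {}).get("messages", [])
    text = "; ".join(m.get("default_message", "") for m in messages)
    return f"{error_type}: {text}"


# The error from backup.log, with plain double quotes.
SAMPLE = (
    '{"type":"com.vmware.vapi.std.errors.unauthenticated",'
    '"value":{"messages":[{"args":[],'
    '"default_message":"Unable to authenticate user",'
    '"id":"vapi.security.authentication.invalid"}]}}'
)

if __name__ == "__main__":
    print(describe_vapi_error(SAMPLE))
```

This only reads the fields visible in the error above; other error bodies may carry different keys, hence the defaults on every lookup.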

Further investigation turned up more errors in the VAPI endpoint log.

We could run a manual backup from the VAMI as the root user, just not with the bash script, which essentially uses the VAMI API to curl a request to run a backup. The error above seems related to “authentication_sso.py” and being unable to validate the signing chain signature. Without further help there was no way I was going in to modify or look at that script on my own on a now Production vCSA.

I also created a separate master user in the @vsphere.local domain to test running the backups but still had no luck.

I ran the script manually and the problem occurred at the start of the POST to the appliance’s REST API.

The Fix

After speaking with several smart people in the vExpert slack channel, I raised a case with VMware support. I eventually received a response which told me to edit the following file:

There is a value that needed changing from:

To the following:

Be careful with the amendment: the code is indented with spaces, and the new line must start with exactly 8 spaces.
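If you want to double check the indentation before restarting anything, something like the following would do it. This is just a sketch of the check itself; the helper name and the `key = value` text are placeholders of mine, not the real contents of the file.

```python
def has_exact_indent(line: str, spaces: int = 8) -> bool:
    """True if the line starts with exactly `spaces` spaces and no tabs."""
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    return indent == " " * spaces


if __name__ == "__main__":
    # 'key = value' is placeholder text, not the actual setting.
    print(has_exact_indent("        key = value"))  # eight spaces
    print(has_exact_indent("\tkey = value"))        # a tab
```

A tab can look identical to 8 spaces in some editors, which is why the check compares the whole leading whitespace rather than just counting characters.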

Then a simple stop and start of the applmgmt service to apply the fix:

Now the script runs perfectly every day to our backup repository. I believe this might become defunct in vSphere 6.7, as I think there is now a GUI way of scheduling backups!

vCSA 6.5 High Availability Configuration Error

Recently I have been experimenting with configuring the built-in vCSA 6.5 HA functionality. After reading the documentation found here, I set about the task of configuring a basic HA deployment.

The error I saw upon completing the wizard was:

“A general system error occurred: Failed to run pre-setup”.

Unfortunately, there wasn’t much to go on in the vCenter logs via the web GUI so it was time to SSH into the vCSA and go digging around for some logs with a little more information. After a brief meander, I found the following log

The interesting contents of the log were spat out as follows:

Looking at the log, it seemed that the user trying to create the vcha user (root!) had insufficient privileges. I then remembered the recent issues that VMware have had with Photon and root passwords expiring after 365 days. I logged into the VAMI for the vCSA and tried to reset the password but I was given an error.

The fix, in this case, was simply to reset the root user’s password via the bash shell.

At this point I was able to log in with the new password, then log in to the VAMI and set the root password to never expire. You can also do this from the command line using the “chage” command on the root user.
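To see how close an appliance is to hitting this, the arithmetic is simple enough to sketch. This assumes you already know the date of the last password change and the maximum age in days (365 in the case above); the function names are mine and the comparison is a simplification of how the OS actually evaluates expiry.

```python
from datetime import date, timedelta
from typing import Optional


def password_expiry(last_change: date, max_age_days: Optional[int]) -> Optional[date]:
    """Return the expiry date, or None when the password never expires."""
    if max_age_days is None:
        return None
    return last_change + timedelta(days=max_age_days)


def is_expired(last_change: date, max_age_days: Optional[int], today: date) -> bool:
    """True once `today` is past the computed expiry date."""
    expiry = password_expiry(last_change, max_age_days)
    return expiry is not None and today > expiry


if __name__ == "__main__":
    # Illustrative dates only.
    print(password_expiry(date(2017, 1, 1), 365))
    print(is_expired(date(2017, 1, 1), 365, date.today()))
```

Passing `None` as the maximum age models the "never expire" setting described above.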

After restarting the deployment the pre-checks ran successfully and the configuration continued!

Hopefully this might help someone who is trying to do something similar!

Migrating ESXi Management VMkernel

I have been doing a fair amount of work with NSX recently. Before we could start, we had some environment changes to go through. One of these was to the network that contains the VMkernel for host management traffic. The overall aim was to migrate the interfaces to a new management VLAN (new subnet, gateway, etc).

Here is how I managed to do it without disruption to any existing management or services running.

1) The first step was to create a port group on my vDS for the new Management VLAN that had been trunked to the hosts.

I would advise configuring the port group further for your environment based on VMware Network Best Practices for things like Traffic Shaping, Teaming/Failover, etc.

2) Now that the port group exists, add a new VMkernel adapter for management traffic on all of your hosts. For me, I ended up with 3 vmks: old management (vmk0), vMotion (vmk1) and new management (vmk2).

3) From here, I put the hosts that I was going to reconfigure into maintenance mode, just to be on the safe side.

4) At this point, it isn’t possible to remove the existing vmk0 because it is in use. The reason for this is that the host’s TCP/IP stack configuration still has the old VMkernel gateway configured. This should be changed to the new management network gateway address on each host:

5) From here, I disconnected the hosts from vCenter.

6) I then changed the host records of my ESXi servers to the new management IP addresses and allowed some time for propagation (in fact I checked from the vCSA that it had picked up the newest record from my DNS servers).

7) Reconnect the host(s) to vCenter.

8) It is now possible to remove the old management VMkernel adapter (vmk0 in my case).

9) I did follow through the process of rebooting my hosts before exiting maintenance mode, but I do not actually think it matters too much.
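Two of the steps above are easy to get wrong by hand: the new gateway has to sit inside the new management subnet (step 4), and DNS has to return the new address before you reconnect (step 6). A rough sketch of both checks follows. The function names, hostnames and addresses are made up for illustration, and the resolver is passed in so the check can be tried without touching real DNS.

```python
import ipaddress
import socket
from typing import Callable


def gateway_in_subnet(vmk_ip: str, prefix_len: int, gateway: str) -> bool:
    """True if the gateway address falls inside the VMkernel's subnet."""
    network = ipaddress.ip_interface(f"{vmk_ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway) in network


def dns_points_to(hostname: str, expected_ip: str,
                  resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the hostname currently resolves to the expected address."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # Covers lookup failures such as socket.gaierror.
        return False


if __name__ == "__main__":
    # Example values only; substitute your own management network.
    print(gateway_in_subnet("10.20.30.15", 24, "10.20.30.1"))
    print(dns_points_to("esx01.example.test", "10.20.30.15",
                        resolver=lambda name: "10.20.30.15"))
```

Run the DNS check from the same place vCenter resolves names from if you can, since that is the record that matters when the host is reconnected.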

There we go! A fairly straightforward process, and one that I can’t imagine many people doing. I did have a look to see if anyone else had performed a similar process, but they hadn’t moved subnet and gateway. Hopefully this might help someone out there who wants to do this!

Rubrik – PowerShell/API SLA Backup & Restores

Having been lucky enough to procure a Rubrik Cloud Data Management appliance at my work recently, we have had the pleasure of experiencing a fantastic technical solution which has helped us improve our backup/recovery and business continuity planning. The solution, for us, is still in its infancy but we hope to scale and grow as the business realises the full potential of the service. Until then, we have had fun preparing it for our own production use as it is such a joy to work with!

One thing we questioned was how we get a list of our SLA Domains (as we’ve made a fair few) and their contents. This could be useful in the scenario of someone accidentally deleting policies or machines out of policies. Another potential use case could be if we needed to ‘rebuild’ our Brik SLA configuration in the event of a major failure – highly unlikely but better to be prepared and have committed some brain cycles to it, right?

With that in mind, my esteemed colleague @LordMiddlewick has written some PowerShell scripts with the help of @joshuastenhouse’s previous blog posts about using the Rubrik REST APIs.

Backup Script

This script can be scheduled to run at your own convenience. Ensure that you fill in the variables in the top section for your own environment. It is possible to encrypt the password within the file itself, using a methodology described here. For simplicity, in the case below we have only encrypted it for transmission to the Rubrik service.

The key takeaways from this script, in whatever fashion you run it, are:

* You receive a bunch of .txt files, one for each SLA you have defined, in JSON format. Useful for restoring SLAs. Here is an example:

* Another takeaway is the file “VM-SLA.csv”, which contains a list of all your VMs that are backed up and the policy each one belongs to. This is really useful for restoring VMs into SLAs or bulk importing VMs into SLAs.
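To make the shape of that output concrete, here is a small sketch of the "one JSON .txt file per SLA" idea. It is not the PowerShell script itself, it does not talk to Rubrik, and the `name` field is an assumption of mine about what an SLA object contains.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List


def dump_sla_domains(slas: Iterable[dict], out_dir: Path) -> List[Path]:
    """Write each SLA dict to '<name>.txt' as JSON and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sla in slas:
        # Assumes every SLA has a 'name' that is safe to use as a file name.
        target = out_dir / f"{sla['name']}.txt"
        target.write_text(json.dumps(sla, indent=2), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Placeholder data, not a real Rubrik SLA definition.
    example = [{"name": "Gold", "note": "placeholder fields only"}]
    with tempfile.TemporaryDirectory() as tmp:
        for path in dump_sla_domains(example, Path(tmp)):
            print(path.name)
```

Keeping one file per SLA means a single policy can be restored on its own later without editing a combined export.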

Restore SLA Domain Policies

To reverse the backup process and restore one or all of your SLAs into the Rubrik, use the following script:

This script will take any .txt files (SLA backup files) in the designated $path and try to recreate each one on your Rubrik.
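The first half of a restore, picking up every .txt file in a folder and parsing it, could look roughly like this. Again this is a sketch under assumptions rather than the PowerShell script: the function name is mine, and it stops short of sending anything to the Rubrik API.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def load_sla_backups(path: Path) -> Tuple[Dict[str, dict], List[str]]:
    """Parse every .txt file under `path`; return (loaded, skipped names)."""
    loaded: Dict[str, dict] = {}
    skipped: List[str] = []
    for file in sorted(path.glob("*.txt")):
        try:
            loaded[file.stem] = json.loads(file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Keep going, but remember which files were not valid JSON.
            skipped.append(file.name)
    return loaded, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder content, not a real SLA export.
        (Path(tmp) / "Gold.txt").write_text('{"name": "Gold"}', encoding="utf-8")
        print(load_sla_backups(Path(tmp)))
```

Reporting the skipped files separately makes it obvious when a stray or damaged .txt file is sitting in the folder, rather than failing halfway through a restore.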

Restore/Import VMs to SLAs

The final part of this exercise is to be able to restore a list of VMs that have been pulled out, against the SLA domain policies that you have. The following script does this by using the above “VM-SLA.csv” to import a list of objects and assign them as per the CSV.

The format for the VM-SLA.csv file is as follows:

In theory, if you have lots of machines you want to bulk assign to any given policies, you can create your own CSV and run it to import your VM estate to your predefined policies using this script. We used this several times when assigning 100+ objects to a given policy and it worked a treat!
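If you do build your own CSV for a bulk assignment, it is worth grouping it by policy first to sanity check what will land where. A minimal sketch follows. The column names `vm_name` and `sla_name` are hypothetical and are not the actual VM-SLA.csv layout, so adjust them to match your file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def group_vms_by_sla(handle: TextIO, vm_col: str = "vm_name",
                     sla_col: str = "sla_name") -> Dict[str, List[str]]:
    """Group VM names by SLA name from a CSV with a header row."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[sla_col]].append(row[vm_col])
    return dict(groups)


if __name__ == "__main__":
    # Made-up rows using the hypothetical column names.
    sample = io.StringIO("vm_name,sla_name\nweb01,Gold\ndb01,Gold\ntest01,Bronze\n")
    for sla, vms in group_vms_by_sla(sample).items():
        print(sla, len(vms), vms)
```

A quick count per policy like this is an easy way to spot a typo in an SLA name before 100+ objects get assigned to the wrong place.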

Disclaimer: Please try to fully read and understand the above scripts before implementing them. You should test them fully in a development environment before using them in production. I/we do not take any responsibility for rogue administrators’ stupidity.

As Rubrik continue to steam ahead with excellent releases, they might even build in some of this functionality, making these scripts redundant. In the meantime, hopefully someone finds these scripts useful; I know we have. Once again, a big shout out to @LordMiddlewick for writing this and giving me permission to post it, and also to @joshuastenhouse for his blog https://virtuallysober.com.