vCSA Automated Backup Failure
We recently upgraded our Windows vCenter Server 6.0 with external SQL to vCSA 6.5. I must say how good the entire process was from start to finish; VMware have really done themselves proud with that tool. Our environment isn’t huge, but it is big enough that we thought we might see problems. We didn’t!
Part of the migration work was to get backups up and running as they were with our Windows vCenter (if not slightly different/better). My understanding is that the supported backup method is to use the VAMI interface to run a full “file dump” backup of the vCSA, which you can then restore into any freshly deployed, blank vCSA and be back in the game. We have a Rubrik for snapshotting, but the VMware method is of course supported and preferred.
The Issue
Upon using the VMware provided Bash Script we encountered the following error in the backup.log file that is produced:
{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}
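When sifting through a backup.log full of these, it can help to reduce each JSON error body to one readable line. The following is only a minimal sketch: the function name `summarize_vapi_error` is made up for illustration, and it assumes the error body has the same shape as the one shown above (a `type` plus a list of `messages`).

```python
import json


def summarize_vapi_error(raw):
    """Condense a vAPI-style JSON error body into 'type -> id: message'."""
    err = json.loads(raw)
    # Assumed layout, taken from the error above: value.messages[] entries
    # each carry an "id" and a "default_message".
    messages = err.get("value", {}).get("messages", [])
    parts = [f'{m.get("id")}: {m.get("default_message")}' for m in messages]
    return f'{err.get("type")} -> ' + "; ".join(parts)


sample = (
    '{"type":"com.vmware.vapi.std.errors.unauthenticated",'
    '"value":{"messages":[{"args":[],'
    '"default_message":"Unable to authenticate user",'
    '"id":"vapi.security.authentication.invalid"}]}}'
)
print(summarize_vapi_error(sample))
```

For the error above this boils down to the error type, the message id and "Unable to authenticate user", which is what pointed us at authentication rather than at the backup target.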
Further investigation turned up more errors in the vAPI endpoint log:
tail -f /var/log/vmware/vapi/endpoint/endpoint.log
2018-03-14T15:26:49.073 [2456]INFO:twisted:"127.0.0.1" - - [14/Mar/2018:15:26:49 +0000] "POST /api HTTP/1.1" 200 2783 "-" "vAPI http client"
2018-03-14T15:30:14.073 [2456]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 183, in authenticate
    token.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 530, in validate
    reference = self.validate_signature(signing_chain)
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 763, in validate_signature
    'Invalid SAML token: element has '
AuthenticationError: Invalid SAML token: element has invalid digest.
2018-03-14T15:30:14.073 [2456]INFO:twisted:"127.0.0.1" - - [14/Mar/2018:15:30:14 +0000] "POST /api HTTP/1.1" 200 339 "-" "vAPI http client"
2018-03-14T15:30:24.073 [2456]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 183, in authenticate
    token.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 530, in validate
    reference = self.validate_signature(signing_chain)
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 763, in validate_signature
    'Invalid SAML token: element has '
AuthenticationError: Invalid SAML token: element has invalid digest.
2018-03-14T15:30:24.073 [2456]INFO:twisted:"127.0.0.1" - - [14/Mar/2018:15:30:24 +0000] "POST /api HTTP/1.1" 200 339 "-" "vAPI http client"
We could run a manual backup from the VAMI interface as the root user, just not with the bash script, which essentially uses curl to send a backup request to the VAMI API. The error above points at “authentication_sso.py” failing to validate the signature of the signing chain. Without further help, there was no way I was going to look at or modify that script on my own on a now-production vCSA.
I also created a separate master user in the @vsphere.local domain to test running the backups, but still had no luck.
I ran the script manually, and the problem occurred at the start of the POST to the appliance’s REST API.
The Fix
After speaking with several smart people in the vExpert slack channel, I raised a case with VMware support. I eventually received a response which told me to edit the following file:
/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py
There is a value that needed changing from:
digest_value = self.xpath(
    '//ds:DigestValue', reference, expect=1)[0].text
To the following:
digest_value = str(self.xpath('//ds:DigestValue', reference, expect=1)[0].text).replace('\r', '').replace('\n', '')
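As far as I can tell, the only thing the amended line does is strip carriage returns and newlines from the DigestValue text before it is used. The sketch below is not VMware's code; it is a self-contained illustration of why embedded line breaks could make a digest comparison fail. The helper name `digests_match`, the choice of SHA-256 and the plain string comparison are all my assumptions.

```python
import base64
import hashlib


def digests_match(digest_text, signed_bytes, strip_line_breaks):
    """Compare a base64 DigestValue string with the SHA-256 of signed_bytes.

    Hypothetical helper: the real validation in authentication_sso.py may
    use a different hash algorithm and a different comparison.
    """
    if strip_line_breaks:
        # Same clean-up as the amended line: drop CR and LF characters.
        digest_text = digest_text.replace("\r", "").replace("\n", "")
    expected = base64.b64encode(hashlib.sha256(signed_bytes).digest()).decode("ascii")
    return digest_text == expected


payload = b"<Assertion>example</Assertion>"
clean = base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")
# Simulate a token whose DigestValue was line-wrapped somewhere in transit.
wrapped = clean[:20] + "\r\n" + clean[20:]

print(digests_match(wrapped, payload, strip_line_breaks=False))
print(digests_match(wrapped, payload, strip_line_breaks=True))
```

Under these assumptions the wrapped value only matches once the line breaks are removed, which would be consistent with the “invalid digest” error in the log.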
Be careful with the amendment: the file is Python, so indentation matters. The code is indented with spaces, and the new line must start exactly 8 spaces in.
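If you want to double-check the indentation before restarting anything, a small check like the one below can be run against the edited line. It is only a sketch: `has_exact_indent` is a name I made up, and the width of 8 comes from the advice I was given for this particular line.

```python
def has_exact_indent(line, width=8):
    """Return True if line starts with exactly `width` spaces and no tabs."""
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    return indent == " " * width


good = " " * 8 + "digest_value = str(self.xpath('//ds:DigestValue', reference, expect=1)[0].text)"
bad_tab = "\t" + "digest_value = ..."
bad_short = " " * 4 + "digest_value = ..."

print(has_exact_indent(good), has_exact_indent(bad_tab), has_exact_indent(bad_short))
```

A tab or a stray extra space is easy to miss in vi, and either could stop the module from loading.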
Then a simple stop and start of the applmgmt service to apply the fix:
service-control --stop applmgmt
service-control --start applmgmt
Now the script runs perfectly every day to our backup repository. I believe this might become defunct in vSphere 6.7, as I think there is now a GUI way of scheduling backups!
Thanks a lot for this post! I had the same issue with several VCSA instances and now backup is running fine!
Hi Ryan,
Thanks for this information. In my infrastructure, I have vCenter 6.5 with 2 external PSCs. Could you please suggest whether I need to modify this on all the nodes, or whether doing it on vCenter alone will fix it?
Hi Ishu,
I’m honestly not entirely sure. However, the setting in this blog post is specific to applmgmt, which is the vCSA appliance management interface (I think). The script interfaces with the vCenter directly, so I think you would only need to amend your vCSA and not the PSCs.
Please don’t take my word for it; if in doubt, raise a ticket with VMware support. Take backups before you start any work and make sure you have a back-out plan!
Best of luck,
Ryan
Hi Ryan,
Many thanks for your quick response. Much appreciated 🙂
Hi Ryan,
I have actually made this change on all 3 nodes, i.e. vCenter and the external PSCs. However, instead of just modifying the line, I followed https://kb.vmware.com/s/article/67483 and replaced authentication_sso.py with the script attached to the article. This script already has the amended line. However, I am still facing this issue. Please suggest.
Below is the line that I have copied from the script file that I have just uploaded on nodes.
digest_value = str(self.xpath(
    '//ds:DigestValue', reference, expect=1)[0].text).replace(
    '\r', '').replace('\n', '')
Hi Ryan,
Now I am getting the error below, which is different from the previous one.
2019-11-22T22:59:47.326 [54113]ERROR:vmware.appliance.vapi.auth:Requested SSO authentication but SSO authentication module is not available.
2019-11-22T23:13:05.326 [54113]ERROR:vmware.appliance.vapi.auth:Requested SSO authentication but SSO authentication module is not available.
2019-11-22T23:13:58.326 [54113]ERROR:vmware.appliance.vapi.auth:Requested SSO authentication but SSO authentication module is not available.
Please suggest.
Hi,
I’m really not too sure why you are having this issue.
You have followed an article from VMware, which is great because ideally, you should follow VMware guidance and support (and not me!).
This fixed the issue for me, but your results might differ because your environment is not the same.
My advice is to raise this with VMware support. Perhaps in the meantime roll back to your snapshots/backups.
Ryan
Thanks Ryan. I have reverted the changes, though they caused no issues. Now I am again getting an error similar to the one I had before; however, there is an extra error on my vCenter which is related to certificates. I am sorry, I forgot to mention this before.
AuthenticationError: One or more certificates cannot be verified.
Looking at the “STS signing” tab under Administration -> Configuration, I see a certificate chain containing servers that are no longer available in the environment. It looks like the PSCs were re-installed in the past, as we only have 2 PSCs but STS signing shows 4.
Thanks for the post. I had the exact same issue after upgrading vCenter 6.0 to VCSA 6.7 u3b.
I talked to VMware support and the fix above is still valid.
Hi Arne,
Thanks for the comment. Good to know the post is still useful. I’m sure that in vCSA 6.7 the VAMI interface has a built-in backup section which allows you to create schedules.
I didn’t think there was still a need to perform backups using the legacy scripts that used to be provided!
Cheers,
Ryan