My Rubrik Experience

For those who don’t know, Rubrik is an up-and-coming Cloud Data Management platform which essentially provides a converged, scale-out, clustered backup appliance for all of your infrastructure backup needs. If you have been living under a rock for the last three years then please take a look at Rubrik.com.

Some other good reading on the product can be found in the following blog posts, which go into far more detail than I do here:

vBrain.Info

Penguin Punk

Recently I had the pleasure of having this little beauty on site for a month of testing:
Rubrik-2

Specifications:

R348 (Brik) – 1 Appliance
Nodes – 4
Disks – 4 SSD + 12 HDD (1 SSD / 3 HDD per node)
Memory – 256 GB (64 GB per node)
CPU – 4 x Intel 8-Core
Network – 4 x dual-port 10GbE, 4 x dual-port 1GbE, 1 x 1GbE IPMI

Total Usable Capacity – 59.6 TB

Thoughts:

The reason I had my hands on this device was to test the functionality of Rubrik, pure and simple. I hooked it up to a 6-node vSphere 6.5 cluster with 10 TB of FC-attached storage, covering around 100 virtual machines ranging across Windows 10, Windows Server 2008 R2 to 2016 and Linux (RHEL 6/7, Ubuntu, CentOS). I had around a month of “playtime” with a fairly solid test plan to get through.

Simplicity: We had the appliance delivered ahead of time and the onsite engineer came a few days later after a simple rack and stack. Within 2 hours we had the cluster up and running (it would have been quicker if it wasn’t for our network blocking mDNS!). Beautiful, simple deployment.

Configuration: See simplicity! I’d already created a Rubrik service account in my domain with the correct vCenter permissions. Adding my test cluster was a breeze and the VM discovery happened within minutes. I could have added all my machines to the built-in SLA Domain Protection policies and that would have had me good to go, but I wanted to play in depth!

Usability: The system has a beautiful HTML5 interface that is really intuitive to navigate. If you haven’t seen it I suggest you take a look. Whilst we had an engineer present, everything was so simple to drive that it felt natural and elegantly put together. One of the things I was really keen on was replicating some archive VM data out to a cloud provider. It is fair to say that within about 10-15 minutes, before the engineer had time to get me a guide, I had configured an archive target pointing at a fresh Azure Blob store I had created. So easy.

Rubrik-3
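For context, the Azure side of that archive target is nothing more than a storage account, an access key and a blob container. Here is a minimal sketch of creating the container with the azure-storage-blob Python SDK, assuming the storage account already exists; the account name, key and container name are placeholders, not anything Rubrik mandates:

```python
# Minimal sketch: create the blob container a Rubrik archival location would
# point at. Assumes the azure-storage-blob package is installed and the
# storage account already exists; names and keys below are placeholders.
from azure.storage.blob import BlobServiceClient

ACCOUNT_NAME = "rubrikarchivepoc"            # placeholder storage account
ACCOUNT_KEY = "<storage-account-access-key>"
CONTAINER = "rubrik-archive"

conn_str = (
    "DefaultEndpointsProtocol=https;"
    f"AccountName={ACCOUNT_NAME};"
    f"AccountKey={ACCOUNT_KEY};"
    "EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)
service.create_container(CONTAINER)
print(f"Created container '{CONTAINER}' in account '{ACCOUNT_NAME}'")
```

Once the container exists, the Rubrik side just needs the account name, access key and container name.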

Features: Coming from a legacy backup platform that isn’t very well geared towards a modern data center, I was blown away. Going from only having traditional agent-based backups for Linux/Windows to having some awesome benefits such as:

– Snapshots
– Replication
– Archival (local and cloud)
– Live Mounts
– Google-like file system search
– SQL DB point-in-time recovery
– Physical OS agent recovery
– A well-documented API to consume

That was quite a big turnaround for such a swift implementation.
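The API point deserves a special mention, since everything you can do in the interface can also be driven programmatically. As a rough illustration of what consuming it might look like from Python with the requests library; the cluster address, credentials, endpoint path and response fields here are assumptions for a lab setup rather than an official example, so check the cluster’s built-in API documentation for the real contract:

```python
# Rough sketch: query a Rubrik cluster's REST API for VMware VMs.
# Endpoint path and response fields are assumptions for illustration only.
import requests

RUBRIK_CLUSTER = "https://rubrik.lab.local"   # placeholder cluster address
AUTH = ("svc_rubrik_api", "<password>")       # placeholder service account

resp = requests.get(
    f"{RUBRIK_CLUSTER}/api/v1/vmware/vm",     # assumed v1 endpoint
    auth=AUTH,
    params={"limit": 50},
    verify=False,                             # lab cluster, self-signed cert
    timeout=30,
)
resp.raise_for_status()

for vm in resp.json().get("data", []):
    print(vm.get("name"), "-", vm.get("effectiveSlaDomainName"))
```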

Summary:

I now understand the buzz around Rubrik and why they are seen as a game changer in the Data Management and Backup/Recovery world. For modern data centers that are largely virtualised, this is a product that really must be considered. Given the new 3.2 release, where they provide the ability to back up your cloud workloads using a Rubrik appliance, it really is becoming a well-rounded and unique solution.

I would highly recommend that anyone looking into the backup space makes sure this vendor is on their review list!

VMware & Azure Site Recovery – Part 4: Failover and thoughts

In my final post in this series, I’m taking a look at performing a test fail-over of some VMs I have on premises to an Azure Site Recovery instance.
This is a really easy process and doesn’t take too much documenting. Afterwards I’m going to share a few things I’ve learned along the way and my thoughts on ASR.

VM Fail-over

1) Navigate to the recovery vault from within your Azure Dashboard.

2) Under the Protected Items navigation pane, select “Replicated Items”.

3) Here you should get a view of all the machines you have protected and their status.
ASR_Failover1

4) Select a VM that you want to test and click Test Fail-over.
ASR_Failover2

5) Choose the recovery point and the network that you want to fail-over to and select OK.
ASR_Failover3

6) Some checks should complete and the fail-over environment is prepared. After around 15 minutes I had my VM running and awaiting user input checks.
ASR_Failover4

As you can see, the fail-over process is easy for a single VM. I’m still in a testing phase at the moment so I haven’t performed a mass fail-over, but a single VM doesn’t seem to take more than 10 minutes to move across.

One issue with testing is that there is no “KVM”-style console option for the VM, unlike other hosting providers. So the only way to check the VM had landed on the network and come up correctly was to build a 2012 R2 management server in my ASR recovery network with a public IP and enable RDP.

Lessons Learned:

1) There are some limitations on what is possible with your test fail-over VMs, an example being no KVM console at present (only a screenshot view of the system).

2) All VMs failed over receive a 20GB “free” disk for temporary working. A nice feature, but for us it caused Linux VMs to have their disk devices renamed, e.g. our ‘sdb’ disk became ‘sdc’ because the temporary disk took its label. Not ideal, and I’m not sure what can be done to disable this at present. Microsoft recommend mounting disks by their UUID to work around this, although that sits awkwardly with Red Hat’s advice around mounting logical volumes (a quick check for risky fstab entries is sketched after this list).

3) If you have logical volumes which contain thin-pool logical volumes, they are not detected by the Microsoft agent on the Linux VM. This isn’t great for our environment, as we use these for DB snapshots, for example.

4) Having a management server built and running to enable communications into your test fail-over network is a good idea!
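On the device-renaming point in lesson 2, a quick way to spot mounts that will misbehave when the temporary disk shuffles device letters is to scan /etc/fstab for entries keyed on raw /dev/sdX names rather than UUIDs or LVM paths. A rough sketch, not an official workaround:

```python
# Rough sketch: flag /etc/fstab entries that reference raw /dev/sd* device
# names, which can break when Azure's temporary disk shifts device letters.
# UUID=, LABEL= and LVM (/dev/mapper, /dev/<vg>/<lv>) entries are left alone.
import re

DEVICE_NAME_PATTERN = re.compile(r"^/dev/sd[a-z]+\d*$")

def risky_fstab_entries(path="/etc/fstab"):
    risky = []
    with open(path) as fstab:
        for line in fstab:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split()
            if len(fields) < 2:
                continue
            device, mountpoint = fields[:2]
            if DEVICE_NAME_PATTERN.match(device):
                risky.append((device, mountpoint))
    return risky

if __name__ == "__main__":
    for device, mountpoint in risky_fstab_entries():
        print(f"{device} mounted at {mountpoint} uses a raw device name; "
              f"consider switching to UUID= (see blkid output)")
```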

Final thoughts:

I think that this service is very good and, for Windows servers, very easy. Microsoft seem to be pushing their Azure offering hard, and new features had already landed in the portal by the time I managed to finish this blog series. There are limitations with the Linux offering and it is slightly more clunky, but that’s expected. The important thing to note is that it does work!

I believe that for a greenfield site, where you can take into account a lot of the DR/fail-over caveats and issues, it is a great service. If you aren’t greenfield you may come across limitations that are hard to overcome without re-architecting a lot of your services (this is true in a lot of environments where DR is being built in as an afterthought). This is the issue I face at my workplace with this service.

For a small/medium size business, being able to replicate your infrastructure out into the cloud and pay a “minimal” fee to have DR capability is almost a no brainer. The simplicity of setup and the safety in knowing you can spin up in the cloud to maintain service is really excellent. It is almost certainly much cheaper than having to buy another DC/Room/Rack to put in a load of other kit and replicate too, especially for those with tighter purse strings.

Really understanding the service and what it will cost you is key. Whilst replication and cloud storage costs are “minimal”, in the event of a fail-over you might find the bill coming in from Microsoft to be much higher than normal. I guess that is the roll of the dice you take with an opex “buy now, pay later, should I need it” approach.

I would recommend ASR to anyone looking at a cloud DR service, with the obvious caveat that you do your homework before taking the plunge.

VMware & Azure Site Recovery – Part 3: Replicating VMs

This is the third instalment of my VMware and Azure Site Recovery series. In the last two posts I covered how to prepare for the service and how to install the components in the cloud and on the private infrastructure, ready to make the magic happen!

In this post, I’m going to go through client installation for a Windows and a Linux box and set up replication jobs via the online ASR interface.

This process can be achieved in a number of ways: the Process Server can push the client out to servers you want to protect, you can install it centrally (SCCM, GPO, Puppet, etc.) or you can install it manually. I decided to install via the Process Server for the Windows server (this requires an account with local admin) and manually for the Linux server, as I didn’t trust the automated mechanism and the permissions model I have for Linux at work makes it awkward to let a non-root account install things. (Easy life.)

Windows Client:

1. Within the online ASR portal, under Site Recovery, select a source of replication (the vCenter Server and Process Server are already configured).

ASRP3_1

2. For target, set the right accounts and then select the network that you want to fail-over into (as created previously).
ASRP3_2

3. Select a VM that you want to replicate from your inventory list. This is pulled directly from the vCenter Server via the Process Server.
ASRP3_3

4. On the configuration properties, select the account that has permissions. In my case, the “AD VC PoC Account” is an account with vCenter permissions and is also configured as local admin via GPO for this box. Purely for PoC purposes.

ASRP3_4

5. Select your replication policy for the VM. For me this was the default policy I created earlier.

ASRP3_5

6. Important! Before you proceed to step 7, make sure the firewall on your server is set to allow traffic through from the Process Server or the next steps will fail! You need File and Printer Sharing allowed (see the screenshots below, and the reachability check sketched after them).

ASRP3_6

ASRP3_7
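Before hitting Enable Replication it is worth sanity-checking that the firewall change has actually taken effect. Below is a minimal sketch of a TCP reachability test you could run from the Process Server; the hostname is a placeholder and the port list is an assumption for a typical SMB/WMI push install (445 for SMB, 135 for the RPC endpoint mapper):

```python
# Minimal sketch: confirm the protected server answers on the ports a push
# install typically relies on, before enabling replication. Host and ports
# are assumptions (445 = SMB, 135 = RPC endpoint mapper).
import socket

TARGET = "winserver01.lab.local"   # placeholder protected server
PORTS = [445, 135]

for port in PORTS:
    try:
        with socket.create_connection((TARGET, port), timeout=5):
            print(f"{TARGET}:{port} reachable")
    except OSError as exc:
        print(f"{TARGET}:{port} blocked or unreachable ({exc})")
```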

7. Last step is to enable replication.

ASRP3_8

8. Now the Process Server will contact the server and install the right components (agent) and enable replication.

ASRP3_9

9. After about 15 minutes, the installation should be complete! Replication should now start up.

ASRP3_9.5

Linux Client:

The Linux client requires two installs: one is the agent that talks to and registers with the on-site Process Server, the other is the replication agent.

1. On the Process Server, get the appropriate Linux agent file from F:\Program Files (x86)\Microsoft Azure Site Recovery\home\svsystems\pushinstallsvc\repository and SCP it to the Linux box.

ASRP3_10

2. Navigate to C:\ProgramData\Microsoft Azure Site Recovery\private on your Process Server. Open the “connection” file and copy the passphrase down.

ASRP3_11

3. Unzip the installer and also create a “passphrase.txt” file and insert your passphrase.

ASR3_12

4. Install/Register the agent from the zip file:

ASRP3_13

5. Download the latest “WALinuxAgent” for replication to your Linux Box.

ASRP3_14

6. Unzip the file

ASRP3_15

7. Install and register the replication service:

ASRP3_16

8. Check that services are running:

ASRP3_17

After a short wait for the Process Server to refresh its inventory up to the cloud, it should be possible to select the Linux VM and enable replication in the same way as for the Windows server above, minus the agent push.
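If you have more than a handful of Linux boxes, steps 1-3 above are easy to script. Here is a rough sketch using paramiko to push the agent tarball and passphrase.txt over SFTP; the hostname, credentials, file names and paths are placeholders, and the actual install command is still run on the box afterwards:

```python
# Rough sketch: copy the Linux mobility agent tarball and passphrase.txt to a
# target server over SFTP. Hostname, credentials and file names are
# placeholders; the agent install itself is run manually afterwards.
import paramiko

TARGET_HOST = "rhel-poc-01.lab.local"     # placeholder Linux server
USERNAME = "root"                         # install requires root in this PoC
PASSWORD = "<password>"
LOCAL_FILES = [
    "Microsoft-ASR_UA_Linux.tar.gz",      # placeholder agent tarball name
    "passphrase.txt",                     # created from the config server phrase
]

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(TARGET_HOST, username=USERNAME, password=PASSWORD)

sftp = client.open_sftp()
for local_path in LOCAL_FILES:
    remote_path = f"/tmp/{local_path}"
    sftp.put(local_path, remote_path)
    print(f"Copied {local_path} -> {TARGET_HOST}:{remote_path}")

sftp.close()
client.close()
```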

I’m going to stop the post here and leave the replication/test fail-over for another post, which should be the final one. The good thing about ASR is that, once configured correctly, it provides a “Test DR/Fail-over” option where you can run multiple simulations whilst maintaining replication!

Until next time!

VMware & Azure Site Recovery – Part 2: Infrastructure Configuration

Following on from my previous post about VMware and Microsoft ASR, I’m going to run through some more of the technical configuration required to get your VMware VMs protected in the Microsoft cloud. This section will mainly deal with setting up the on-site configuration server and the connectivity to the Azure Site Recovery subscription.

Preparing Infrastructure

To create the link between the cloud components and your on-site infrastructure, you first need a Recovery Services Vault.

1. From your Azure portal, under Monitoring + Management, select Backup & Site Recovery (OMS).
ASRP2_1

2. Create a Recovery Services Vault similar to above. I heavily recommend pinning this to your Dashboard to make it easily accessible later!
ASRP2_2

3. Once created, head to Site Recovery and Prepare Infrastructure. Select your goals (in this case, replicate to Azure from VMware).
ASRP2_3

4. The on-site configuration server and pre-reqs are required here (as mentioned in part 1). Download the installer and registration key to your server.
ASRP2_4

5. On your 2012 R2 server, run the installer.
ASRP2_5

6. Accept the EULA
ASRP2_6

7. Import the reg key as downloaded in step 4.
ASRP2_7

8. Select Proxy Server options.
ASRP2_8

9. Run the prerequisite checks (I had a warning about the 500 GB secondary disk, not an issue as this is purely testing).
ASRP2_9

10. Enter your MySQL passwords (interesting that MySQL is used under the hood).
ASRP2_10

11. Agree to VMware virtual machine protection, and validation of PowerCLI 6.0 takes place. It must be 6.0! I tried with the latest 6.5 and validation failed.
ASRP2_11

12. Select your ASR install directory
ASRP2_12

13. Select the NIC on your box you want for replication traffic.
ASRP2_13

14. Hit Install!
ASRP2_14

15. Hopefully all goes well and you have some nice green ticks!
ASRP2_15

16. You will be given a passphrase for your configuration server. This is needed when you connect agents on protected VMs to this server for replication. (It can also be obtained later.)
ASRP2_16

17. The Config Server admin console opens automatically (a shortcut is also placed on the desktop). Enter an account that has sufficient administrator privileges over your vCenter server.
ASRP2_17

18. Back in the portal, add the new source configuration server and AD account, then select OK.
ASRP2_18

ASRP2_19

Note:- Sometimes changes on the config server are not visible in the portal straight away. To fix this, find your server via the pinned shortcut and perform a manual refresh!

19. Select your subscription, deployment model, storage account and network as a Target.
ASRP2_20

20. Create a default replication policy. I left mine as the defaults and came back and tweaked policies later.
ASRP2_21

21. Complete the infrastructure preparation by running the capacity planner and confirming. I have not done this as I’m only testing a few VMs in the first instance.
ASRP2_22

This is a good place to stop. The next post will detail adding some machines to be replicated, but in order to do that you need to either install the agent manually, push it out centrally or have the configuration server do it. Obviously the deployment method needs to be considered for your organisation (via GPO, DSC, Puppet, etc.).

VMware & Azure Site Recovery – Part 1: Pre-Requisites

I’ve had the opportunity of investigating Disaster Recovery in my role recently. I have been looking at costs and methods of bringing our critical systems online in the event of a primary data center outage.

Without going into too much detail on my existing employer, there are many things to review and architecting DR into the existing infrastructure isn’t the easiest thing to do. Given our relationship with Microsoft, I was asked to investigate Azure Site Recovery to see if it was a viable option to provide us with a DR site in the cloud.

I’m going to be blogging a small series on the technical implementation required to get VMware VMs failing over from an on-site VMware cluster to an Azure Site Recovery instance. Hopefully, if all goes well, I’ll add to the series as I go, but for now I’m going to keep it simple with a basic deployment.

Pre-Requisites

The entire process that I am following has been documented by Microsoft and gives some good detail on how to achieve VM replication into the cloud.

It is important to read through the checklist of required items before starting the setup. This can be done beforehand or during the actual implementation. I summarised it down to the following:

Cloud:

1) An Azure account, free trial possible (I have MSDN sub)
2) Azure Storage, somewhere to put your data.
3) Azure Network, where your VMs will live after fail-over.

On-Site:

1) Build a new 2012 R2 Process/Management Server with the necessary specification (ready for installing the ASR components)
2) External network connectivity to cloud services.
3) VMware vCenter + ESXi 5.5 or greater.
4) Guest machines that do not exceed certain limitations of the service (e.g. no disks larger than 1 TB)
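Point 4 is worth automating if you have a sizeable estate. Here is a rough pyVmomi sketch that flags any VM carrying a virtual disk over the 1 TB limit; the vCenter address and credentials are placeholders, and certificate verification is disabled for a lab vCenter:

```python
# Rough sketch: flag VMs with any virtual disk larger than 1 TB, which the
# service cannot protect. vCenter address and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.lab.local"
USERNAME = "svc_asr@vsphere.local"
PASSWORD = "<password>"
ONE_TB_KB = 1024 ** 3  # 1 TB expressed in KB, matching capacityInKB

context = ssl._create_unverified_context()  # lab vCenter, self-signed cert
si = SmartConnect(host=VCENTER, user=USERNAME, pwd=PASSWORD, sslContext=context)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    if vm.config is None:          # skip orphaned/incomplete VMs
        continue
    for device in vm.config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualDisk) and device.capacityInKB > ONE_TB_KB:
            print(f"{vm.name}: disk of {device.capacityInKB / ONE_TB_KB:.1f} TB exceeds the limit")

view.Destroy()
Disconnect(si)
```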

Once the pre-steps were complete, it was on to configuring the magic….

1) Signed in to my MSDN subscription and set up the Azure free trial ($150 a month)
ASR1

2) Login to https://portal.azure.com

3) Navigate to the marketplace, Networking, Virtual Network.
ASR2

4) If this is all new, it’s best to stick with the Resource Manager deployment model as that is the latest and greatest. Click Create.
ASR3

5) Create your virtual network by filling in your requirements. I went for the large default address space, gave it a name, and then created a small subnet within it for testing. In this instance I also created a new Resource Group.
ASR4

ASR5
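If you would rather script this step than click through the portal, the same virtual network can be built with the Azure Python SDK. This is a rough sketch, assuming the azure-identity and azure-mgmt-network packages and an existing resource group; the subscription, names, region and address ranges are placeholders, and the method names follow recent SDK releases:

```python
# Rough sketch: create the fail-over virtual network and a test subnet outside
# the portal. Subscription, resource group, region and address ranges are
# placeholders; method names follow recent azure-mgmt-network releases.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-asr-poc"        # created beforehand
VNET_NAME = "vnet-asr-poc"

network_client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

poller = network_client.virtual_networks.begin_create_or_update(
    RESOURCE_GROUP,
    VNET_NAME,
    {
        "location": "ukwest",
        "address_space": {"address_prefixes": ["10.10.0.0/16"]},
        "subnets": [{"name": "snet-asr-test", "address_prefix": "10.10.1.0/24"}],
    },
)
vnet = poller.result()
print(f"Created {vnet.name} with subnets: {[s.name for s in vnet.subnets]}")
```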

NB:- A handy tip is to pin certain objects to the dashboard so you can see them on your main screen. I found this useful for the on-site Process/Management server.

6) Navigate to the marketplace, Storage, Storage account.
ASR6

7) Create a storage account as instructed. I opted for standard performance and keeping the data geographically local as I’m only performing a PoC, maintaining the resource in the previously created group.
ASR7

8) Navigate to the marketplace, Monitoring + management, Backup and Site Recovery. Create a recovery vault to group resources together.
ASR8

ASR9

The next thing worth doing is creating an account in your Directory Services that VMware and ASR can use.

1) Create an AD service account.
2) Add it to the vCenter/Datacenter object where VMs will be replicated from.
3) Create a role for “Azure Site Recovery” and give it the following permissions (a scripted alternative is sketched after the screenshots):
ASR10

ASR11
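If you prefer to script the role creation rather than click through the permissions above, pyVmomi exposes the authorization manager for exactly this. A rough sketch; the privilege IDs listed are illustrative only, and the real list is the one shown in the screenshots and Microsoft’s ASR documentation:

```python
# Rough sketch: create an "Azure Site Recovery" role in vCenter via pyVmomi.
# The privilege IDs below are illustrative only; use the list from Microsoft's
# ASR documentation. vCenter address and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.lab.local"
USERNAME = "administrator@vsphere.local"
PASSWORD = "<password>"

# Illustrative subset of privileges only.
PRIVILEGES = [
    "Datastore.Browse",
    "VirtualMachine.Provisioning.GetVmFiles",
    "VirtualMachine.Interact.PowerOn",
    "VirtualMachine.Interact.PowerOff",
]

context = ssl._create_unverified_context()  # lab vCenter, self-signed cert
si = SmartConnect(host=VCENTER, user=USERNAME, pwd=PASSWORD, sslContext=context)
auth_manager = si.RetrieveContent().authorizationManager

role_id = auth_manager.AddAuthorizationRole(
    name="Azure Site Recovery", privIds=PRIVILEGES)
print(f"Created role 'Azure Site Recovery' with id {role_id}")

Disconnect(si)
```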

Here is a good place to stop with this part of the guide. The hard part of this post was making sure all the pre-configuration bits are done and that you are ready to proceed.

In the next post I’ll run through configuring the actual Site Recovery and making the components communicate. The most notable comment I’d have for all of this is that Microsoft have gone quite a distance in making this process as easy as possible. That doesn’t mean, however, that it goes completely without technical caveats which I’ll cover later on in the series.
Until next time!