VCAP5-DCD Design Practice

February 12, 2013 by Gregg Robertson

As some people may know, I am currently preparing to re-take my VCAP5-DCD. I have reached the point in my preparations where I am doing mock designs and going through the labs from the VMware Design Workshop, so I thought I would follow the same idea and create a mock customer design scenario, along with the same kind of questions I am being asked in the design workshop labs. Hopefully, if people are interested, they can use it to write down their design choices, the justifications for those choices and the impact those choices have on the rest of the design, and everyone will learn from it. Below is a company profile that I made up, using some ideas from a scenario that Matt Mould, one of my Xtravirt colleagues, sent me a few months back:

Company Profile
•    Safe & Legit are a global trading company specialising in ground defence equipment
•    13,000 physical servers across 9 sites.
o    6k UK (3 sites)
o    2k CN (3 sites)
o    5k US (3 sites)
•    There are two level 4 DCs per country (for info on DC levels see http://en.wikipedia.org/wiki/Data_center)
•    DCs are linked by an MPLS cloud from BT, Verizon, Colt and NTT (contracts end Q1 2015)
•    One DC per country is privately owned and Safe & Legit want to retain the real estate, but make room to lease out sought-after level 4 private suites, thus providing a new revenue stream and hopefully making their own DCs cost neutral in doing so. Therefore they are looking to virtualise as much of their physical estate as possible onto vSphere 5.0
•    The remaining DCs are rented from BT, Verizon and NTT (contracts end Q1 2015). The CFO has voiced his desire to cut the cost of these rentals and would ideally like not to have to renew the contracts if possible.
•    ERP is centralised in the UK
•    Each country has locally hosted Print, Domain, UC & Messaging
•    Collaboration is centralised, again in the UK
•    Typical/normal file sharing is not permitted, all ‘matter’ is recorded and audited in Safe & Legit’s collaboration system
•    With the exception of ERP, all systems must move to a shared or distributed model. This follows a series of natural disasters in the US and China, the impact of which could have been avoided by having a DR and BC plan in place.
•    All communication end points are encrypted, but new legislation is relaxing where encryption is required. This is achievable following an ERP upgrade that separates out sensitive and non-sensitive data.
•    There are up to 5,000 3rd party users who own a license to trade under Safe & Legit LLC. Licensees are dropping as the competition develop newer, faster and cheaper ways to deliver access to their trading systems, whereas Safe & Legit still require licensees to purchase expensive fixed private comms to deliver their trading apps. Safe & Legit do not want these 3rd party users to be impacted at all during the migrations and want a near-zero RTO and RPO

• The UK site has been chosen as the first site to be migrated. Due to Safe and Legit’s work on ground defence equipment, they have not authorised the running of a capacity planner collection, as they don’t want their data to leave the premises. They have, however, calculated that for each site to be virtualised the environment must be able to meet the following values:

– The 6k physical servers in the UK comprise 2,000 Linux servers and 4,000 Windows servers

– On average each Windows server is provisioned with a 20GB boot disk (average used is 15GB) and a 50GB data disk (average used is 30GB)

– Each Linux server is configured with 60GB total storage (average used is 30GB)

– Safe and Legit expect 10 percent annual server growth over the next three years

– Safe and Legit have a long-standing vendor relationship with EMC and Cisco and have requested the use of their equipment because of this relationship and the in-house knowledge of administering these vendors’ products

– They have created the following two tables from internal analysis and monitoring:

CPU Resource Requirement
Metric	Amount
Avg # of CPUs per physical server	4
Avg CPU MHz	3,400 MHz
Avg normalised CPU MHz	1,240
Avg CPU utilisation per physical system	5% (170 MHz)
Avg Peak utilisation per physical system	8% (272 MHz)
Total CPU resources req for 1k VMs at peak	272,000 MHz
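
As a rough sketch of how the CPU table might be scaled up, the snippet below multiplies the per-server peak figure out to the full UK estate and to the estate after three years of growth. It assumes the 10 percent annual growth compounds each year and that new servers match today's average profile; neither assumption is stated in the scenario, and the function names are purely illustrative.

```python
# Figures taken from the CPU Resource Requirement table and the company profile.
AVG_CPU_MHZ = 3400          # average CPU MHz per physical server
PEAK_UTILISATION_PCT = 8    # average peak utilisation per physical system
UK_SERVERS = 6000           # physical servers in the UK
ANNUAL_GROWTH_PCT = 10      # expected annual server growth
GROWTH_YEARS = 3


def servers_after_growth(servers: int, growth_pct: int, years: int) -> int:
    """Server count after growth, rounded up (assumption: growth compounds yearly)."""
    numerator = servers * (100 + growth_pct) ** years
    denominator = 100 ** years
    return -(-numerator // denominator)  # integer ceiling division


def peak_cpu_mhz(servers: int) -> float:
    """Total peak CPU demand in MHz, assuming every server matches the average."""
    return servers * AVG_CPU_MHZ * PEAK_UTILISATION_PCT / 100


servers_year3 = servers_after_growth(UK_SERVERS, ANNUAL_GROWTH_PCT, GROWTH_YEARS)

print(f"Per 1k VMs at peak:   {peak_cpu_mhz(1000):,.0f} MHz")
print(f"UK estate today:      {peak_cpu_mhz(UK_SERVERS):,.0f} MHz")
print(f"UK estate in 3 years: {peak_cpu_mhz(servers_year3):,.0f} MHz ({servers_year3:,} servers)")
```

The first line reproduces the 272,000 MHz from the table, which is a handy check that the inputs have been read correctly. Host overhead and HA headroom are deliberately left out, as those are design choices rather than given values.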

RAM Resource Requirement
Metric	Amount
Avg amount of RAM per physical system	4096MB
Avg memory utilisation	30% (1228.8MB)
Avg Peak Memory Utilisation	80% (3276.8MB)
Total RAM required for 1k VMs at peak before memory sharing	3,276,800MB
Anticipated memory sharing benefit when virtualised	50%
Total RAM req for 1k VMs at peak with memory sharing	1,638,400MB
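
The RAM table and the per-server disk figures listed before the tables can be scaled in a similar way. The sketch below is only a starting point: it assumes the 50% memory sharing benefit holds across the whole estate, that the 10 percent growth compounds yearly and applies evenly to Windows and Linux servers, and it ignores VM swap files, snapshots and RAID overhead. The function and constant names are hypothetical.

```python
# Figures from the RAM Resource Requirement table and the company profile.
RAM_PER_SERVER_MB = 4096
PEAK_RAM_UTILISATION_PCT = 80
MEMORY_SHARING_PCT = 50      # anticipated memory sharing benefit
GROWTH_FACTOR = 1.1 ** 3     # assumption: 10% annual growth, compounded over 3 years

# (server count, provisioned GB per server, used GB per server)
WINDOWS = (4000, 20 + 50, 15 + 30)   # boot disk + data disk
LINUX = (2000, 60, 30)


def peak_ram_mb(servers: int, with_sharing: bool = True) -> float:
    """Peak RAM demand in MB, optionally after the memory sharing benefit."""
    peak = servers * RAM_PER_SERVER_MB * PEAK_RAM_UTILISATION_PCT / 100
    if with_sharing:
        peak = peak * (100 - MEMORY_SHARING_PCT) / 100
    return peak


def storage_gb(groups):
    """Return (provisioned GB, used GB) summed across the server groups."""
    provisioned = sum(count * prov for count, prov, _ in groups)
    used = sum(count * used_gb for count, _, used_gb in groups)
    return provisioned, used


uk_servers = WINDOWS[0] + LINUX[0]
ram_today_gb = peak_ram_mb(uk_servers) / 1024
provisioned_gb, used_gb = storage_gb([WINDOWS, LINUX])

print(f"RAM per 1k VMs, with sharing: {peak_ram_mb(1000):,.0f} MB")
print(f"UK RAM today / year 3:        {ram_today_gb:,.0f} GB / {ram_today_gb * GROWTH_FACTOR:,.0f} GB")
print(f"UK disk provisioned / used:   {provisioned_gb:,} GB / {used_gb:,} GB")
print(f"Year 3 provisioned / used:    {provisioned_gb * GROWTH_FACTOR:,.0f} GB / {used_gb * GROWTH_FACTOR:,.0f} GB")
```

The gap between provisioned and used disk is where the thin versus thick provisioning decision, and its justification, would come into the storage design.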

Business Requirements

From workshops and SME meetings the following requirements were collected

Number	Requirement
R001	Virtualise the existing 6000 UK servers as virtual machines, with no degradation in performance when compared to current physical workloads
R002	To provide an infrastructure that can provide 99.7% availability or better
R003	The overall anticipated cost of ownership should be reduced after deployment
R004	Users to experience as close to zero performance impact as possible when migrating from the physical infrastructure to the virtual infrastructure
R005	Design must maintain simplicity where possible to allow existing operations teams to manage the new environments
R006	Granular access control rights must be implemented throughout the infrastructure to ensure the highest levels of security
R007	Design should be resilient and provide the highest levels of availability where possible whilst keeping costs to a minimum
R008	The design must incorporate DR and BC practices to ensure no loss of data
R009	Management components must be secured with the highest level of security
R010	Design must take into account VMware best practices for all components in the design as well as vendor best practices where applicable
R011	Any others you think I have missed from the scenario
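
To make R002 a little more concrete, the short sketch below converts an availability percentage into the downtime it permits. It assumes a 365-day year, a 30-day month and availability measured around the clock; the scenario does not say how Safe & Legit measure it, and the helper name is hypothetical.

```python
HOURS_PER_DAY = 24


def allowed_downtime_hours(availability_pct: float, days: int) -> float:
    """Hours of downtime permitted over `days` at the given availability."""
    return (100 - availability_pct) / 100 * days * HOURS_PER_DAY


for label, days in (("year", 365), ("30-day month", 30)):
    hours = allowed_downtime_hours(99.7, days)
    print(f"99.7% over a {label}: {hours:.2f} hours of downtime")
```

At 99.7% that works out to roughly 26 hours a year, which is worth setting against the near-zero RTO and RPO wanted for the 3rd party users when writing down justifications.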

Additional Functional Requirements (From Storage Design posting)

– 5k 3rd party users will need to be able to access the environment without any impact during the migration and consolidation

– Kit in the rented DCs needs to be fully migrated to the privately owned datacenter before Q1 2015 to ensure the contracts don’t need to be renewed

Constraints and Risks

You tell me in the comments

Constraints from Storage Design posting:

– Usage of EMC kit

– Usage of Cisco kit

– Usage of the privately owned DC’s physical infrastructure for the consolidation of all three UK DCs.

Risks from Storage Design posting:

– The ability to ensure near-zero downtime during the migration of workloads to the privately owned DC may be at risk if budget constraints impact the procurement of the infrastructure required to achieve it

Additional Questions (from Storage Design posting)

Something I feel is really important when doing real-world designs is to think of as many questions as possible around a customer’s requirements, so that you can ensure you have recorded them correctly and that they aren’t vague. The additional questions and the answers to them are listed below:

Q: Is there any capability of utilising the existing storage in the privately owned UK DC?

A: Because of the consolidation and migration of the other UK DCs and the current workloads in the privately owned DC, a new SAN is the better option: the existing SAN is now 3 years old, so it is more cost effective to purchase a new one. Also, due to the probable need for auto-tiered storage to meet the customer’s requirements, a new SAN with these capabilities is needed

Q: Is there no way a minimal planned outage/downtime can be organised for the migration of the workloads, given the likely higher cost of the equipment needed to ensure near-zero downtime?

A: The customer would prefer to try to keep to near-zero downtime, so it is agreed that after the conceptual design of the storage and the remaining components of the whole design, further meetings can be held to discuss a balance between cost and the desire for near-zero downtime

Q: With the leasing out of the private level 4 suites in the future, will there be a requirement to manage/host other companies’ processes and data within the infrastructure being designed?

A: No, there is currently no plan to do this due to security concerns and the number of compliance regulations Safe and Legit need to maintain and fulfil. There is, however, a possibility of internal consumption, with other departments being charged for their usage of the DC’s resources.

Summary

So that is the company profile and my idea around it. I obviously created 90% of the above from my head, so there will be additional questions around it, but I think this gives a really solid amount of information for people to start thinking. I’m going to do the first posting, on Storage Design for Safe and Legit, quite soon and will put up the questions and components you normally have to think about, but if people want to think about what they would choose beforehand then hopefully we can get a good discussion going around it.

As I add each section to the design I am hoping to keep updating this posting and then, once complete, link it all together on a single page on my blog.