Safe and Legit Storage Design Completed

February 18, 2013 by Gregg Robertson

Below are my thoughts, the additional questions I felt needed to be asked or clarified, and the design decisions, justifications and impacts for the Safe and Legit storage design. If you missed the posting where I detailed the mock scenario you can read it here

Note: This is a learning exercise for me, so if you feel I’ve missed something or made a wrong decision, please write it in the comments. I’m more than happy to discuss it (that was one of the main reasons I’m doing this series of postings) and I’ll amend the design accordingly if it makes sense. Hopefully I, along with other people reading these postings, will learn from it and become better.

Additional Questions

As I said there probably would be, I have some additional questions. Something I feel is really important when doing real-world designs is to think of as many questions as possible around a customer’s requirements, so that you can ensure you have their requirements recorded correctly and that they aren’t vague. The additional questions and the answers to them are listed below:

Q: Is there any capability of utilising the existing storage in the privately owned UK DC?

A: Due to the consolidation and migration of the other UK DCs and the current workloads in the privately owned DC, a new SAN is the better option: the existing SAN is now 3 years old, so it is more cost effective to purchase a new one. Also, due to the probable need for auto-tiered storage to meet the customer’s requirements, a new SAN with these capabilities is needed.

Q: Is there no way a minimal planned outage/downtime can be organised for the migration of the workloads, given the likely higher cost of the equipment needed to ensure near-zero downtime?

A: The customer would prefer to try to keep to near-zero downtime, so it is agreed that after the conceptual design of the storage and the remaining components of the whole design, further meetings can be held to discuss a balance between cost and the desire for near-zero downtime.

Q: With the leasing out of the private level 4 suites in the future, will there be a requirement to manage/host other companies’ processes and data within the infrastructure being designed?

A: No, there is currently no plan to do this due to security concerns and the number of compliance regulations Safe and Legit need to maintain and fulfil. There is, however, a possibility of internal consumption, with other departments being charged for their usage of the DC’s resources.

Q: What other questions do you feel should be asked?

Additional Functional Requirements

– 5K 3rd party users will need to be able to access the environment without any impact during the migration and consolidation

– The rented DCs’ kit needs to be fully migrated to the privately owned datacenter before Q1 2015 to ensure the contracts don’t need to be renewed

Constraints

Below are the constraints I felt were detailed in the scenario. These will possibly change as I go further through all the other sections but so far these are the ones I felt were applicable:

– Usage of EMC kit

– Usage of Cisco kit

– Usage of the privately owned DC’s physical infrastructure for the consolidation of all three UK DCs.

Assumptions

Below are the assumptions I felt had to be made. These will possibly change as I go further through all the other sections. Normally I try to keep these as minimal as possible, but for a project of this size it would be extremely difficult not to have any, as you do have to trust that certain things are in place:

– There is sufficient bandwidth between the UK DCs to allow migration of the existing workloads with as little impact to the workloads as possible

– All required upstream dependencies will be present during the implementation phase.

– There is sufficient bandwidth into and out of the privately owned DC to support the bandwidth requirements of all three DCs’ workloads

– All VLANs and subnets required will be configured before implementation.

– Storage will be provisioned and presented to the VMware ESX™ hosts accordingly.

– Power and cooling in the privately owned DC are able to handle the addition of the physical infrastructure required for the virtual infrastructure while, for a certain amount of time, older physical machines are still running alongside

– Safe and Legit have the existing internal skillset to support the physical and virtual infrastructure being deployed.

– There are adequate licences for the operating systems and applications required for the build

Risks

– The ability to ensure near-zero downtime during the migration of workloads to the privately owned DC may be at risk due to budget constraints impacting the procurement of the infrastructure required to achieve it

Storage Array

Design Choice	EMC FC SAN with two 8Gb storage processors (SPs)

Justification	-EMC, due to the constraint of having to use EMC storage because of previous usage
	-EMC VNX 5700 with Auto-Tiering enabled
	-8Gb to ensure high transmission speeds to the storage; 12Gb is too high and expensive for this design

Design Impacts	-Switches will need to be capable of 8Gb connectivity
	-FC cabling needs to be capable of transmitting at 8Gb speeds
	-HBAs on ESXi hosts need to be capable of 8Gb speeds

Number of LUNs and LUN sizes

Design Choice	400 x 1TB LUNs will be used

Justification	-Each VM will be provisioned with an average of 50GB of disk
	-With around 15 VMs per LUN and 20% of each LUN kept free for swap and snapshots: 15 x 50GB / 0.8 = 937.5GB, which fits in a 1TB LUN
	-6000 total VMs / 15 VMs per LUN = 400 LUNs

Design Impacts	-Tiered storage will be used with auto tiering enabled to balance storage costs with VM performance requirements
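
The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are made up for the example, and the inputs (15 VMs per LUN, 50GB average disk, 20% of each LUN kept free, 6000 VMs) are the figures from the justification.

```python
import math


def lun_capacity_needed_gb(vms_per_lun: int, avg_vm_disk_gb: float,
                           usable_fraction: float) -> float:
    """Capacity a LUN needs when only `usable_fraction` of it holds VM disks.

    The remainder is kept free for swap files and snapshots.
    """
    return vms_per_lun * avg_vm_disk_gb / usable_fraction


def luns_required(total_vms: int, vms_per_lun: int) -> int:
    """Number of LUNs, rounded up so that every VM has a home."""
    return math.ceil(total_vms / vms_per_lun)


# Figures taken from the design justification.
needed_gb = lun_capacity_needed_gb(vms_per_lun=15, avg_vm_disk_gb=50,
                                   usable_fraction=0.8)
lun_count = luns_required(total_vms=6000, vms_per_lun=15)

print(f"Capacity needed per LUN: {needed_gb:.1f} GB")
print(f"Fits in a 1TB (1024GB) LUN: {needed_gb <= 1024}")
print(f"LUNs required: {lun_count}")
```

Changing the VM count or the average disk size makes it easy to see how sensitive the LUN count is to those estimates.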

Storage load balancing and availability

Design Choice	-EMC PowerPath/VE multipathing plug-in (MPP) will be used.

Justification	-EMC PowerPath/VE leverages the vSphere Pluggable Storage Architecture (PSA), providing performance and load-balancing benefits over the VMware native multipathing plug-in (NMP).

Design Impacts	-Requires additional cost for PowerPath licenses.

VMware vSphere VMFS or RDM

Design Choice	-VMFS will be used as the standard unless there is a specific need for raw device mapping (RDM). This will be decided on a case-by-case basis

Justification	-VMFS is a clustered file system specifically engineered for storing virtual machines.

Design Impacts	-The datastores must be created with the VMware vSphere Client to ensure correct disk alignment

Host Zoning

Design Choice	-Single-initiator zoning will be used. Each host will have two paths to the storage ports across separate fabrics.

Justification	-This is keeping to EMC best practices and ensures no single point of failure with multiple paths to targets across multiple fabrics

Design Impacts	-Zones will need to be created for each initiator (host HBA port) by the storage team
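
To make the zoning choice concrete, here is a minimal sketch of what single-initiator zoning produces for a set of hosts. Everything named here is hypothetical: the host names, the fabric labels, the zone naming scheme and the SP target port names are placeholders, and the sketch assumes one HBA port per host per fabric with both storage processors reachable on each fabric.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Zone:
    fabric: str
    name: str
    initiator: str             # exactly one host HBA port per zone
    targets: Tuple[str, ...]   # storage processor ports on the same fabric


def build_single_initiator_zones(hosts: List[str],
                                 fabric_targets: Dict[str, List[str]]) -> List[Zone]:
    """Create one zone per host HBA port (initiator) on each fabric."""
    zones = []
    for host in hosts:
        for fabric, targets in fabric_targets.items():
            initiator = f"{host}_hba_{fabric}"  # hypothetical port alias
            zones.append(Zone(fabric=fabric,
                              name=f"z_{host}_{fabric}",
                              initiator=initiator,
                              targets=tuple(targets)))
    return zones


# Hypothetical example: two hosts, two fabrics, both SPs visible on each fabric.
example_zones = build_single_initiator_zones(
    hosts=["esx01", "esx02"],
    fabric_targets={"fabA": ["spa_p0", "spb_p0"],
                    "fabB": ["spa_p1", "spb_p1"]},
)
for zone in example_zones:
    print(zone.fabric, zone.name, zone.initiator, zone.targets)
```

Each host ends up in one zone per fabric, so losing a switch, an HBA port or a storage processor still leaves a path to the storage.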

LUN Presentation

Design Choice	-LUNs will be masked consistently across all hosts in a cluster.

Justification	-This allows for virtual machines to be run on any host in the cluster and ensures both HA and DRS optimisation

Design Impacts	-The storage team will need to control and deploy this due to the masking being done on the storage array
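
Consistent masking is easy to get wrong by hand, so a simple check can help. The sketch below is an assumption-laden illustration rather than a real tool: it takes a hypothetical mapping of host name to the set of LUN IDs that host can see (however that inventory is gathered) and reports which hosts in the cluster are missing LUNs that other hosts have.

```python
from typing import Dict, Set


def find_masking_gaps(host_luns: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Return, per host, the LUN IDs seen elsewhere in the cluster but not on that host."""
    all_luns: Set[int] = set().union(*host_luns.values())
    gaps = {}
    for host, luns in host_luns.items():
        missing = all_luns - luns
        if missing:
            gaps[host] = missing
    return gaps


# Hypothetical inventory: esx02 has not been presented LUN 2.
cluster_view = {
    "esx01": {0, 1, 2},
    "esx02": {0, 1},
    "esx03": {0, 1, 2},
}
masking_gaps = find_masking_gaps(cluster_view)
print(masking_gaps)
```

An empty result means every host sees the same LUNs, which is what HA and DRS need in order to place a virtual machine on any host in the cluster.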

Thick or Thin disks

Design Choice	-Thin provisioning will be used as the standard unless there is a specific need for thick provisioned disks. This will be decided on a case-by-case basis

Justification	-The rate of change for a system volume is low, while data volumes tend to have a variable rate of change.

Design Impacts	-Alarms will need to be configured so that, as disks approach an out-of-space condition, there is ample time to provision more storage
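
One way to think about "ample time" is to compare how long a datastore has left at its current growth rate with how long it takes to get more storage provisioned. The sketch below is purely illustrative: the growth rate, the lead time and the function names are assumptions, not figures from the scenario.

```python
import math


def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Estimate days until a thin-provisioned datastore runs out of space."""
    if growth_gb_per_day <= 0:
        return math.inf  # not growing, so it never fills at this rate
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day


def should_alarm(capacity_gb: float, used_gb: float,
                 growth_gb_per_day: float, lead_time_days: float) -> bool:
    """Alarm when the remaining time is no longer than the provisioning lead time."""
    return days_until_full(capacity_gb, used_gb, growth_gb_per_day) <= lead_time_days


# Hypothetical 1TB datastore growing at 10GB per day, with a 14-day lead time.
print(should_alarm(1024, 900, 10, 14))
print(should_alarm(1024, 500, 10, 14))
```

A fixed percentage threshold would also work, but tying the alarm to the provisioning lead time keeps the warning meaningful for both slow-growing and fast-growing datastores.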

Virtual Machine I/O Priority

Design Choice	-Storage I/O Control will not be used

Justification	-The storage uses Auto-Tiering/FAST, which works at the block level and is therefore a better way of balancing
	-VMware SRM is likely to be used, in which case SDRS and SIOC are not supported

Design Impacts	-FAST/Auto-Tiering will need to be configured correctly by the storage vendor

Storage Profiles

Design Choice	-Storage Profiles will not be configured

Justification	-Storage will be managed by the storage team

Design Impacts	-Storage team will need to configure storage as the virtual infrastructure requires

Describe and diagram the logical design

Attribute	Specification
Storage Type	Fibre Channel
Number of Storage Processors	2 to ensure redundancy
Number of Fibre Channel Switches (if any)	2 to ensure redundancy
Number of ports per host per switch	1
Total number of LUNs	400 (as mentioned above)
LUN Sizes	1TB (as mentioned above)
VMFS datastores per LUN	1

Describe and diagram the physical design

Array vendor and model	EMC VNX 5700
Type of array	Active-Active
VMware ESXi host multipathing policy	PowerPath/VE MPP
Min/Max speed rating of storage switch ports	2Gb/8Gb

I’m looking for the correct EMC diagrams to create the physical design diagram, so I will update this posting this week with the diagram, promise :)

Well, that’s my attempt at the storage design portion of Safe and Legit. Hopefully people will agree with most, if not all, of the decisions I’ve made. I have to admit it took me most of my Sunday just to do this piece and think of all the impacts and, as stated, there may be additional constraints and risks further down the line.

Gregg