A successful cloud deployment requires proper planning, determining the right cloud configurations, and then executing the plan faithfully. To create a successful cloud deployment plan, organizations first need to understand why they want to choose cloud computing over traditional on-premises data centers, and then clearly define their requirements and success criteria as per business needs. Clear requirements help organizations determine whether they are on the right path toward cloud computing.
See: Storage Infrastructure concepts & Administration – Quick Reference Guide
In this post, we will learn how to determine the right cloud deployments and configurations, and how to create a sound cloud strategic plan that makes an organization's cloud journey successful and productive. The following topics are covered in this post.
Determine the right Cloud Computing Platform
To successfully migrate an existing application to a cloud platform, or to deploy a new application there, it is important that IT managers and organizations clearly understand what cloud computing offers and the services provided by the various cloud service providers in the current market. AWS, Azure, GCP and Oracle Cloud are the popular public and private cloud providers, and they all offer similar compute, storage and network services.
Organizations and IT managers should first understand the fundamentals of cloud computing and its benefits before they can create a strategic plan to adopt and leverage them. Understanding the following cloud services and models is the key to determining the right cloud platform.
- Cloud Computing is essentially outsourcing data center operations and infrastructure management to a third party, generally referred to as a cloud service provider. Amazon's AWS, Microsoft's Azure and Google's GCP are the top cloud service providers today.
- These cloud service providers allow customers to use virtual infrastructure and cloud platforms on a utility basis where customers pay only for the resources they use or consume; this is also referred to as the pay-as-you-go model.
- Cloud Computing offers infrastructure elasticity, meaning the ability to scale the number of virtual infrastructure resources up and down quickly and easily as demand changes. This feature is one of the primary reasons customers adopt the cloud.
- Since cloud customers pay only for the services they use, there are no capital or operational expenses for the underlying hardware. Many cost-saving opportunities are possible if the cloud strategy is properly planned and executed.
- Cloud Computing offers various types of cloud service models:
  - Infrastructure as a Service (IaaS)
  - Software as a Service (SaaS)
  - Platform as a Service (PaaS)
- Cloud Computing lets customers use cloud services across multiple geographical regions to provide fault tolerance and better performance for the applications they build in the cloud. Each region in turn has two or more availability zones for high availability and fault tolerance. An availability zone is a physical data center with its own redundant power and network infrastructure, located a meaningful physical distance (typically many kilometers) away from the other zones in its region.
- Cloud service providers typically pool virtual compute, storage and memory resources, which is what lets them provide elasticity and scalability on demand. For a high-level overview of how virtualization techniques are used in cloud computing, see Basics and Fundamentals of Virtualization. These resource pools are not visible to customers and are managed only by the cloud service providers. The virtual resource pools commonly used in the backend of cloud data centers are:
  - Compute pools – CPUs from multiple physical machines are pooled to create virtual compute pools.
  - Memory pools – Memory from multiple physical machines is pooled to create virtual memory pools.
  - Network pools – Multiple networks are created and pooled using virtual LAN interfaces, virtual switches and virtual NICs.
  - Storage pools – Storage pools are created using Storage Area Network (SAN) techniques, where large numbers of physical SSDs and magnetic disks are pooled together to provide virtual disk capacity to cloud servers.
- Cloud providers offer multiple interfaces to use and manage cloud resources and services. Most providers offer the following interface methods:
  - Graphical User Interface (GUI) – A web-based interface that most IT and non-IT users rely on; it makes the cloud platform and its offerings easy to navigate and understand.
  - Command Line Interface (CLI) – A text-based interface used to configure, manage and troubleshoot cloud deployments from the command line; for repetitive tasks it is often more efficient than the GUI.
  - Application Programming Interface (API) – Allows programmatic access, control and configuration of cloud resources and services in a secure manner (see the sketch below).
  - Dashboards – Cloud service providers and many third-party vendors offer custom dashboards and portals to manage and monitor activity in cloud environments.
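As an illustration of the API method, the following is a minimal sketch that uses AWS's boto3 SDK (one provider's Python API client) to list the regions available to an account. The region name passed to the client is an assumption for the example, and credentials are assumed to be configured already (for example via `aws configure`).

```python
# A minimal sketch of the API interface method, assuming an AWS account
# with credentials already configured and boto3 installed (pip install boto3).
import boto3

# Create an EC2 API client; the region here is an assumption for the example.
ec2 = boto3.client("ec2", region_name="us-east-1")

# List the regions available to this account.
for region in ec2.describe_regions()["Regions"]:
    print(region["RegionName"])
```

Equivalent API clients exist for Azure and GCP; the pattern of authenticating once and then calling service endpoints programmatically is the same.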
Understanding what exactly cloud computing offers and the different types of services that are made available by various cloud service providers will help IT managers to determine the right cloud platform based on their requirements.
Most organizations choose Amazon Web Services (AWS) as their preferred cloud platform due to its variety of services; however, the number of companies preferring Microsoft Azure and Google Cloud has also been increasing in the last two years. Some companies also choose a multi-cloud strategy to leverage the best of each cloud service provider and to prevent vendor lock-in.
Determine the right Cloud Deployment Model
Various cloud service providers offer a variety of cloud services. Once the right cloud computing platform is determined, it is important to understand the various cloud deployment models and their use cases in order to choose the model that best fits the organization's goals and road map.
Four types of Cloud deployment models are commonly used in the Cloud world. Based on the business, regulatory and cost requirements, IT teams should determine the right cloud deployment model. Below are the commonly used cloud deployment models.
Private Cloud
- As the name suggests, in this model the cloud's virtual resources are shared only within one organization or among a few closely related companies.
- Cloud resources are accessed over the internet, private networks or dedicated connections.
- This model is best suited for deployments that handle confidential information and core systems.
- It also gives organizations complete control over networking and other resources, unlike the public cloud model.
Public Cloud
- In this model, the virtualized resources are shared among all customers who consume services from that provider.
- Public cloud resources are typically accessed over the internet or dedicated network connections.
- This model is best suited for systems and applications with less stringent security and confidentiality requirements.
- Since the backend physical infrastructure is shared across multiple organizations, external network control is generally owned by the cloud provider. For example, public IP addresses are allocated by the cloud service provider, and organizations have to configure DNS routing to map their applications to those public IPs.
Hybrid Cloud
- This is a combination of the public and private cloud models, and applications can be deployed across both.
- The term hybrid cloud is also used when the cloud is an extension of an existing corporate data center: the on-premises network is simply extended into the cloud to leverage cloud services.
Community Cloud
- In this model, cloud infrastructure is shared among several organizations from a similar community, while each organization still has an isolated environment.
- It is primarily designed for organizations that share a common goal and want to reduce cost, such as non-profit organizations.
- Cloud resources in this model are accessed over the internet or private connections.
Most startup companies choose the public cloud model to avoid upfront IT capital expenses, whereas established enterprises that already have their own data centers generally choose hybrid cloud models to leverage both their existing infrastructure and the cloud.
Some organizations want to migrate all of their systems to the public cloud to reduce data-center maintenance and IT equipment costs. Determining the right cloud model therefore depends entirely on the organization's goals and future business road map.
Determine the right Cloud Network Configurations
Cloud service providers generally allow their customers to create virtual networks (similar to VLANs in an on-premises data center) in which to deploy compute, database and other cloud resources. Multiple virtual networks can be created to segregate development, test and production workloads. To learn more about networks, see Fundamentals and Basics of Network Infrastructure.
Organizations, or the IT teams responsible for cloud deployments, should understand the following cloud network options and determine the desired network configuration based on requirements before any cloud services are used.
Virtual Cloud Networks
- Determine the number of private and public IP addresses required and create a plan for the virtual networks and subnets (see the subnet-planning sketch after this list). By default, the following RFC 1918 private address ranges can be used to create internal networks within cloud environments:
  - 10.0.0.0 to 10.255.255.255 (10.0.0.0/8)
  - 172.16.0.0 to 172.31.255.255 (172.16.0.0/12)
  - 192.168.0.0 to 192.168.255.255 (192.168.0.0/16)
- Determine the firewalls that need to be configured and the inbound/outbound ports to be enabled or disabled so that the virtual cloud resources function properly and securely. Commonly used network protocols and ports in cloud deployments are:
  - HTTP – port 80
  - HTTPS – port 443
  - SSH – port 22
  - DNS – port 53
  - DHCP – ports 67 (server) and 68 (client)
  - FTP – port 21
  - SMTP – port 25
- Determine the type of Virtual Private Network (VPN) to configure to create a secure IPsec connection between an on-premises data center or corporate office and the cloud virtual networks.
- Create a Demilitarized Zone (DMZ), an isolated logical sub-network, to securely expose public-facing applications and services to untrusted networks such as the internet.
- Plan and implement Intrusion Detection and Intrusion Prevention Systems (IDS/IPS) to scan and monitor traffic in the virtual cloud networks for suspicious activity and vulnerability exploits.
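Before creating the virtual networks, the address plan can be sanity-checked with a few lines of code. Below is a minimal subnet-planning sketch using Python's standard ipaddress module; the 10.0.0.0/16 block and the four subnet names are illustrative assumptions, not provider recommendations.

```python
# A minimal subnet-planning sketch using Python's standard ipaddress module.
import ipaddress

# An RFC 1918 private block chosen for this example.
vnet = ipaddress.ip_network("10.0.0.0/16")

# Carve the virtual network into /24 subnets and reserve the first four
# for Dev, Test, Prod and a DMZ (names are illustrative).
subnets = list(vnet.subnets(new_prefix=24))[:4]
for name, subnet in zip(["dev", "test", "prod", "dmz"], subnets):
    print(f"{name}: {subnet} ({subnet.num_addresses} addresses)")
```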
Network Services
Multiple network services need to be implemented in the cloud network to support applications. The following are some of the important network services every cloud deployment should have:
- Multi-Factor Authentication – To require users to prove their identity with more than one factor, often using third-party tools
- Firewalls – To allow or deny traffic by port and IP address (see the security-group sketch after this list)
- Intrusion Detection and Intrusion Prevention Systems (IDS/IPS) – To monitor network traffic for abnormal activity and block it
- Virtual Private Network (VPN) – To allow secure IPsec connections from corporate offices to cloud data centers or third-party services
- Domain Name System (DNS) – For mapping/resolving custom domain names to IP addresses and vice versa.
- Dynamic Host Configuration Protocol (DHCP) – For automatic assignment of IP addresses to cloud resources
- LDAP – Lightweight Directory Access Protocol, used to provide directory services such as Active Directory
- Certificate Services – For generating certificates and encryption keys
- Agents – For providing connectivity to an outside service
- Antivirus/Anti-Malware – For protecting cloud VMs from viruses and malware
- Load Balancer – To distribute the traffic to multiple servers
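To make the firewall item above concrete, here is a hedged sketch that creates a simple virtual firewall (an AWS security group) with boto3, opening only HTTPS and SSH. The VPC ID and the corporate CIDR are placeholder assumptions.

```python
# A hedged sketch of a cloud firewall using AWS security groups via boto3.
import boto3

ec2 = boto3.client("ec2")

# Create a security group that acts as a virtual firewall for instances.
sg = ec2.create_security_group(
    GroupName="web-sg",
    Description="Allow HTTPS and SSH only",
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC ID
)

# Allow inbound HTTPS (443) from anywhere and SSH (22) from a corporate range.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},  # example corporate CIDR
    ],
)
```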
Determine the right Cloud Compute Configurations
Cloud Computing offers a wide variety of compute (virtual machine) resources designed for different requirements and use cases. It is very important to understand the virtual machine types offered by cloud service providers in order to size virtual machines correctly for the application's requirements. A clear understanding of virtual machine components is key to determining the right compute configuration; see Compute Server or Virtual Machine Overview.
- Cloud compute servers are generally referred to as cloud instances, and instances are typically classified by the CPU, memory and storage configured on them. The following instance types are commonly available across cloud platforms:
  - General purpose instances provide a balance of compute, memory and networking resources and can be used for a variety of diverse workloads.
  - Compute optimized instances have a high CPU-to-memory ratio and are ideal for compute-bound applications that benefit from high-performance processors.
  - Memory optimized instances are designed to deliver fast performance for workloads that require large amounts of memory.
  - Storage optimized instances have high local storage capacity and are designed for workloads that require high, sequential read and write access to very large data sets on local storage.
  - Graphics processing instances carry GPUs and are designed for workloads that require heavy graphical processing, machine learning and deep learning applications.
- Cloud computing allows users to deploy resources in multiple regions and availability zones to meet geographical and data regulatory requirements. It is important to understand how to deploy applications in active/active or active/standby configurations, based on the RTO and RPO, to provide fault tolerance, high availability and disaster recovery using regions and availability zones.
- Establish performance benchmarks, metrics and baselines using monitoring tools (see the baseline sketch after this list). Commonly monitored virtual machine performance parameters are:
  - CPU utilization
  - Memory utilization
  - Storage performance and utilization
  - Network and LAN statistics
- Each type of cloud instance has different pricing based on its specifications. The cost of running instances should be factored in before launching them.
- Instances can be configured to start and stop automatically based on usage to reduce cost; this is one of cloud computing's most popular advantages.
- Be aware that dedicated instances can be used instead of shared instances when there are special security or regulatory requirements. Note that dedicated instances cost more than shared instances because they are reserved for a single customer.
- Other important techniques to understand before determining the right instance type and size are listed below. These are generally owned and managed by the cloud service provider; cloud customers have no visibility into them and cannot modify them.
  - Memory bursting – Allows an instance to temporarily use more memory than its baseline allocation, up to a defined maximum.
  - Memory ballooning – A technique that allows the hypervisor to reclaim unused memory from VMs and give it to other VMs running on the same host.
  - Memory overcommitment – Allows the underlying hypervisor to allocate more memory to running VMs than is physically available, up to some extent based on the overcommitment ratio.
  - CPU hyper-threading – A technique that presents a single CPU core as two separate virtual cores to the operating system to enhance performance.
  - CPU overcommitment – Allows the underlying hypervisor to allocate more virtual processors to VMs than are physically available.
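For the baselining item above, the sketch below pulls a CPU utilization baseline for one instance from AWS CloudWatch using boto3. The instance ID is a placeholder, and the 24-hour window and 5-minute period are arbitrary choices for the example.

```python
# A minimal baseline-gathering sketch, assuming AWS CloudWatch and boto3.
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Average CPU utilization for one instance over the last 24 hours,
# sampled in 5-minute intervals, to establish a performance baseline.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))
```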
Determine the right Cloud Storage Configurations
Cloud service providers generally offer virtually unlimited storage capacity, which is another primary reason cloud adoption is increasing in many organizations. Cloud storage can be attached to compute instances or used as a standalone destination for large amounts of data. To determine the right cloud storage configuration, it is important to know the basics of storage devices and storage management techniques. To know more, see Storage Infrastructure Basic Concepts.
Cloud-based storage systems are typically external to the compute instances, and cloud service providers offer multiple ways to mount and access the data stored in them, so selecting the right storage type for the application is key to obtaining good performance.
Cloud Storage Types
- Cloud service providers commonly offer four different types of storage:
  - Direct Attached Storage – Storage attached directly to the VM and provisioned as local storage using ATA, SATA or SCSI protocols.
  - Block Based Storage – Storage provisioned from a pool of block storage systems using Storage Area Network (SAN) techniques, which provide a high-speed, highly redundant system for interconnecting storage devices. These volumes are typically attached to compute instances.
  - File Based Storage – Storage provisioned from a pool of file-based storage systems using Network Attached Storage (NAS) techniques, which provide file-level access to data across the network. Commonly used for CIFS and NFS shares that can be mounted on compute instances.
  - Object Based Storage – A storage type specific to cloud platforms where data is managed as objects instead of blocks or files. It is offered as effectively unlimited storage with various storage classes, each priced differently.
Storage Provisioning Techniques
- Cloud service providers generally use one or more of the following storage management techniques in the backend when storage is requested by consumers:
  - Thick provisioning – All of the requested storage is allocated at once, at provisioning time.
  - Thin provisioning – Storage is allocated on demand, on an as-needed basis.
- Cloud storage offers storage tiering, which provides a trade-off between performance and cost. Tiering is available in block, file and object storage types (see the storage-class sketch after this list):
  - Tier 1 storage – Suitable for critical and frequently accessed data; this is the most expensive tier.
  - Tier 2 storage – Suitable for data that does not need fast read and write performance.
  - Tier 3 storage – Suitable for data that is accessed very infrequently, such as archives; this is the slowest and least expensive tier.
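As a concrete example of tiering in object storage, the following hedged sketch writes an object into a cheaper, infrequent-access storage class on AWS S3 with boto3. The bucket and key names are placeholders, and STANDARD_IA corresponds only loosely to the "Tier 2" idea above.

```python
# A hedged sketch of tiered object storage, assuming AWS S3 and boto3.
import boto3

s3 = boto3.client("s3")

# Infrequently accessed data goes to a cheaper, slower storage class.
s3.put_object(
    Bucket="example-bucket",          # placeholder bucket name
    Key="archive/report-2023.csv",    # placeholder key
    Body=b"col1,col2\n1,2\n",
    StorageClass="STANDARD_IA",       # Infrequent Access class on S3
)
```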
Storage Security Techniques
- Security for data stored in the cloud is an important factor to consider when deploying or moving an application there. Since much cloud storage can be made publicly accessible via a browser, appropriate security controls need to be implemented to protect data at rest and in transit.
- Cloud service providers configure security at the hardware level (zoning, obfuscation, LUN masking, etc.), but it is the consumer's responsibility to manage security at the presentation and application layers. Commonly used storage security configurations are:
  - Access Control Lists & Policies – Lists of permissions that determine which users and which internal or external services can access the data.
  - Data Encryption – Data stored in the cloud should be encrypted using encryption keys provided by the cloud provider or using customer-managed keys.
  - Tokens – Require a security token to access the data, valid for a limited amount of time (see the presigned-URL sketch after this list).
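The token technique above can be illustrated with an S3 presigned URL, which grants time-limited access to a single object. This is a minimal sketch assuming AWS and boto3; the bucket and key are placeholders.

```python
# A minimal sketch of time-limited, token-style access using an S3
# presigned URL, assuming AWS and boto3.
import boto3

s3 = boto3.client("s3")

# Generate a URL granting read access to one object for 15 minutes only.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "private/data.json"},  # placeholders
    ExpiresIn=900,  # seconds
)
print(url)
```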
Data Protection Techniques
Cloud service providers offer many data protection techniques for data stored in cloud storage. While providers automatically replicate storage onto multiple disks across availability zones and regions in the backend, it is also the consumer's responsibility to protect data according to their own data protection requirements. The following techniques can be leveraged as needed:
- Automatic snapshots – Storage volumes and disks can be configured to take automatic backups at regular intervals so that data can be restored easily when required (see the snapshot sketch after this list).
- Cross-regional backups – Critical data can be configured for automatic backup to another region for disaster recovery purposes.
- RAID configurations – Block-based virtual disk drives can be combined into RAID arrays if the consumer needs redundancy and better performance. See Different Types of RAIDs and Their Advantages and Disadvantages.
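As a sketch of the automatic-snapshot item, the code below starts an EBS snapshot with boto3. The volume ID is a placeholder; in practice a scheduler (cron, a Lambda function, or the provider's backup service) would run this on a regular interval.

```python
# A hedged snapshot sketch, assuming AWS EBS and boto3.
import boto3

ec2 = boto3.client("ec2")

# Start a point-in-time snapshot of one storage volume.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # placeholder volume ID
    Description="Nightly backup before deployment",
)
print("Started snapshot:", snapshot["SnapshotId"])
```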
Determine the right Cloud Security Policies
Security in the cloud is an important factor that still stops some organizations from adopting cloud services at scale. Security of cloud resources is a shared responsibility between the cloud service provider and the cloud consumer, so it is key to understand which security features are enabled by default and which security controls must be enabled explicitly. To understand more about infrastructure security, read Infrastructure Security Fundamental Basics.
A security policy is a set of instructions that defines an organization's cloud controls, policies, responsibilities and security techniques for securing cloud infrastructure and applications. Security policies generally cover the management of users and user groups, user authentication, user authorization and cloud service policies. The following security policies and techniques must be understood to determine the right level of security to implement across cloud deployments.
Security Responsibility
Security responsibilities vary based on the cloud deployment model chosen by the consumer or organization. The deployment model determines who is responsible for security across each layer of the cloud infrastructure.
- In the private cloud model, security and operations are the complete responsibility of the organization that owns the cloud.
- In the public cloud model, security is a shared responsibility between the cloud service provider and the cloud consumer (organization).
- In the hybrid cloud model, as in the public cloud, security is a shared responsibility.
- In the community cloud model, security is the responsibility of the community cloud provider.
Similar to the deployment model, security responsibilities across the various infrastructure layers also vary based on the type of cloud service model used.
- In the IaaS model, security is a shared responsibility between the cloud service provider and the organization. The provider is responsible for security of the physical and hypervisor layers, while the organization is responsible for the virtual machine and application layers.
- In the PaaS model, the cloud service provider is responsible for security of the physical, hypervisor and virtual machine layers, and the organization is responsible for security at the application layer.
- In the SaaS model, security across all layers is the responsibility of the cloud service provider. Consumers are responsible only for their users and for the functionality of the services they consume.
Security Standards and Compliance
Depending on the organization's business model, it sometimes needs to make sure that its cloud deployments follow certain security compliance standards and regulations. For example, healthcare companies need to ensure that their cloud deployments are HIPAA compliant.
Almost every cloud service provider ensures that its services adhere to most security compliance standards, but in the end it is the organization's responsibility to ensure the cloud service it uses is compliant and permitted. Many third-party vendors are also available to assist with these compliance requirements. Some of the common security compliance standards are:
- Service Organization Controls 1 (SOC1) – This report covers the controls commonly required for financial reporting.
- Service Organization Controls 2 (SOC2) – This report covers controls for non-financial criteria such as security, integrity and processing.
- Service Organization Controls 3 (SOC3) – This report is for public disclosure of SOC2 reports.
- FISMA – The Federal Information Security Management Act, a US federal law that provides a framework for protecting federal government information, operations and facilities.
- FedRAMP – The Federal Risk and Authorization Management Program, which provides standards for security assessment and monitoring of cloud services.
- DIACAP – Department of Defense Information Assurance Certification and Accreditation Process which provides instructions for IT compliance.
- PCI-DSS – The Payment Card Industry Data Security Standard, generally applicable to eCommerce businesses and online transactions, which sets requirements for processing credit and debit card transactions.
- ISO 27001 – An International Organization for Standardization (ISO) standard for information security management, used to show that providers meet regulatory and statutory requirements.
- ITAR – The International Traffic in Arms Regulations, which regulate and restrict the export of defense-related information.
- FIPS 140-2 – A standard managed by the National Institute of Standards and Technology (NIST) that defines requirements and standards for cryptographic modules.
- MPAA – The Motion Picture Association of America, which provides a set of best practices for storing, processing and delivering media.
- HIPAA – The Health Insurance Portability and Accountability Act, which defines standards for protecting medical patient data.
- GxP – "Good x Practice" guidelines, which define rules and procedures for regulated applications; the x can stand for clinical, laboratory or manufacturing.
User Authentication Policies
User access controls must be defined to determine who or what can view or use cloud resources. Most cloud service providers allow consumers to implement the following techniques for access control management:
- Role-Based Access Control (RBAC) – An access control method that restricts users based on their roles within the organization.
- Mandatory Access Controls (MAC) – Access is granted by comparing the user's rights against the security properties of the object being accessed, under centrally managed rules.
- Discretionary Access Controls (DAC) – Similar to MAC, but not centrally managed; the owner of an object controls who may access it.
- Non-Discretionary Access Controls – This method is similar to RBAC.
- Multi-Factor Authentication (MFA) – In addition to their password, users employ third-party tools or cloud-native apps that generate a PIN or code which expires after a short time.
- Federation – Allows multiple organizations to use the same credentials for authentication; for example, using Facebook credentials to access other websites and applications.
- Single Sign-On (SSO) – Allows users to access multiple applications or systems with one set of credentials; the user signs in once to reach multiple systems.
User Authorization Policies
After a user is authenticated using one or more of the methods discussed above, the user goes through an authorization process that determines their level of access to cloud systems and which cloud objects they can reach. The following are some commonly implemented authorization techniques in the cloud (see the IAM sketch after this list):
- User Accounts – Each user has their own user ID for accessing the cloud platform and resources; these are generally mapped to the organization's Active Directory IDs.
- User Groups – Groups should be created to easily manage sets of users with similar access requirements. Groups can also carry specific permissions that allow managing only particular cloud resources, such as compute, network or storage.
- Service Groups – These can be used to restrict users from using certain critical cloud services, such as DNS and firewall services.
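Below is a minimal sketch of group-based authorization, assuming AWS IAM and boto3: it creates a group whose members may manage compute resources and nothing else. The group and policy names are placeholders, and a real policy would be scoped far more tightly than "ec2:*".

```python
# A hedged group-based authorization sketch, assuming AWS IAM and boto3.
import json
import boto3

iam = boto3.client("iam")

# A policy that lets group members manage EC2 and nothing else
# (illustrative only; real policies should be much narrower).
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:*", "Resource": "*"}
    ],
}

iam.create_group(GroupName="compute-admins")  # placeholder group name
policy = iam.create_policy(
    PolicyName="compute-admin-policy",        # placeholder policy name
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_group_policy(
    GroupName="compute-admins",
    PolicyArn=policy["Policy"]["Arn"],
)
```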
Determine the right Cloud Security Configurations
Strict security policies must be defined and then configured across all layers of a cloud deployment to ensure the applications hosted there are secure. Cloud service providers allow organizations to define their own security policies and configurations based on their requirements.
Network Connectivity Security Configurations
Cloud consumers generally access cloud-based services remotely over the internet, so the network connections between the organization and the provider's data centers must be secured to protect the network and data from attackers. The following network tunneling protocols can be implemented:
- Point-to-Point Tunneling Protocol (PPTP) – Provides point-to-point connectivity, but the technique is obsolete and has been largely replaced.
- Layer 2 Tunneling Protocol (L2TP) – Developed by Cisco and Microsoft to enable connections to remote devices over the public internet.
- Generic Routing Encapsulation (GRE) – Encapsulates network-layer traffic inside a virtual link between two locations.
Compute Security Configurations
Cloud VMs contain critical business applications and data, so it is important to secure them at multiple layers. Below are some commonly used strategies for hardening cloud servers or instances:
- Disable default user accounts, and change their passwords if they cannot be deleted.
- Disable services that are not needed, such as FTP, WWW and DNS.
- Enable firewalls and open only the ports required for necessary network traffic.
- Install and configure antivirus and other security agents, and keep them updated regularly.
- Install OS-level patches at regular intervals.
Data Encryption Techniques
Data stored and processed in the cloud must be encrypted to ensure its confidentiality and integrity at all times. Cloud providers offer several encryption techniques to protect data at rest and in transit within the cloud environment. The following encryption algorithms can be used across cloud deployments (see the AES sketch after this list):
- Advanced Encryption Standard (AES) – Offers encryption with 128-, 192- and 256-bit keys
- Triple Data Encryption Standard (3DES) – An older standard that has largely been superseded by AES
- RSA – Uses public and private key pairs within a PKI framework for encryption
- Digital Signature Algorithm (DSA) – Similar to RSA but with slower verification
- RC4 – A stream cipher that uses shared keys to encrypt and decrypt streams of data; no longer used due to its susceptibility to compromise
- RC5 – A successor to RC4 that uses a variable-length key
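As an illustration of symmetric encryption with AES, here is a minimal sketch using the third-party Python cryptography package (pip install cryptography). The key handling is deliberately simplistic; in a real deployment the key would come from the provider's key management service or a self-managed KMS.

```python
# A minimal AES-256-GCM sketch using the third-party `cryptography` package.
# Key handling here is illustrative only; real keys belong in a KMS.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 256-bit AES key
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # must be unique per encryption with the same key

ciphertext = aesgcm.encrypt(nonce, b"sensitive record", None)
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == b"sensitive record"
```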
Data Classification Configurations
Data stored and processed in cloud platforms must be classified into one or more tiers in order to meet regulatory requirements, mitigate risk and secure data according to its sensitivity and confidentiality. Commonly used data classifications in cloud platforms are:
- Publicly Accessible Data – Data that may be accessed via the public internet; it requires the fewest security controls.
- Private Data – Data that must not be accessible via the public internet and may be accessed only on internal networks. It requires additional security policies to limit access.
- Restricted Data – Highly restricted data that may need to be encrypted at rest and in transit to meet security compliance and regulatory requirements such as PCI-DSS and HIPAA.
Application or System Security Configurations
Organizations need to segment their cloud deployments into layers so that different security policies can be applied to each, giving greater end-to-end control of the application. Cloud deployments can be segmented into:
- Web Layer – The web front end, where web services are accessed via the internet; because it must be publicly reachable, the least restrictive security policies are applied here.
- Application/Service Layer – Where the actual application or its services reside; security policies should ensure that only the web and database layers can reach this layer.
- Database Layer – Access controls and data security policies must be applied at this layer, as it stores the data and data elements used by the application layer.
- Storage Layer – This layer needs its own security policies, as data is stored here on both a short-term and long-term basis.
Automating Security Operations Tasks
Cloud service providers allow organizations to automate certain security operations tasks, enabling rapid and consistent responses to security events. The following tasks can be automated or orchestrated using custom-built or third-party tools (see the alarm sketch after this list):
- Automatically collect and analyze logs generated by applications and systems to monitor for security threats.
- Automatically monitor user activity for unusual or abnormal behavior.
- Configure automatic alerts based on monitoring results.
- Create scripts that carry out a wide range of security operations tasks to reduce dependence on manual work.
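The alerting item above can be sketched as follows, assuming AWS CloudWatch and boto3: an alarm fires a notification when average CPU stays above a threshold. The instance ID and SNS topic ARN are placeholders.

```python
# A minimal automated-alert sketch, assuming AWS CloudWatch and boto3.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average CPU stays above 80% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-alert",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder ARN
)
```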
Determine the right Cloud Deployment & Migration Path
Once the IT team understands and determines its compute, storage, network and security requirements, the next important step is to identify the best migration and deployment path for moving servers (both virtual and physical) to the chosen cloud platform. The following migration methods will help determine the right approach.
Infrastructure Migration Types
Cloud service providers offer various migration services that move servers and their data automatically, but ultimately it is the customer's responsibility to determine the best path. Many third-party vendors also offer migration services; these typically require installing an agent in the existing on-premises infrastructure to scan and discover IT resources in order to plan the migration.
However, all cloud service providers and third-party vendors follow the same fundamental strategies for moving a workload into a cloud environment. The standard migration types are:
- Physical to Virtual (P2V) – The most common migration type; a physical server is converted into a virtual machine. Downtime is required for the conversion.
- Physical to Physical (P2P) – A physical server is migrated to another physical server; this requires reinstalling application software on the new server from scratch and restoring the data. This is typically seen in private cloud platforms.
- Virtual to Virtual (V2V) – A virtual server is converted to another virtual server; this is the easiest migration type.
- Virtual to Physical (V2P) – A virtual server is converted into a physical server; this requires a complete reinstall of the application software and a data restore. It is a rare use case because of application compatibility and performance issues.
Compute Images & VM templates
Each cloud service provider has its own virtual machine image format, and it is necessary to understand these formats during migration. Some commonly used virtual machine formats in cloud deployments are:
- Amazon Machine Image (AMI) – Used in AWS for creating an EC2 instance (VM).
- Microsoft Virtual Hard Disk (VHD) – Used in Microsoft Azure for creating an Azure virtual machine.
- VMware VMDK – Used on VMware-based clouds for creating virtual machines.
- Google Cloud image – Used in Google Cloud Platform to create Compute Engine instances.
- Oracle VDI – The Virtual Disk Image format used by Oracle VirtualBox and in Oracle Cloud.
Potential Migration Constraints
Migrating an application involves multiple cross-functional teams, and it is important that everyone shares the same understanding of, and agreement on, the migration path. Other important factors to consider when determining a successful migration plan are:
- Network Bandwidth – Bandwidth is a key factor in the migration window when large amounts of data are involved. Make sure the available bandwidth is sufficient.
- Downtime Impact – Downtime must be estimated by the project team and communicated to the business to minimize impact.
- Peak Time Frames – Migrations are generally best performed during off-peak hours, so peak times need to be identified during planning.
- Legal Restrictions – Some applications have legal or regulatory restrictions on migrating to a cloud platform; it is the project team's responsibility to verify these before migrating.
- Time Zones – Cloud providers offer services in multiple regions in different time zones; if the application is time-zone dependent, verify the region to which the server will be migrated.
- Employee Working Hours – Depending on the criticality and size of the application, employees may need to work overtime or rotate shifts.
Data Migration
Cloud service providers offer various data migration services to move anything from small to very large data sets. The amount of data to be migrated and the available network bandwidth need to be considered before drafting a data migration plan (see the estimate sketch after this list). Data can be migrated using one or both of the following strategies:
- Online Migration – Uses the existing network bandwidth to move data from on-premises to cloud storage. This works well for smaller data sets, roughly 1-5 TB or less.
- Offline Migration – Data is shipped to the cloud provider on physical media such as hard drives or data appliances. This strategy suits data sets larger than about 5 TB, although total migration time includes shipping time.
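Choosing between online and offline migration often comes down to simple arithmetic. The sketch below estimates transfer time in days for a given data size and link speed; the 70% utilization factor and the example numbers are illustrative assumptions.

```python
# A back-of-the-envelope transfer-time estimate for migration planning.
def transfer_days(data_tb: float, bandwidth_mbps: float,
                  utilization: float = 0.7) -> float:
    """Days to move data_tb terabytes over a link used at `utilization`."""
    bits = data_tb * 8 * 10**12                       # TB -> bits (decimal units)
    seconds = bits / (bandwidth_mbps * 10**6 * utilization)
    return seconds / 86400

print(f"{transfer_days(5, 100):.1f} days")   # 5 TB over 100 Mbps -> ~6.6 days
print(f"{transfer_days(50, 100):.1f} days")  # 50 TB -> ~66 days: consider shipping
```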
Best Practices for Cloud Deployments and Testing
If determining the right cloud services and the right compute, network, storage and security configurations is complete, then roughly 40% of the work is done; the remaining 60% is creating and executing the deployment plan. Most cloud deployments fail in the deployment phase because of poor planning, which leads to budget overruns and frustration. If planning is done correctly against the requirements, executing the plan becomes much easier.
Executing the cloud deployment plan includes creating system baselines, creating the deployment plan itself and testing the implementation. The following best practices help ensure cloud deployments succeed with minimal issues.
Create Baselines
Baselines are used to determine whether a service is operating within its expected range. Alerts can be configured against metrics and thresholds so that IT staff can react to service variations.
Every critical detail of the existing system must be documented before migrating it to the cloud. Common details include:
- Deployment configurations
- Support software
- Current operating environment
- External application dependency details.
This documentation must be updated regularly and kept current so that the migrated system's performance can be evaluated against the existing system's baselines, ensuring that the necessary service levels are maintained.
Create clear Deployment plan
The deployment plan is generally created by an IT project manager who is responsible for the whole system migration and who should understand the end-to-end workflow of both the existing system and the desired system after migration to the cloud. Project managers may use a variety of project management tools to track and document the progress of the project.
Change Management – Many IT companies already follow ITIL's change management process for managing ongoing projects, troubleshooting and upgrades. Ensure change management procedures are followed at every stage of deployment so that risks are adequately covered. A common change management process includes:
- Creating a deployment plan via a change request in the ITSM tool
- The Change Advisory Board (CAB) reviewing and approving the change
- The project team implementing the change and verifying that it behaves as expected
Standard Operating Procedures (SOPs) – Create SOPs, step-by-step instructions that help the IT operations team carry out complex tasks. SOPs ensure that tasks are performed consistently and that change activities follow pre-defined, verified steps.
Execute Workflows – A workflow is a series of steps required to complete a task. For example, a workflow can be created for launching an EC2 instance in AWS, which involves multiple steps.
Leverage Automation and Orchestration – Cloud service providers offer their own automation tools, and third-party automation and orchestration tools also exist, helping cloud teams create monitoring dashboards, automate infrastructure deployments, roll out code updates automatically and more. This greatly reduces manual effort and lets teams concentrate on other critical tasks.
Use Command-Line Tools – Automation and orchestration tools come at a cost, so IT teams can instead use the command-line tools provided free by cloud service providers, developing custom scripts to automate regular maintenance tasks as well as new deployments.
Document Results – After every deployment, successful or failed, IT teams must document the results so the steps can be compared against the deployment plan and the application confirmed to be functioning as expected. If a deployment fails, this documentation can be reviewed to correct mistakes before the next attempt.
Create and Analyze the test plan
Creating a test plan before performing the actual testing is key to verifying the success of a deployment. To create the test plan, one must understand the functionality of the application thoroughly at every layer, from infrastructure to application. The following parameters are commonly used for testing systems in cloud deployments.
Shared Components – Since most public cloud infrastructure is shared among multiple customers and accessed via the internet, it is important to include adequate performance tests in the plan. Common shared-component metrics include:
- Storage response time
- Latency and network connections
- Network bandwidth
Testing the Code Promotion Process – Before code is deployed into the production environment, it should be tested in lower environments such as DEV and TEST to confirm it is correct and stable. Cloud computing lets DevOps teams automate this process using scripts and DevOps tools.
Sizing – Test whether the current infrastructure sizing handles peak workloads. Cloud platforms offer auto-scaling features that increase and decrease compute resources with demand; these should be leveraged instead of over-provisioning resources for peak load.
High Availability – Cloud providers let customers build highly available applications using regions and availability zones. Ensure high-availability scenarios are included in the test plan, such as:
- Compute server fail-over
- Network fail-over
- Storage fail-over
- Fail-over of any external components
Data Integrity – Data must be consistent, reliable and accurate before and after migration. It is important to check data integrity, i.e. whether the data was transformed properly; cloud service providers do some of this automatically when storing data in cloud storage. Testing data integrity includes checking:
- Whether security permissions were preserved correctly after the migration
- Whether all of the data was migrated completely
Functionality Testing – The test plan must include testing of the application's functionality; this is generally referred to as a smoke test and is the most important kind of testing (see the sketch after this list). It may involve checks of the application's desired functionality such as:
- Workflow functions and business scenarios
- Whether only authorized personnel can access the application
- Audit trail checks
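A smoke test like the one described above can be as small as a script. The following hedged sketch uses the third-party requests package; the URL, the /health endpoint and the expected status codes are placeholders standing in for an application's real checks.

```python
# A minimal smoke-test sketch using the third-party `requests` package.
# URL, endpoints and expectations are placeholders for real application checks.
import requests

def smoke_test(base_url: str) -> None:
    # The application front end should answer quickly and successfully.
    resp = requests.get(f"{base_url}/health", timeout=5)
    assert resp.status_code == 200, f"unexpected status {resp.status_code}"

    # Unauthenticated users should be kept out of protected pages.
    resp = requests.get(f"{base_url}/admin", timeout=5, allow_redirects=False)
    assert resp.status_code in (301, 302, 401, 403), "admin page is not protected"

smoke_test("https://app.example.com")  # placeholder URL
```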
Data Replication – If a shared file system is used, or if data is configured to replicate to another region or zone, the test plan must also cover these capabilities. Testing data replication may include:
- Network failures during replication
- Speed of replication
- File system failures
- Data falling out of sync
Load Balancing – A load balancer is generally used to distribute traffic across a group of attached servers. The test plan must exercise the load balancer's parameters to ensure it is configured correctly; for example, testing may cover:
- Peak traffic conditions
- Individual server failures
- Cache and idle timeouts
Automation – Automation and orchestration tools provision cloud resources and deploy applications. It is also important to test these automated deployments, including deployment scripts, DevOps tools and integrated system connections, to ensure consistent, error-free deployments.
Analyze Test Results
Test results are the only proof that the system was deployed according to plan and is working as expected. They should be analyzed to determine whether testing succeeded relative to the application requirements. Consider the following factors while analyzing the test results.
Consider Success Factors
Analyze the test results to confirm the success criteria for each component and service, for example:
- Sizing – Avoid over-provisioning virtual machines by testing memory and CPU to determine the best configuration.
- Availability – Ensure that component failures, e.g. CPU and network interfaces, were tested.
- Data Integrity – Ensure data integrity is 100% and no data was corrupted during migration.
- Functionality – Ensure the desired functionality is achieved by testing various scenarios.
Document Test Results
Test results must be documented with enough information to determine whether the system is behaving as expected. Examples of what to capture:
- Details of test conditions such as system configurations, test steps and performance metrics
- Metrics that align with pre-defined baseline parameters
- Any issues encountered during testing
Baseline Comparisons
After testing is complete, perform a baseline comparison to check performance against the following metrics:
- CPU utilization
- Memory utilization
- Storage consumption
- Database I/O performance
- Network bandwidth consumed
SLA Comparisons
Service Level Agreements define the high-level performance parameters a system must meet to maintain the required service levels, such as uptime percentage, response times, mean time to recovery and turnaround time. Analyze test results:
- To ensure the system meets the key metrics in the SLA
- To verify that the test scenarios cover the conditions needed to demonstrate SLA compliance
Performance Variables
Ensure that all performance variables are validated for all defined usage scenarios. Example scenarios include:
- Peak load conditions
- Heavy network traffic
- Network failures
- Server failures
- Disaster recovery scenarios such as region or availability zone fail-overs