Cloud Infrastructure Performance Tuning - Basics and Fundamentals


Applications in a cloud environment may need to be tuned over time to keep pace with continuously changing user demand. The performance of cloud applications needs to be checked or monitored at regular intervals so that IT teams can analyse the monitoring data and take the steps necessary to keep application performance consistent throughout the application life cycle. Achieving the desired application performance by making automatic changes to the underlying infrastructure is one of the key advantages of adopting cloud computing.

Pic: UILA Infrastructure Performance Monitoring Dashboard

In this post we will explore the basic performance issues found in both traditional and cloud datacenters, why they matter, and how to manage system performance. The following are the common cloud infrastructure parameters that determine the performance of the IT infrastructure and of the applications running on top of it. Tuning these parameters based on requirements and resource availability helps organizations achieve the desired performance for the applications running in cloud datacenters.

Input/Output Operations per Second (IOPS)

  • IOPS is a measurement of the read and write speed of storage devices. It depends on a number of variables, such as random or sequential data patterns, the disk array configuration, the ratio of read to write operations, and the size of the data blocks.
  • The IOPS figure serves as a performance benchmark for a storage device, so it is important to be aware of the IOPS values and strive for the best performance possible.
  • Cloud service providers typically offer storage devices in three ranges: low-IOPS devices, medium/general-purpose IOPS devices, and high-IOPS devices. See Fundamentals of Storage Infrastructure.
  • A server’s performance is greatly dependent on the storage device used in the server, so it is very important to choose a storage device with the right number of IOPS for the server.
  • On Windows, the perfmon utility can be used to monitor disk I/O performance; on Linux, the iostat utility serves the same purpose. A simple sampling sketch follows this list.
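As a rough illustration, the snippet below estimates read and write IOPS from the operating system's disk counters. It assumes the third-party psutil package is installed and uses a five-second sampling window; it is a monitoring sketch, not a benchmark tool such as fio or perfmon.

```python
# Rough IOPS estimate from OS-level disk counters (sketch; assumes psutil is installed).
import time
import psutil

INTERVAL = 5  # sampling window in seconds (assumption)

before = psutil.disk_io_counters()
time.sleep(INTERVAL)
after = psutil.disk_io_counters()

read_iops = (after.read_count - before.read_count) / INTERVAL
write_iops = (after.write_count - before.write_count) / INTERVAL
print(f"read IOPS ~{read_iops:.0f}, write IOPS ~{write_iops:.0f}")
```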

Read vs. Write Files

  • Some applications are write intensive, others are read intensive, and some balance the two.
  • When designing a storage solution for a server or an application, it is critical to evaluate the application’s storage behavior and design the storage resources around those needs (see the measurement sketch after this list).
  • For example, RAID 0 performs well for both read and write operations, whereas RAID 1 has slower write performance because data has to be written twice.
  • Cloud service providers offer built-in replication and snapshots of data across multiple zones and regions by default. In addition, you can still configure RAID if more read/write performance is desired.
  • It is very important to consider these performance parameters, since there is a trade-off between performance, cost, and other factors such as redundancy and fault tolerance.
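The counters kept by the operating system give a quick indication of whether a host's workload is read- or write-heavy. The sketch below again assumes psutil is installed; note that these counters accumulate from boot time, so a longer observation window gives a more representative mix.

```python
# Quick read/write mix check from cumulative OS disk counters (sketch; assumes psutil).
import psutil

io = psutil.disk_io_counters()
total_ops = io.read_count + io.write_count
if total_ops:
    print(f"reads: {io.read_count / total_ops:.1%}, writes: {io.write_count / total_ops:.1%}")
else:
    print("no disk activity recorded since boot")
```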

Filesystem Performance

  • Storage devices attached to servers are generally formatted with a filesystem at the operating system layer, and different filesystems have different characteristics that affect performance.
  • By comparing benchmarks of the various Windows and Linux filesystems available, you can select which one to use based on your requirements (a simple timing sketch follows this list).
  • In cloud computing, the commonly used storage and filesystem types are block based, network/file based, and object based.
  • The interaction between the application and the filesystem, and how the filesystem performs against the underlying storage devices, are all factors in filesystem performance.
  • Since the filesystem software is usually included with the operating system, it is much easier to change or modify filesystem configurations.
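For an informal comparison between filesystems, a small timing test like the one below can be run on each candidate. It is purely illustrative (dedicated tools such as fio give far more reliable numbers); the 256 MiB file size is an assumption, and the read pass may be served partly from the page cache.

```python
# Minimal filesystem write/read timing sketch (illustrative only; prefer fio for real tests).
import os
import tempfile
import time

SIZE_MB = 256                      # test file size (assumption)
CHUNK = b"\0" * (1024 * 1024)      # 1 MiB chunk

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    start = time.perf_counter()
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())           # make sure the data actually reaches the device
    write_secs = time.perf_counter() - start

start = time.perf_counter()
with open(path, "rb") as f:
    while f.read(1024 * 1024):     # sequential read pass (may hit the page cache)
        pass
read_secs = time.perf_counter() - start
os.remove(path)

print(f"write: {SIZE_MB / write_secs:.1f} MiB/s, read: {SIZE_MB / read_secs:.1f} MiB/s")
```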

Metadata Performance

  • Metadata provides information about the files stored on a storage device, such as the size of the file, security permissions, the file type and its extension, and attributes of content such as images.
  • The metadata also records where the file is located on the media, who created the file and when it was created, and whether the content is video or audio.
  • Each filesystem handles metadata differently. Usually the metadata is stored on the drive as its own set of structures and must be constantly updated and kept in sync in case there is a failure and the drive needs to be rebuilt.
  • Caching the metadata in RAM speeds up performance and prevents a bottleneck, because the CPU does not have to wait to read and write metadata from the attached storage drive but can access it from much faster RAM (a quick metadata-inspection sketch follows this list).
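The snippet below shows the kind of per-file metadata a filesystem maintains, using only the Python standard library; the file name is an example created on the fly.

```python
# Inspect the metadata a filesystem keeps for a file (standard library only; example path).
import os
import stat
import time

path = "metadata_example.txt"
with open(path, "w") as f:
    f.write("hello")

info = os.stat(path)
print("size (bytes):  ", info.st_size)
print("permissions:   ", stat.filemode(info.st_mode))
print("last modified: ", time.ctime(info.st_mtime))
print("inode/file id: ", info.st_ino)   # how the filesystem indexes the file

os.remove(path)
```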

Caching

  • Caching is a technique used to increase disk performance by using RAM pools to temporarily store data that needs to be written to, or read from, a storage drive.
  • When reading data from the disk, frequently used files can be accessed at much higher rates if they are stored in cache memory. The cache, built on RAM, has much faster access rates and serves the data to the filesystem, avoiding a disk read operation.
  • Disk performance is often the bottleneck that limits server performance. Instead of writing to the disk directly, a caching solution stores the data to be written in RAM and replies to the filesystem that the write operation has been completed. This frees the CPU from waiting for the operation to complete and improves server performance.
  • The data is then written to the storage disk from the cache as disk access becomes available. The risk of using cache technology is that the data sits in volatile memory, and if there is a failure or power outage, that data may be lost before it is written to the storage drive.
  • This risk can be greatly reduced by mirroring or replicating the cache to other storage devices at a frequency based on the RTO and RPO (see the read-cache sketch after this list).
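A read cache can be as simple as keeping recently requested file contents in memory. The sketch below uses the standard-library functools.lru_cache purely to illustrate the idea; the cache size is an assumption, and a production cache would also handle invalidation when files change.

```python
# Toy read cache: frequently requested files are served from RAM instead of re-reading disk.
from functools import lru_cache

@lru_cache(maxsize=128)          # keep up to 128 files in memory (assumption)
def read_file(path: str) -> bytes:
    with open(path, "rb") as f:  # this disk read only happens on a cache miss
        return f.read()

# Repeated calls for the same path are answered from RAM.
# read_file.cache_info() reports hits/misses; read_file.cache_clear() empties the cache.
```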

Network Bandwidth

  • Network bandwidth is an important cloud performance parameter. Cloud datacenters have multiple high-speed connections to the Internet and are generally connected to multiple Internet service providers.
  • The bandwidth on the backbone of the Internet, however, is beyond the control of the cloud provider. Providers can manage the links to their Internet providers, making sure there is enough bandwidth available from the cloud datacenter to the Internet and that the links are not saturated, but they have no control over network bandwidth and latency after traffic leaves the cloud datacenter.
  • Within the cloud datacenter, the cloud provider will generally make sure that there is ample bandwidth inside and between datacenters. With specialized datacenter switching and routing hardware, interface speeds of 10, 25, and now 100 Gbps are common.
  • Generally, cloud service providers offer good network bandwidth between the zones in a single region, while the bandwidth between two regions may be lower. See Network Basics and Fundamentals.
  • Server NIC cards can support 10 Gbps or even 25 Gbps interface speeds to handle a large number of VMs running on a hypervisor, so server LAN bottlenecks are mitigated.
  • By using network monitoring tools and systems, you can track link utilization and monitor trends so that you can collect capacity-planning data. Based on the measured trends, additional datacenter network bandwidth can be added before link saturation occurs and causes network delays (a utilization-sampling sketch follows this list).
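As a host-level illustration of utilization sampling, the snippet below measures throughput over a short interval. It assumes psutil is installed and a ten-second window; real capacity planning would rely on SNMP, flow data, or the cloud provider's monitoring service.

```python
# Sample host network throughput over a short window (sketch; assumes psutil is installed).
import time
import psutil

INTERVAL = 10  # seconds between samples (assumption)

before = psutil.net_io_counters()
time.sleep(INTERVAL)
after = psutil.net_io_counters()

rx_mbps = (after.bytes_recv - before.bytes_recv) * 8 / INTERVAL / 1_000_000
tx_mbps = (after.bytes_sent - before.bytes_sent) * 8 / INTERVAL / 1_000_000
print(f"rx ~{rx_mbps:.1f} Mbps, tx ~{tx_mbps:.1f} Mbps")
```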

Jumbo Frames

  • A standard Ethernet frame carries a maximum transmission unit (MTU) of 1,500 bytes of payload, for a maximum frame size of 1,518 bytes including headers; this defines the largest Ethernet frame that can be transmitted onto the network.
  • Packets larger than the MTU are fragmented to fit the standard frame size. Any Ethernet frame larger than the standard size is referred to as a jumbo frame. Inside the datacenter it is often more efficient to use a frame size larger than the standard Ethernet MTU to reduce networking overhead.
  • Jumbo frames allow higher network performance by reducing the per-frame overhead through fewer but larger frames. They also reduce the number of times a CPU is interrupted to process Ethernet traffic, since each jumbo frame can be up to six times as large as a standard frame.
  • Jumbo frames are now common in cloud and enterprise datacenters and are used extensively for storage-over-LAN technologies such as iSCSI and Fibre Channel over Ethernet.
  • Cloud service providers offer certain server types that support jumbo frames up to 9,000 bytes, so it is important to choose the right server type if jumbo frames are required (a quick MTU check is sketched below).
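A quick way to confirm whether jumbo frames are actually configured on an instance is to check the interface MTU. The sketch below assumes a recent psutil release that exposes the mtu field; on Linux, `ip link` reports the same information.

```python
# Report the configured MTU per interface (sketch; assumes a psutil version with the mtu field).
import psutil

for name, stats in psutil.net_if_stats().items():
    label = "jumbo-capable" if stats.mtu >= 9000 else "standard"
    print(f"{name}: MTU {stats.mtu} ({label})")
```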

Network Latency

  • Network latency is the delay incurred as a frame traverses the network. Each network device will receive the frame, compute a CRC error check, look up the exit interface, and send the frame out that interface.
  • There are also delays in the serialization of the frame to exit the interface. When the frame passes through a firewall, it will be compared against the firewall rule set to determine if it is permitted or denied, and the data will be delayed during this process.
  • If there is contention in the network, the frame may be buffered until network bandwidth becomes available.
  • Quality of service policies may allow higher-priority frames to be forwarded and lower-priority frames to be buffered, which can also contribute to network latency.
  • Network latency can vary widely in cloud computing because of the bursty nature of LAN traffic and because orchestration systems are constantly adding and moving VMs, causing constant changes in network traffic flows.
  • Cloud networks are deployed with higher-speed interfaces and link aggregation, which can reduce latency. Typically, data that traverses the public Internet will see higher latency than data traversing the cloud's internal network.
  • Network latency can be controlled, monitored, and managed with the various network monitoring tools available in the market to ensure that it does not affect applications or user experience (a simple latency probe is sketched below).
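A very simple latency probe is to time how long a TCP connection takes to establish, as sketched below; the host and port are placeholders, and ICMP ping or the provider's monitoring service would normally be used instead.

```python
# Time TCP connection setup to a target as a rough latency probe (host/port are placeholders).
import socket
import time

HOST, PORT, SAMPLES = "example.com", 443, 5

for _ in range(SAMPLES):
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=5):
        pass
    print(f"TCP connect latency: {(time.perf_counter() - start) * 1000:.1f} ms")
```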

Network Hops

  • The network hop count is the number of network nodes a packet traverses in transit from its source to its destination.
  • The traceroute utility will show the number and names of the network devices or endpoints a packet traverses.
  • Although it is generally preferable to have as few hops as possible, a path through a larger number of devices may still perform better if those devices are interconnected with higher-speed links (a traceroute wrapper is sketched below).
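The sketch below simply wraps the system traceroute utility and counts the hop lines in its output. It assumes traceroute (or tracert on Windows) is installed, and the target host is a placeholder.

```python
# Thin wrapper around the system traceroute/tracert utility to get a rough hop count.
# Assumes the utility is installed; the target host is a placeholder.
import platform
import subprocess

target = "example.com"
cmd = ["tracert", target] if platform.system() == "Windows" else ["traceroute", target]

output = subprocess.run(cmd, capture_output=True, text=True, timeout=120).stdout
print(output)

# Hop lines start with a hop number; header lines do not.
hops = [line for line in output.splitlines() if line.strip()[:2].strip().isdigit()]
print(f"approximate hop count: {len(hops)}")
```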

Quality of service (QoS)

  • QoS is defined as the ability of the network to provide differentiated services based on information in the packet headers. For example, voice and video traffic are real time and delay sensitive, storage traffic requires a lossless connection, and data such as mail and file transfers is not sensitive to network delays.
  • Using QoS, the network can be configured to take the various application needs into consideration and determine the ordering of traffic through the network.
  • TCP/IP headers have fields, such as the DSCP bits in the IP header, that tell networking devices how a packet's QoS values are configured. Routers can also be configured to look at port numbers or IP addresses to apply QoS in a network.
  • The access layer switches can either honor the QoS settings in a frame or impose their own settings by modifying or stamping QoS values into the frame. Each device that the frame passes through must be configured to honor the QoS settings in the frame, so configuration can become very complex.
  • QoS can be controlled inside the cloud datacenter but not over the Internet, since the Internet backbone is beyond your administrative control (a DSCP-marking sketch follows this list).
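As an example of marking traffic at the source, the sketch below sets the DSCP bits on a UDP socket via the IP_TOS option (Linux). DSCP value 46 (Expedited Forwarding) is commonly used for voice; the destination address and port are placeholders, and whether the marking is honored depends entirely on how the network devices are configured.

```python
# Mark outgoing UDP traffic with a DSCP value (Linux; destination address/port are placeholders).
import socket

DSCP_EF = 46                 # Expedited Forwarding, commonly used for voice
tos = DSCP_EF << 2           # DSCP occupies the upper six bits of the ToS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
sock.sendto(b"probe", ("192.0.2.10", 5060))
sock.close()
```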

Load Balancing

  • Load balancing is a technique that spreads processing requests over multiple servers, since it is often not possible or desirable to have just one server active and servicing all the incoming requests.
  • A load balancer is a network device that examines incoming traffic and allocates connections across a pool of servers to service those connections.
  • Load balancers keep track of the connections and monitor each server’s health. Should a server become overloaded or fail, the load balancer makes intelligent decisions about reallocating its connections to other servers in the service group.
  • Cloud service providers commonly offer two types of load balancers, operating at different network layers (a round-robin sketch follows this list):
  • Network load balancers distribute TCP and UDP traffic; application load balancers distribute HTTP and HTTPS traffic.
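The toy class below illustrates the simplest distribution strategy, round robin, across a static server pool; the backend addresses are placeholders, and real load balancers add health checks, session persistence, and weighting.

```python
# Toy round-robin load balancer (illustration only; backend addresses are placeholders).
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, backends):
        self._pool = cycle(backends)

    def next_backend(self):
        """Return the backend that should receive the next connection."""
        return next(self._pool)

lb = RoundRobinBalancer(["10.0.1.10:80", "10.0.1.11:80", "10.0.1.12:80"])
for request_id in range(6):
    print(f"request {request_id} -> {lb.next_backend()}")
```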

Autoscaling

  • Autoscaling automatically increases or decreases the resources used by a cloud deployment. Scaling up is generally referred to as vertical scaling, and scaling out is referred to as horizontal scaling.
  • Scaling up, or vertical scaling, adds resources such as more CPU or RAM to an existing system. When you scale up, you are increasing your compute, network, or storage capabilities, and many applications will perform better after a system has been scaled vertically.
  • Scaling out or horizontal scaling adds more nodes instead of increasing the power of the single node. So, with horizontal scaling you will choose to add more servers to the existing configuration.
  • With horizontal scaling, you need to run applications that are distributed. With a busy cloud website, horizontal scaling works very well by adding additional web servers to handle the additional workload and implementing a load balancer to distribute the load between the many web servers.
  • This arrangement is more efficient and reliable than a single server that has been scaled vertically by adding more LAN bandwidth, CPU cores, and RAM; such a scaled-up server can handle the additional web load, but it remains a single point of failure.
  • It is important to check the cloud provider’s offerings and what options they offer when deciding to scale vertically or horizontally. The provider may offer better pricing options for multiple smaller systems than an equivalent larger system. 
  • When deciding to use the horizontal approach, you must consider that you will need to manage more systems, and the distributed workloads between the systems may cause some latency that is not found in a single larger server.
  • There is also a third option, referred to as diagonal scaling, which combines scaling up to more powerful systems with scaling out by deploying more of these scaled-up systems. Choosing the best approach usually comes down to the particular use case, the application's capabilities, and the cloud provider's pricing structure (a threshold-based scaling sketch follows this list).
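The function below sketches the kind of threshold rule a horizontal autoscaling policy evaluates: add an instance above one CPU threshold, remove one below another, and stay within minimum and maximum limits. All thresholds and limits here are assumptions for illustration.

```python
# Threshold-based horizontal scaling decision (thresholds and limits are assumptions).
def desired_instance_count(current: int, avg_cpu_pct: float,
                           scale_out_at: float = 70.0,
                           scale_in_at: float = 30.0,
                           min_instances: int = 2,
                           max_instances: int = 10) -> int:
    """Return how many instances the group should run next."""
    if avg_cpu_pct > scale_out_at:
        return min(current + 1, max_instances)   # add a node under load
    if avg_cpu_pct < scale_in_at:
        return max(current - 1, min_instances)   # remove a node when idle
    return current                               # stay put inside the comfortable band

print(desired_instance_count(current=4, avg_cpu_pct=82.5))  # -> 5
print(desired_instance_count(current=4, avg_cpu_pct=18.0))  # -> 3
```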
Anil K Y Ommi (https://mycloudwiki.com) is a Cloud Solutions Architect with more than 15 years of experience in designing and deploying applications on multiple cloud platforms.
