Tuesday, February 14, 2017

Linux Performance Metrics & Monitoring

The rule-of-thumb metrics to watch are the following:

CPU Utilization

CPU usage is usually the first place we look when a server shows signs of slowing down (although more often than not, the problem is elsewhere). The top command is arguably the most common performance-related utility in Linux when it comes to processes and CPU. By default, top displays summary percentages for all CPUs combined; pressing 1 toggles a separate line for each CPU. It is important to distinguish between two types of CPU metrics: load averages and percentages.
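To make the "percentages" side concrete, here is a minimal sketch of how a busy percentage can be derived from two samples of the aggregate cpu line in /proc/stat. The function names are made up for this illustration, and treating iowait as idle time is an assumption of this sketch rather than a description of how top itself does the arithmetic.

```python
import time


def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before, after):
    """Percentage of time the CPUs were not idle between two samples.

    Column order: user nice system idle iowait irq softirq steal guest guest_nice.
    Only the first eight columns are summed, because guest time is already
    included in user and nice. idle and iowait are both counted as "not busy"
    (an assumption of this sketch).
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    idle = sum(deltas[3:5])
    return 100.0 * (total - idle) / total


if __name__ == "__main__":
    # Take two samples one second apart on a live Linux system.
    with open("/proc/stat") as handle:
        first = read_cpu_times(handle.read())
    time.sleep(1)
    with open("/proc/stat") as handle:
        second = read_cpu_times(handle.read())
    print("CPU busy: %.1f%%" % cpu_busy_percent(first, second))
```

The counters in /proc/stat only ever increase, so a single reading says little; it is the difference between two readings that gives a percentage for the interval.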


Load Averages

All UNIX-like systems traditionally display the CPU load as 1-minute, 5-minute and 15-minute load averages.  Essentially, the load average is the average number of processes that are either running on a CPU or waiting for one over that period (on Linux it also counts processes in uninterruptible sleep, typically waiting on disk I/O).  It is not a percentage: because processes can queue up waiting for a CPU to become available, the load average can exceed 1.00.  The “perfect” point of 1.00 per CPU means that each CPU is executing 100% of the time and no processes are waiting for a CPU to become available.  (On a machine with a single dual-core CPU that point would be 2.00, on a machine with two quad-core CPUs 8.00, and so on.)  Of course, a load of 1.00 per CPU means that there is no spare capacity to take an increased load, so most administrators start to worry when they see load averages consistently over 0.70 per CPU.  For the percentage view rather than load averages, another command worth knowing is mpstat, which displays CPU percentage statistics for each processor.
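As a small illustration of the per-CPU arithmetic above, the following sketch divides the three load averages by the number of CPUs and flags the case where all three exceed the 0.70 rule of thumb. The function names are invented for this example, and reading "consistently" as "all three averages are over the threshold" is an assumption of the sketch.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the number of CPUs."""
    return tuple(load / cpu_count for load in load_averages)


def is_sustained_high(load_averages, cpu_count, threshold=0.70):
    """True when the 1-, 5- and 15-minute averages all exceed threshold per CPU."""
    return all(load > threshold for load in load_per_cpu(load_averages, cpu_count))


if __name__ == "__main__":
    averages = os.getloadavg()      # (1-minute, 5-minute, 15-minute)
    cpus = os.cpu_count() or 1      # cpu_count() may return None
    print("load per CPU:", ["%.2f" % v for v in load_per_cpu(averages, cpus)])
    if is_sustained_high(averages, cpus):
        print("load has been above 0.70 per CPU for a while")
```

Requiring all three averages to be high filters out short spikes, which show up in the 1-minute figure long before they move the 15-minute one.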


Memory Usage

When a process asks the kernel to allocate memory and the system has run out of physical memory, the kernel starts paging out the least-used memory blocks to disk to free up some space. When the process that owns those blocks needs them back, the kernel has to find another least-used block, page it out, and page the original block back into physical memory.  This mechanism means that more memory is available to applications than the physical memory installed on the server; this memory is known as virtual memory.  The good thing is that your application doesn’t even know it is using virtual memory.  But that doesn’t mean you should not keep track of memory usage, because nothing is free.  Since disk access is far slower than RAM access, if your system starts paging excessively, virtual memory access will become a performance bottleneck.  (A quick note: although the terms paging and swapping are often used interchangeably, strictly speaking paging refers to individual memory pages being loaded from or saved to disk, and swapping to the entire memory space of a process being moved from memory to disk or vice versa.)

To examine virtual memory usage, use the vmstat command.  When run without parameters, it prints a single report of averages since the last boot; to see what is happening right now, give it an interval (for example, vmstat 5) and watch the si and so columns, which show memory being swapped in from and out to disk.  If you discover that certain applications need too much memory and your system is paging more than it should, consider installing more memory or moving those applications to another machine.  If you do install more memory, don’t forget to review the amount of swap space: a traditional rule of thumb is that it should be at least equal to the amount of physical memory, although on machines with a lot of RAM that is often more than is needed.
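For a scripted check of the same kind of swap activity, one option is to sample the cumulative pswpin and pswpout counters in /proc/vmstat twice and turn the difference into a rate. This is only a sketch with made-up function names; note that these counters are in pages, so the numbers will not match the units of vmstat's own columns.

```python
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Return (pages swapped in per second, pages swapped out per second)."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / seconds, swapped_out / seconds


if __name__ == "__main__":
    interval = 1.0
    with open("/proc/vmstat") as handle:
        first = parse_vmstat(handle.read())
    time.sleep(interval)
    with open("/proc/vmstat") as handle:
        second = parse_vmstat(handle.read())
    rate_in, rate_out = swap_rates(first, second, interval)
    print("swap in: %.1f pages/s, swap out: %.1f pages/s" % (rate_in, rate_out))
```

Rates that stay at or near zero are what a healthy system normally shows; sustained non-zero values in both directions are the pattern that suggests excessive paging.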


Disk Subsystems

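Whenever you suspect that disk I/O activity is the bottleneck, use the iostat command with the -x switch to examine disk activity; the extended output includes per-device figures such as average wait time and utilization (%util).  Because their workloads are dominated by reading and writing data, database servers and file servers are where the I/O subsystem most commonly becomes the bottleneck.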

Some options to consider are:
  • Use faster disks. Higher RPM means lower rotational latency, so each request completes sooner
  • Use logical volumes with striping – this way a single request can be serviced by several disks in parallel
  • Use a hardware RAID controller – avoid software RAID for data-intensive applications
  • Add more memory to allow for larger buffers
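Before trying any of these options, it helps to confirm that a disk really is saturated. The sketch below estimates a utilization figure per device from two samples of /proc/diskstats, using the "time spent doing I/O" counter (the 13th whitespace-separated field, in milliseconds). The function names are invented for this example, and the result is only an approximation of what iostat -x reports, not a reimplementation of it.

```python
import time


def parse_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O, from /proc/diskstats."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the per-device counters
        if len(fields) >= 13:
            ticks[fields[2]] = int(fields[12])
    return ticks


def utilization_percent(before, after, interval_seconds):
    """Share of the interval each device spent with I/O in progress."""
    interval_ms = interval_seconds * 1000.0
    return {
        name: min(100.0, 100.0 * (after[name] - before[name]) / interval_ms)
        for name in after
        if name in before
    }


if __name__ == "__main__":
    interval = 1.0
    with open("/proc/diskstats") as handle:
        first = parse_io_ticks(handle.read())
    time.sleep(interval)
    with open("/proc/diskstats") as handle:
        second = parse_io_ticks(handle.read())
    for device, percent in sorted(utilization_percent(first, second, interval).items()):
        print("%-10s %5.1f%% busy" % (device, percent))
```

A device that sits close to 100% busy across several samples is a reasonable candidate for the options listed above; one that is mostly idle points the investigation back toward CPU or memory.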








