Channel: Bobcares Blog Feed

How to troubleshoot high load in Linux web hosting servers


Even in this age of high-configuration servers and cloud instances, server load spikes are all too common. Getting the approach right is half the work in troubleshooting high server load.

In earlier posts we covered the basics of server load, two ways to monitor a server for load, and how to improve your site's loading speed. We also posted a very targeted load mitigation approach for cPanel servers.

Today, I will show you how to approach a high-load situation on a Linux server. The approach comes from our internal knowledge base, which we use in our server optimization service.


To illustrate the approach, I will walk through a recent high-load case I troubleshot on a cPanel + CentOS server. The general steps are:

  1. Find the over-loaded resource
  2. Find the service hogging that resource
  3. Find the virtual host over-using that service

 

1. Find the over-loaded resource

If you are troubleshooting a physical server or a hardware-virtualized instance, atop is the ideal tool. In an OS-level virtualization environment, the regular top command works just as well. If you are troubleshooting load on the VPS node itself, you might start with vztop.

The goal is to find which resource (CPU, memory, disk or network) is being hogged. I chose atop because it was already installed on the server.

I then ran "atop -Aac", which lists the resource usage of each process, sorted automatically by the most heavily used resource, along with the full command line. The output looked like this:

[Screenshot: atop output showing high disk usage]

Here the most used resource is disk, marked as ADSK. The highlighted summary shows that /dev/sda is currently 100% busy. It is worth noting that the resource most vulnerable to over-use is usually disk (especially SATA disks), followed by memory, then CPU, and then network.
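
If atop is not installed, a rough first pass is still possible from /proc alone. The sketch below is an illustration under stated assumptions, not the tool used in this case: it assumes a Linux kernel, the function names (cpu_sample, resource_snapshot) are hypothetical, and it samples /proc/stat twice to estimate CPU busy and I/O-wait percentages before printing the load averages and available memory. On older kernels without MemAvailable it falls back to a rough MemFree + Buffers + Cached estimate.

```shell
# Hypothetical helpers for a first-pass triage from /proc (Linux only).
# Paste into a shell or source them from a file.

# Print "total iowait idle" jiffies from the aggregate "cpu" line of /proc/stat.
# Columns after the label: user nice system idle iowait irq softirq steal ...
cpu_sample() {
    awk '/^cpu / { printf "%.0f %.0f %.0f\n", $2+$3+$4+$5+$6+$7+$8+$9, $6, $5 }' /proc/stat
}

resource_snapshot() {
    interval=${1:-2}          # seconds between the two /proc/stat samples
    first=$(cpu_sample)
    sleep "$interval"
    second=$(cpu_sample)

    # CPU busy and I/O-wait percentages over the interval.
    echo "$first $second" | awk '{
        dt = $4 - $1
        if (dt <= 0) { print "cpu: no ticks sampled"; exit }
        wa = 100 * ($5 - $2) / dt
        busy = 100 * (dt - ($6 - $3) - ($5 - $2)) / dt
        printf "cpu: busy %.1f%%  iowait %.1f%%\n", busy, wa
    }'

    # 1, 5 and 15 minute load averages.
    printf 'load: %s\n' "$(cut -d' ' -f1-3 /proc/loadavg)"

    # Available memory and swap in use. Kernels without MemAvailable fall
    # back to MemFree + Buffers + Cached, which is only a rough estimate.
    awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
         /^MemFree:/ {f = $2} /^Buffers:/ {b = $2} /^Cached:/ {c = $2}
         /^SwapTotal:/ {st = $2} /^SwapFree:/ {sf = $2}
         END {
             if (a == "") a = f + b + c
             printf "mem: %d MiB available of %d MiB, swap used %d MiB\n", a / 1024, t / 1024, (st - sf) / 1024
         }' /proc/meminfo
}
```

For example, `resource_snapshot 5` samples over five seconds. A high iowait figure alongside modest CPU busy time points at disk, as in the case above.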

 

At this stage of troubleshooting, keep in mind the following points:

  1. Observe for at least 2 minutes before deciding which resource is being hogged. The one that stays on top most of the time is your answer.
  2. If you are using top, use the "i" switch to show only active processes and the "c" switch to see the full command line.
  3. Watch the "%wa" (I/O wait) value in top's CPU summary line. A high value means the CPUs are sitting idle waiting for I/O, which points to a non-CPU resource hog, usually disk.
  4. Use pstree to look for suspicious processes or an unusually high number of processes for one service. Comparing the process listing with that of a similarly loaded server is a quick check.
  5. Use netstat to look for suspicious connections, or too many connections from one IP (or IP range).
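
For the last point, here is a minimal sketch that counts TCP connections per remote IP, and per IPv4 /24 range, from `netstat -tn` output. The function names are hypothetical, and the parsing assumes the classic net-tools layout in which the foreign address is the fifth column; `ss` output is laid out differently and would need a different filter.

```shell
# Hypothetical helpers that read `netstat -tn` output on stdin.
# Assumed net-tools column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State

# Connections per remote IP, busiest first.
conns_per_ip() {
    awk '$1 ~ /^tcp/ {
            ip = $5
            sub(/:[0-9*]+$/, "", ip)     # drop the trailing :port
            sub(/^::ffff:/, "", ip)      # IPv4-mapped IPv6 -> plain IPv4
            count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' |
    sort -rn | head -n "${1:-10}"
}

# Connections per IPv4 /24 range, to spot a whole block hammering the server.
conns_per_range() {
    awk '$1 ~ /^tcp/ {
            ip = $5
            sub(/:[0-9*]+$/, "", ip)
            sub(/^::ffff:/, "", ip)
            if (split(ip, o, ".") == 4) count[o[1] "." o[2] "." o[3] ".0/24"]++
         }
         END { for (r in count) print count[r], r }' |
    sort -rn | head -n "${1:-10}"
}
```

Usage: `netstat -tn | conns_per_ip 10` or `netstat -tn | conns_per_range 10`. Compare the counts with what a normal day looks like on the same server before concluding that an IP is abusive.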

Troubleshooting is as much an exercise in ruling out possible scenarios as it is in systematically analyzing one particular possibility. Once you know what the output of common commands looks like on a normal, stable server, you develop an instinct for spotting what is NOT right.

 

2. Find the service hogging that resource

Once you have identified the resource, use tools specific to that resource to find the service that is hogging it.

In this example I continued with atop. As shown above, mysql was automatically sorted to the top of the list. To get more detail on disk usage, I pressed "d" on the interactive screen. The output looked like this:

[Screenshot: atop output showing disk statistics]

Here you can see that the disk operation statistics for the mysql processes are far above their normal values.

Alternatively, you can use iotop to analyze disk-based load. The iotop output for the same server looked like this:

[Screenshot: iotop showing high mysql usage]

This confirmed beyond doubt that mysql was hogging the disk.
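
If iotop is not available, a rough substitute is to sample the cumulative read_bytes and write_bytes counters that the kernel exposes in /proc/<pid>/io (on kernels built with task I/O accounting) and diff them over an interval. This is a minimal sketch with hypothetical function names, not a replacement for iotop: run it as root to see every process, and treat the figures as approximate, since they reflect I/O attributed to each process at the storage layer.

```shell
# Hypothetical helpers: per-process disk throughput from /proc/<pid>/io.
# Processes whose io file cannot be read are silently skipped.

# Print "pid read_bytes write_bytes" for every readable process.
io_sample() {
    for f in /proc/[0-9]*/io; do
        pid=${f#/proc/}; pid=${pid%/io}
        awk -v pid="$pid" '/^read_bytes:/ {r = $2} /^write_bytes:/ {w = $2}
            END { if (r != "") print pid, r, w }' "$f" 2>/dev/null
    done
}

# Show the processes with the most disk I/O over an interval.
top_io() {
    interval=${1:-5}; limit=${2:-10}
    before=$(mktemp) || return 1
    io_sample > "$before"
    sleep "$interval"
    printf '%7s %12s %12s  %s\n' PID 'READ_KiB/s' 'WRITE_KiB/s' COMMAND
    io_sample | awk -v secs="$interval" '
        NR == FNR { r[$1] = $2; w[$1] = $3; next }   # first file: earlier sample
        ($1 in r) {                                  # stdin: later sample
            dr = ($2 - r[$1]) / secs / 1024
            dw = ($3 - w[$1]) / secs / 1024
            if (dr + dw > 0) printf "%s %.1f %.1f %.1f\n", $1, dr, dw, dr + dw
        }' "$before" - |
    sort -k4 -nr | head -n "$limit" |
    while read -r pid rd wr total; do
        printf '%7s %12s %12s  %s\n' "$pid" "$rd" "$wr" "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done
    rm -f "$before"
}
```

For example, `top_io 5 10` lists the ten busiest processes over five seconds. Processes that start or exit between the two samples are left out.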

For checking memory you can use atop, top, or some clever use of ps.
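
As one example of using ps for memory, the sketch below lists the largest processes by resident set size (RSS) and sums RSS per user. When PHP runs as the account user (for example under suPHP or FastCGI), per-user totals point toward a virtual host. The function names are hypothetical, `--sort` is a procps (Linux ps) option, and because RSS counts shared pages once per process, the per-user sums overstate real usage.

```shell
# Hypothetical helpers built on procps ps.

# Largest processes by RSS (KiB). --sort=-rss sorts numerically in
# descending order, so the header line stays on top.
top_mem() {
    ps -eo rss,pid,user,args --sort=-rss | head -n "$(( ${1:-10} + 1 ))"
}

# Total RSS per user, largest first. The trailing "=" on each column
# suppresses ps's header line.
mem_per_user() {
    ps -eo user=,rss= | awk '{ kb[$1] += $2 }
        END { for (u in kb) printf "%8.1f MiB  %s\n", kb[u] / 1024, u }' |
    sort -nr | head -n "${1:-10}"
}
```

Usage: `top_mem 10` and `mem_per_user 10`. Check the swap figures in top or atop as well; steady swapping is a stronger sign of memory pressure than a large RSS total.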

For checking CPU usage, the best utilities are atop and top. If you are feeling a bit adventurous, try some shell kung-fu with ps, like this:

# ps -eo pcpu,pid,user,args --sort=-pcpu | head

%CPU   PID USER     COMMAND
20.7 21557 root     /bin/bash /usr/local/sbin/maldet -a /home/mydom/
19.2 28402 mydom    /usr/bin/php /home/mydom/public_html/index.php
 9.4 29051 mysql    /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
 8.5 28480 mysite   /usr/bin/php /home/mysite/public_html/index.php
 6.5 28493 mysite   /usr/bin/php /home/mysite/public_html/index.php
 5.0 13738 root     cxswatch - scanning
 5.0 13735 root     cxswatch - sleeping
 4.9 13737 root     cxswatch - scanning
 2.0 28494 root     /usr/sbin/exim -Mc 1ZaWJF-0007PK-CJ
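
In the listing above, the PHP processes run as the account users (mysite, mydom), so summing %CPU per user points straight at a virtual host. A minimal sketch, with a hypothetical function name. Keep in mind that ps computes %CPU as the CPU time used divided by the time the process has been running, not the current rate, so a long-running daemon can look quieter or busier than it is right now; confirm with top or atop.

```shell
# Hypothetical helper: total ps %CPU per user, busiest first.
# ps's %CPU is a lifetime average per process (cputime / elapsed time).
cpu_per_user() {
    ps -eo user=,pcpu= | awk '{ cpu[$1] += $2 }
        END { for (u in cpu) printf "%6.1f %s\n", cpu[u], u }' |
    sort -nr | head -n "${1:-10}"
}
```

Usage: `cpu_per_user 10`. The per-user totals can exceed 100 on multi-core servers, because each process's percentage is relative to a single CPU.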

For checking network usage, the best utility is nethogs. It maps network usage to individual process IDs. Apart from atop with the netatop module, I haven't seen any other utility do that.
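
nethogs takes the interface to watch as an argument (for example `nethogs eth0`). To pick the busy interface first, here is a small sketch that diffs the byte counters in /proc/net/dev over an interval. The function names are hypothetical, and this only shows per-interface throughput; mapping traffic to processes still needs nethogs or atop with netatop.

```shell
# Hypothetical helpers: per-interface throughput from /proc/net/dev (Linux).

# Print "iface rx_bytes tx_bytes" for every interface.
net_sample() {
    # Skip the two header lines and replace the first ":" so that
    # "eth0:123" and "eth0: 123" split the same way. After the name come
    # 8 receive counters, then transmit bytes in field 10.
    awk 'NR > 2 { sub(/:/, " "); print $1, $2, $10 }' /proc/net/dev
}

net_rates() {
    interval=${1:-5}
    before=$(mktemp) || return 1
    net_sample > "$before"
    sleep "$interval"
    printf '%-12s %12s %12s\n' IFACE 'RX_KiB/s' 'TX_KiB/s'
    net_sample | awk -v secs="$interval" '
        NR == FNR { rx[$1] = $2; tx[$1] = $3; next }   # earlier sample
        ($1 in rx) {                                   # later sample
            printf "%-12s %12.1f %12.1f\n", $1, ($2 - rx[$1]) / secs / 1024, ($3 - tx[$1]) / secs / 1024
        }' "$before" -
    rm -f "$before"
}
```

Usage: `net_rates 5`, then point nethogs at whichever interface shows the traffic.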

 

3. Find the virtual host over-using that service

Now that you know which service to investigate, move on to service-specific troubleshooting. For mysql there are several good tools, one of which is mtop; I have also found mytop to be quite good. In this example, mtop was installed on the server and gave the output below:

[Screenshot: mtop output showing the top database connections during the comment spam]

After about 2 minutes of observation, I saw that the user "ferc" was keeping his database very busy. A follow-up check of his access log showed that his comments section was being hammered by spam bots because his CAPTCHA was broken. He had also opted not to use mod_security, which left his site exposed to spam bots. We quickly fixed this by enabling mod_security protection for his site, and the load started coming down.
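
When neither mtop nor mytop is installed, a minimal fallback is to summarize `mysqladmin processlist` per database user, counting how many connections are actively running (any Command other than Sleep) against the total. On cPanel servers, database users are normally prefixed with the account name, which maps them back to a virtual host. The function name is hypothetical, and the parser assumes mysqladmin's default pipe-delimited table.

```shell
# Hypothetical helper: reads `mysqladmin processlist` output on stdin.
# Table columns: | Id | User | Host | db | Command | Time | State | Info |
# Prints "active total user", busiest first.
mysql_conns_per_user() {
    awk -F'|' 'NF > 8 {
            user = $3; gsub(/ /, "", user)
            cmd  = $6; gsub(/ /, "", cmd)
            if (user == "User") next              # header row
            total[user]++
            if (cmd != "Sleep") active[user]++
         }
         END { for (u in total) printf "%5d %5d  %s\n", active[u] + 0, total[u], u }' |
    sort -nr | head -n "${1:-10}"
}
```

Usage: `mysqladmin processlist | mysql_conns_per_user 10`. Repeat it a few times over a couple of minutes, as with mtop, before singling out a user.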

Other services I have seen causing high load include backup processes, maintenance jobs such as tmpwatch, update scripts, the IMAP server, Apache, and sometimes the SMTP server during inbound spam floods.

The best place to start service-specific troubleshooting is the service's own access logs. Increasing the log verbosity if needed will show you which virtual host is taxing that service. If the load is not caused by an internal maintenance process, a very good option is to use tshark or tcpdump to log which virtual host is receiving the bulk of the connection requests on that service's port.
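
A sketch of both ideas, assuming Apache's common or combined log format (client IP in field 1, request path in field 7). The function names are hypothetical, and which log file you pass in is up to you; on cPanel servers the per-domain logs live in the domlogs directory. On the capture side, `tcpdump -A` prints packet payloads as text, so the Host headers of plain HTTP requests can be counted; HTTPS requests are encrypted and will not show them. tshark can produce a similar host list with its field output (`-T fields -e http.host`).

```shell
# Hypothetical helpers for narrowing load down to a virtual host.

# Top client IPs in an access log (common/combined format: $1 = client IP).
top_clients() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Top request paths in an access log ($7 = path in common/combined format).
top_paths() {
    awk '{ print $7 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Count HTTP Host headers in a text capture read on stdin, for example:
#   tcpdump -l -A -s 0 -c 2000 -i any 'tcp dst port 80' | count_hosts 10
# (-c makes tcpdump exit after 2000 packets so the counts get printed.)
count_hosts() {
    tr -d '\r' | awk 'tolower($1) == "host:" { print tolower($2) }' |
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

In the case above, a `top_paths` run on the affected site's log would be the quick way to see a comment-posting URL dominating the requests.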

 

Take-away from this post

 

  1. It is important to be disciplined in your approach to troubleshooting. Follow the three-step process to narrow the problem down to the specific virtual host.
  2. In troubleshooting, knowing what is NOT causing the issue is as important as following a thread to trace what is causing it. Making a habit of regularly checking command outputs on a normal server will let you see immediately what is wrong.
  3. There are specialist tools for different situations. Developing a curiosity to explore new and better utilities will stand you in good stead when you face an emergency.

 

Happy hacking! :)

 

Note: This article was originally published on 17th August 2013, and was revised on 14th September 2015.


About the author

is a senior software engineer at Bobcares. He has extensive experience in managing technical support teams of web hosting companies and data centers. He is passionate about systems engineering, and loves to get his hands dirty on systems automation. His free time is spent reading books and being with his family.


 

Bobcares server administrators routinely help webmasters and service providers secure and optimize their web server infrastructure. Our server management services cover 24/7 monitoring, emergency administration, periodic security hardening, periodic performance tuning and server updates.


