For a long time I believed that the load average numbers shown by the top command truly represented server load. My first idea was that they showed a percentage of relative server load, combining CPU usage, memory usage and so on. Somebody even told me that this value should not exceed 3, and that anything higher means the server is overloaded. But then I saw a server with load averages of 100 and above that kept serving users. So I decided to investigate what top actually reports and how to find out the real server load.
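The load average values in top represent the number of processes in the run queue, that is, processes that are running or waiting for CPU time (on Linux, processes blocked in uninterruptible I/O wait are counted as well), averaged over a certain time period. The time periods are 1 minute, 5 minutes and 15 minutes respectively. You cannot rely on these numbers alone when trying to determine your server load: a server can remain stable even at high load average values.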
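One reason a high load average does not automatically mean trouble is that the number is not scaled by the number of CPUs. The sketch below is only an illustration, not part of top itself: the helper names load_per_cpu and current_load_per_cpu are made up for this example, and dividing by the CPU count is a common rule of thumb rather than an exact measure of load.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the CPU count (a rule of thumb, not an exact metric)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(value / cpu_count, 2) for value in load_averages)


def current_load_per_cpu():
    """Read the 1, 5 and 15 minute load averages of this machine (Unix only)."""
    averages = os.getloadavg()      # same three numbers that top shows
    cpus = os.cpu_count() or 1      # os.cpu_count() may return None
    return load_per_cpu(averages, cpus)
```

For example, a load average of 100 on a machine with 64 cores works out to roughly 1.56 runnable processes per core, which is far less alarming than the raw number suggests. Call current_load_per_cpu() on the server itself to get the live values.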
Instead, take a look at the lines below the load average, which show memory and CPU usage. The CPU usage line looks something like this:
Cpu(s): 75.6% us, 15.3% sy, 0.0% ni, 6.8% id, 0.2% wa, 0.7% hi, 1.5% si
What do these numbers mean? The first three (us, sy and ni) tell you how CPU time is currently being spent: on user processes, on the kernel, and on user processes with a changed priority (niced). If your CPU is overloaded, these three values add up to nearly 100% and the idle value (id) drops towards zero. Also pay attention to the wa value. It represents the time your server CPU spends waiting for Input/Output (I/O) operations. Values above 80% can indicate a hardware failure or applications that are being used improperly.
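To make the reading of that line concrete, here is a small sketch that parses a Cpu(s) line into its fields and adds up the busy time. The function names parse_cpu_line and busy_percent are hypothetical, and the parser assumes the "value% name" layout shown above (the "%" sign is treated as optional, since some versions of top omit it).

```python
import re


def parse_cpu_line(line):
    """Turn a top Cpu(s) line into a dict such as {'us': 75.6, 'sy': 15.3, ...}."""
    # Each field is a number, an optional '%', and a two-letter label.
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s*%?\s*([a-z]{2})\b", line)
    return {name: float(value) for value, name in pairs}


def busy_percent(fields):
    """Sum of user, system and nice time, rounded to one decimal place."""
    return round(sum(fields.get(name, 0.0) for name in ("us", "sy", "ni")), 1)


sample = "Cpu(s): 75.6% us, 15.3% sy, 0.0% ni, 6.8% id, 0.2% wa, 0.7% hi, 1.5% si"
fields = parse_cpu_line(sample)
print("busy:", busy_percent(fields), "iowait:", fields["wa"])
```

For the sample line, the CPU is busy about 90.9% of the time while I/O wait is only 0.2%, so this server is limited by the CPU rather than by the disk.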
To find out which application is loading your server, run the following command: ps faux | more . It lists the running processes as a tree, along with their CPU and memory usage, and will help you track down the problem.
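On a busy server that listing can be long, so it may help to sort it by CPU usage. The following is a rough sketch under a few assumptions: it expects the usual "ps aux" column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), and the helper names top_cpu_processes and list_top_processes are invented for this example.

```python
import subprocess


def top_cpu_processes(ps_output, limit=5):
    """Return (cpu_percent, pid, command) tuples for the heaviest CPU users."""
    rows = []
    for line in ps_output.splitlines()[1:]:    # skip the header line
        fields = line.split(None, 10)          # COMMAND may contain spaces
        if len(fields) < 11:
            continue
        try:
            rows.append((float(fields[2]), int(fields[1]), fields[10]))
        except ValueError:
            continue                           # ignore lines that do not parse
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def list_top_processes(limit=5):
    """Run 'ps aux' on this machine and return its heaviest CPU users."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return top_cpu_processes(result.stdout, limit)
```

Calling list_top_processes() on the server returns the five processes with the highest %CPU value, which is usually the quickest way to see what is responsible for the load.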