Check your server health

From time to time I run into a server that is not responding as expected. Several command-line tools on a Linux host can give you an idea of what is wrong. I normally install two packages on my CentOS servers for this:

> sudo yum install procinfo sysstat

Both tools are quite handy if you want to determine where the performance bottleneck is on your machine. Let’s start with something like this:

> clear && procinfo && sar

Here is example output taken from one of my servers:

Linux 2.6.18-53.1.4.el5 (mockbuild@builder6.centos.org) (gcc 4.1.2 20070626 ) #1 SMP Fri Nov 30 00:45:55 EST 2007 4CPU [neo.int.XXXXX.XX]

Memory:      Total        Used        Free      Shared     Buffers
Mem:       8177876     8130848       47028           0       30844
Swap:      4194296         180     4194116

Bootup: Mon Jan 28 11:59:26 2008    Load average: 1.23 1.29 1.07 1/337 13383


user  :       2:41:10.65   2.4%  page in :        0
nice  :       0:00:00.04   0.0%  page out:        0
system:      11:23:19.67  10.3%  swap in :        0
idle  :   3d 23:08:26.08  86.0%  swap out:        0
steal :       0:00:00.00   0.0%
uptime:   1d  3:37:54.38         context :504866586

irq  0:  99474323 timer                 irq 12:       105 i8042
irq  1:        11 i8042                 irq 50:   1080554 libata
irq  3:        18                       irq 58: 235766212 0          0  235766
irq  4:        20                       irq169:   3127348 ioc0
irq  8:         1 rtc                   irq225:        23 uhci_hcd:usb1, uhci_
irq  9:         0 acpi                  irq233:         0 uhci_hcd:usb2

Linux 2.6.18-53.1.4.el5 (neo.int.XXXXX.XX)       29.01.2008

11:50:01          CPU     %user     %nice   %system   %iowait    %steal     %idle
12:00:01          all      1,44      0,00     17,79     38,13      0,00     42,64
12:10:01          all      1,04      0,00     13,90     38,40      0,00     46,66
12:20:01          all      2,50      0,00     14,69     39,90      0,00     42,91
12:30:01          all      1,55      0,00     15,28     36,23      0,00     46,94
12:40:01          all      1,75      0,00     17,15     36,47      0,00     44,64
12:50:01          all      1,33      0,00     16,29     37,70      0,00     44,69
13:00:02          all      1,59      0,00     16,15     36,34      0,00     45,91
13:10:01          all      1,69      0,00     16,34     35,76      0,00     46,22
13:20:01          all      1,18      0,00     15,99     37,68      0,00     45,14
13:30:02          all      1,05      0,00     13,35     38,63      0,00     46,97
13:40:01          all      2,71      0,00     18,01     31,93      0,00     47,35
13:50:01          all      3,75      0,00     15,96     32,03      0,00     48,26
14:00:01          all      2,03      0,00     16,36     33,45      0,00     48,17
14:10:01          all      3,40      0,00     18,35     33,36      0,00     44,89
14:20:02          all      2,68      0,00     16,48     35,48      0,00     45,36
14:30:01          all      1,93      0,00     15,84     34,69      0,00     47,53
14:40:01          all      0,80      0,00      6,46      8,21      0,00     84,53
14:50:01          all      1,24      0,00      6,71      3,46      0,00     88,60
15:00:01          all      1,70      0,00      5,25      3,80      0,00     89,25
15:10:01          all      1,53      0,00     11,26      4,50      0,00     82,71
15:20:01          all      0,66      0,00      2,39      1,00      0,00     95,95
15:30:01          all      0,79      0,00      7,52      1,00      0,00     90,69
Average:          all      1,74      0,00     13,52     27,18      0,00     57,55

This should give you an impression of how healthy your system is. In this example, the server has a massive %iowait problem: in the samples from 12:00 to 14:30, the CPUs spent roughly 32% to 40% of their time waiting for I/O. The history also shows that the problem was fixed sometime between 14:30 and 14:40, when %iowait dropped to about 8% and then stayed below 5%.
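If you do not want to scan the sar history by eye, a small filter can pick out the bad samples for you. The following is only a sketch: the function name high_iowait and the default threshold of 20 are my own choices for illustration, not part of sysstat. It assumes the column layout shown above, finds the %iowait column from the header line, and accepts comma decimal separators as printed on my server.

```shell
# Print the sar samples whose %iowait exceeds a threshold (default: 20).
# Reads sar CPU output on stdin. The name and the default threshold are
# illustrative assumptions, not something sysstat provides.
high_iowait() {
    LC_ALL=C awk -v limit="${1:-20}" '
        # Header line: remember which column holds %iowait.
        {
            for (i = 1; i <= NF; i++)
                if ($i == "%iowait") { col = i; next }
        }
        # Data line: skip the Average row, then compare the value.
        col && $1 != "Average:" && NF >= col {
            v = $col
            sub(",", ".", v)    # tolerate comma decimal separators
            if (v + 0 > limit + 0) print $1, v
        }
    '
}
```

You could then run something like this to list every sample with more than 30% iowait:

> sar | high_iowait 30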

Hi folks! My name is Thomas: Team-Lead Enterprise Applications, Linux expert, DJ & motorcycle enthusiast.
