Linux High System CPU

By steve, 23 February, 2012

Several servers started showing very high CPU usage in system (kernel) time. The cause appeared to be a very large number of nfs_inode_cache objects in the kernel slab caches:

server:~# slabtop
   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1525524 1525524 100%    1.02K 508508        3   2034032K nfs_inode_cache
 966120  856476  88%    0.19K  48306       20    193224K dentry
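
For a non-interactive view of the same data, the caches can be ranked straight from /proc/slabinfo. The following is a minimal sketch and not part of the original investigation: top_slabs is a hypothetical helper name, and it approximates each cache's footprint as num_objs * objsize, which can understate the slab-based CACHE SIZE figure that slabtop reports.

```shell
# Rank slab caches by approximate memory use, largest first.
# Reads /proc/slabinfo-format text on stdin and prints "<KiB> <name>" per cache.
# $1 is the number of caches to show (default 5).
# Assumption: footprint is estimated as num_objs ($3) * objsize ($4).
top_slabs() {
    awk '$1 != "#" && $1 != "slabinfo" {
             printf "%d %s\n", $3 * $4 / 1024, $1
         }' | sort -rn | head -n "${1:-5}"
}
```

Typical use would be `top_slabs 5 < /proc/slabinfo`, which usually needs root because the file is readable by root only.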

This was confirmed by running the following, which frees reclaimable slab objects (dentries and inodes, including the nfs_inode_cache):
server:~# sync
server:~# echo 2 > /proc/sys/vm/drop_caches

I first tried to increase the size of the inode and dentry hash tables, since the 1.5 million inode cache entries far exceeded the roughly 0.5 million (524288) hash buckets allocated by default:

server:~# dmesg |grep cache
[ 0.000699] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 0.005879] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
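
The mismatch can be put into numbers as an average count of cached objects per hash bucket. This is a rough sketch under the assumption that objects spread evenly across buckets; hash_buckets and entries_per_bucket are hypothetical helper names, not existing tools.

```shell
# Extract the bucket count for a named hash table from dmesg-style text on stdin.
# $1 is the table name as printed by the kernel, e.g. "Inode-cache" or "Dentry cache".
hash_buckets() {
    sed -n "s/.*$1 hash table entries: \([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Average number of cached objects per hash bucket, to two decimal places.
# $1 = object count (e.g. OBJS from slabtop), $2 = bucket count.
entries_per_bucket() {
    awk -v objs="$1" -v buckets="$2" 'BEGIN { printf "%.2f\n", objs / buckets }'
}
```

For example, `entries_per_bucket 1525524 "$(dmesg | hash_buckets Inode-cache)"` would compare the nfs_inode_cache object count above with the boot-time inode hash table size.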

The hash table sizes can be set by adding the following options to the Linux kernel command line (in GRUB/LILO):
ihash_entries=
dhash_entries=

It looks like these values must be powers of 2, and they are limited by the amount of RAM in the server (a server with 1 GB of RAM is limited to 2^22 for each of these values, while a server with 8 GB can have a larger number of entries).
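
Since the values apparently have to be powers of 2, a candidate at or above the observed object count can be computed with a small helper. This is only an illustrative sketch: next_pow2 is a hypothetical name, the result is not necessarily the value that was used here, and whether the kernel accepts it still depends on the RAM limit described above.

```shell
# Print the smallest power of two that is greater than or equal to $1.
next_pow2() {
    n=1
    while [ "$n" -lt "$1" ]; do
        n=$((n * 2))
    done
    echo "$n"
}
```

Running `next_pow2 1525524` gives the first power of two large enough to provide one bucket per nfs_inode_cache object seen in the slabtop output.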

This did not fix the problem, so I looked for another kernel setting and found one that appears to have worked (the problem has not returned after 4 days so far). The fix was to add the following to /etc/sysctl.conf:
vm.vfs_cache_pressure=100000

To apply the change to the running system without a reboot, I ran the following:
server:~# echo 100000 > /proc/sys/vm/vfs_cache_pressure
server:~# sync
server:~# echo 2 > /proc/sys/vm/drop_caches
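
Because the setting lives in two places (the running kernel and /etc/sysctl.conf), it may be worth checking that they agree so the value survives a reboot. A minimal sketch, assuming the standard paths; persisted_pressure and check_pressure are hypothetical helper names, and both paths can be overridden for testing.

```shell
# Print the vm.vfs_cache_pressure value from a sysctl.conf-style file.
# If the key appears more than once, the last occurrence is printed.
persisted_pressure() {
    awk -F= '
        { gsub(/[ \t]/, "", $1) }
        $1 == "vm.vfs_cache_pressure" { gsub(/[ \t]/, "", $2); v = $2 }
        END { if (v != "") print v }
    ' "${1:-/etc/sysctl.conf}"
}

# Compare the persisted value with the running one.
# $1 = config file, $2 = file holding the live value (defaults are the real paths).
check_pressure() {
    conf=${1:-/etc/sysctl.conf}
    live=${2:-/proc/sys/vm/vfs_cache_pressure}
    want=$(persisted_pressure "$conf")
    have=$(cat "$live")
    if [ "$want" = "$have" ]; then
        echo "ok: $have"
    else
        echo "mismatch: running=$have persisted=${want:-unset}"
        return 1
    fi
}
```

Calling `check_pressure` with no arguments reads the real files and returns non-zero when the two values differ.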

The vfs_cache_pressure setting controls how aggressively the kernel reclaims memory used for the dentry and inode caches. The default is 100; values closer to 0 make the kernel prefer to retain these caches, while values above 100 make it reclaim them more aggressively (see http://www.kernel.org/doc/Documentation/sysctl/vm.txt for a more detailed description).
