2017-12-22

"Error: Too many open files" when inside Docker container

Does not work: various ulimit settings for the daemon

We have a container built from this Dockerfile, running RHEL 7 with an oldish docker-1.10.3-59.el7.x86_64. Containers are started with:

# for i in $( seq 500 ); do
      docker run -h "$( hostname -s )container$i.example.com" -d --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --ulimit nofile=10000:10000 r7perfsat
  done
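
To confirm that --ulimit actually took effect, the per-process limit can be checked from inside a container; the same generic commands work in any shell:

```shell
# Effective open-files limit of the current shell; inside a container
# started with --ulimit nofile=10000:10000 this should report 10000.
ulimit -n
grep 'Max open files' /proc/self/limits
```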

and we have set limits for the docker service on the docker host:

# cat /etc/systemd/system/docker.service.d/limits.conf
[Service]
LimitNOFILE=10485760
LimitNPROC=10485760
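
A drop-in like this only takes effect after a daemon reload and a restart of the docker service; whether the running daemon actually picked the limits up can then be verified from /proc (a sketch — "self" here is just a runnable stand-in for the docker daemon's PID):

```shell
# Pick up the drop-in and restart the daemon (needs root):
#   systemctl daemon-reload
#   systemctl restart docker
# Then check the limits a running process actually got; replace "self"
# with the docker daemon's PID (e.g. from systemctl status docker):
grep -E 'Max (open files|processes)' /proc/self/limits
</
```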

but we still saw "Too many open files" issues inside the containers. It could happen when installing a package with yum (resulting in a corrupted rpm database; rm -rf /var/lib/rpm/__db.00*; rpm --rebuilddb saved it though) and when enabling a service (our containers have systemd in them on purpose):

# systemctl restart osad
Error: Too many open files
# echo $?
0

Because I was stupid, I did not check the journal (in the container) the moment I spotted the failure for the first time. When I finally did, it showed:

Dec 21 10:18:54 b08-h19-r620container247.example.com journalctl[39]: Failed to create inotify watch: Too many open files
Dec 21 10:18:54 b08-h19-r620container247.example.com systemd[1]: systemd-journal-flush.service: main process exited, code=exited, status=1/FAILURE
Dec 21 10:18:54 b08-h19-r620container247.example.com systemd[1]: inotify_init1() failed: Too many open files
Dec 21 10:18:54 b08-h19-r620container247.example.com systemd[1]: inotify_init1() failed: Too many open files
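
So the limit being hit was not the number of open file descriptors but the number of inotify instances, which the kernel caps separately per user. How close a host is to that cap can be estimated by counting inotify file descriptors across all processes (a rough sketch; it needs enough privileges to read other processes' fd directories, so run as root for a complete count):

```shell
# Each symlink to anon_inode:inotify is one inotify instance; compare
# the count against fs.inotify.max_user_instances (a per-UID limit).
find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
```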

Does work: fs.inotify.max_user_instances

At the end I ran into an issue, and the very last comment there had a thing I had not seen before. I ended up with:

# cat /etc/sysctl.d/40-max-user-watches.conf
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
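
Files under /etc/sysctl.d/ are only read at boot (or on sysctl --system), so the new values need to be loaded once by hand; the live values can always be read back straight from /proc:

```shell
# Load the new values (needs root); either of:
#   sysctl -p /etc/sysctl.d/40-max-user-watches.conf
#   sysctl --system
# Read the values the kernel is actually using:
cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches
```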

The defaults on a different machine are:

# sysctl -a 2>&1 | grep fs.inotify.max_user_
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192

Looks like increasing fs.inotify.max_user_instances helped, and our containers are now stable.