Intense LDAP Usage - High CPU Load and Network Utilization

Hi,

We’ve had an Openfire (3.6.4) instance, backed by LDAP (this becomes important), running for well over a year now, serving somewhere between 200 and 350 clients at any given time. Over the past two months or so, though, we’ve seen a few occasions where all clients drop their connections within seconds of each other and then try to reconnect. This has happened often enough that we’ve spent a fair amount of time trying to figure out what’s going on, and we even went so far as to move our Openfire system from a virtual machine to a physical box (2.4 GHz quad-core, 4 GB RAM, 146 GB 15,000 RPM SAS drives). In the end, though, that didn’t fix the problem.

We noticed a few things that were a bit odd:

  • Bandwidth usage on the Openfire system was through the roof (~45-50 megabits/second), relatively speaking, and most of that traffic was LDAP traffic

  • Openfire drops coincided with times where load on the LDAP server was higher than usual

  • Said load on the LDAP server was mostly due to I/O wait

In short, the problem seemed to be that the storage backing our LDAP server is relatively slow, and during these high-load periods the I/O wait became unacceptable. Adding RAM and turning slapd logging nearly off (sketched below) seems to have stopped Openfire from dropping connections, but we’re still seeing large amounts of traffic between Openfire and LDAP, and I can’t quite figure out why.
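For completeness, the logging change on the slapd side was essentially the following; this is a sketch for OpenLDAP with a slapd.conf-style configuration (the cn=config equivalents differ), plus index directives we still want to verify are in place for the attributes in the filter Openfire sends:

# stop the per-operation stats logging that was filling the log
# (presumably "loglevel stats" before)
loglevel    none

# attributes used in Openfire's search filter; after adding these,
# run slapindex with slapd stopped so existing entries get indexed
index       objectClass    eq
index       uid            pres,eq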

Samples of our LDAP logs (from before we turned logging down) follow:

Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=0 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=0 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=0 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=1 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=1 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=1 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=2 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=2 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=2 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=3 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=3 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=3 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=4 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=4 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=4 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=5 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=5 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=5 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=6 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=6 SRCH attr=uid
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=6 SEARCH RESULT tag=101 err=0 nentries=1460 text=
Jul 11 12:32:12 ldap slapd[1577]: conn=2363994 op=7 SRCH base="ou=Web Applications,dc=example,dc=com" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson))"

Seeing the same 1,460-entry search repeated several times per second on a single connection just doesn’t seem normal to me. Upgrading to 3.7.0 is on my list of things to try, but I’d rather not guess at this point. If anyone has any thoughts that might explain the sheer number of LDAP queries (and the corresponding bandwidth utilization), it’d be much appreciated.
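For what it’s worth, the exact search from the log can be reproduced from the command line to get a feel for how much data each one moves; ldap.example.com below is a placeholder for our actual server, and -D/-w bind options may be needed if anonymous search isn’t allowed:

ldapsearch -x -H ldap://ldap.example.com \
    -b "ou=Web Applications,dc=example,dc=com" -s sub -a always \
    "(&(uid=*)(objectClass=inetOrgPerson))" uid | wc -c

Very roughly, 1,460 one-attribute entries come back as something on the order of 100 KB of LDIF per search, so on the order of fifty of these per second would account for the ~45-50 megabits/second we’re seeing; the open question is why Openfire repeats the identical search so often.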


Hi,

I have the same problem. When I use Wildfire 3.2.4 with the same LDAP configuration it works correctly and the CPU stays under 3%, but when I use Openfire (latest release) both the Openfire server and the LDAP server max out the CPU, and the Openfire server exhausts the Java memory, using more than 2 GB; on Wildfire 3.2.4, memory usage doesn’t exceed 150 MB with 500 users.

I have looked for help in various forums, but without success, and I find it hard to believe that Wildfire and Openfire could differ this much.

I use the LDAP configuration with a PostgreSQL database.

I would appreciate it if someone could help me…
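If it helps, the memory growth can be watched with the standard JDK tools while the problem is happening (with <pid> being the Openfire Java process):

jstat -gcutil <pid> 5000            # heap occupancy and GC activity, sampled every 5 s
jmap -histo:live <pid> | head -20   # top classes by instance count and bytes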


Same here!