
Boom Crash Memory gone!


jerji
01-03-2006, 02:29 AM
Wow, have I stuffed something up! :oops:

I have been adding hosts (slowly) and am now up to about 50. :D

I have just applied a set of changes (the last 10 hosts I added), and the system has gone nuts! :(

I have a Debian box running Sarge, with 500 MB of RAM, 1.5 GB of virtual memory (swap), a 2.8 GHz processor and an 80 GB HDD (no problems with disk capacity).

After using Oreon to generate the config files (which took about 4 minutes), I tried to restart the Nagios service (again from the Oreon interface). It took 5 minutes and then reported "Running configuration check...failed - aborting restart".

OK, says I, I have stuffed something up (typical). I went back through the files in detail, both in the web interface and on the server itself. No joy. Everything looked right.

I restarted the box. No joy.

I then noticed some system messages like "order allocation failed (gfp=0x1d2/0)". Searching Google for this suggests that I had used ALL of the memory (both physical and virtual). WOW, that's a big chunk of memory!

I confirmed this while restarting Nagios and watching "free" as the memory dropped to nothing and the system aborted.
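If you want to log the same thing from a script instead of watching "free" by eye, here is a minimal sketch. It assumes a Linux box whose /proc/meminfo has "MemFree:" and "SwapFree:" lines in kB; the function names are my own, hypothetical ones. It only adds up MemFree and SwapFree, so unlike "free" it does not count buffers or cache as available.

```python
from typing import Dict


def free_memory_kb(meminfo_text: str) -> Dict[str, int]:
    """Pick MemFree and SwapFree (in kB) out of /proc/meminfo-style text."""
    wanted = ("MemFree", "SwapFree")
    values = {name: 0 for name in wanted}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key:  <number> kB" lines we care about; skip everything else
        if sep and key.strip() in wanted and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    # Free physical memory plus free swap: what is left before allocations fail
    values["Total"] = values["MemFree"] + values["SwapFree"]
    return values


def read_free_memory_kb(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Read the live figures from procfs (assumes a Linux system)."""
    with open(path) as f:
        return free_memory_kb(f.read())
```

Calling read_free_memory_kb() once a second (e.g. with time.sleep(1)) while the restart runs should show "Total" heading towards zero in the same way "free" did.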

Any ideas?

jerji
01-03-2006, 02:36 AM
I have just worked out the answer, and it relates to how you set up parent relationships (see my other post on hierarchies). You cannot set up a relationship like:
Core Switch 1 === Floor Switch 1 === Floor Switch 2 === Floor Switch 3 === Core Switch 2

Floor Switch 1 parents configured as Core Switch 1 AND Floor Switch 2
Floor Switch 2 parents configured as Floor Switch 1 AND Floor Switch 3
Floor Switch 3 parents configured as Floor Switch 2 AND Core Switch 2

That is the logical structure we have deployed (so that we can have redundancy on the links).

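This appears to give Nagios a brain strain. The parent definitions loop back on each other: Floor Switch 1 lists Floor Switch 2 as a parent while Floor Switch 2 lists Floor Switch 1, and the same goes for Floor Switches 2 and 3.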
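To check a set of parent definitions for loops like these before handing them to Nagios, here is a minimal sketch: a depth-first search over a host-to-parents mapping. The host names are the ones from my setup above, but the dict layout and the function name are my own, hypothetical choices; this is not how Nagios itself checks its configuration.

```python
from typing import Dict, List, Optional

# Host -> list of configured parents, as described above.
PARENTS: Dict[str, List[str]] = {
    "Core Switch 1": [],
    "Floor Switch 1": ["Core Switch 1", "Floor Switch 2"],
    "Floor Switch 2": ["Floor Switch 1", "Floor Switch 3"],
    "Floor Switch 3": ["Floor Switch 2", "Core Switch 2"],
    "Core Switch 2": [],
}


def find_parent_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of parents (first host repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {host: WHITE for host in parents}
    path: List[str] = []

    def visit(host: str) -> Optional[List[str]]:
        colour[host] = GREY
        path.append(host)
        for parent in parents.get(host, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the path we are walking: that is a loop
                start = path.index(parent)
                return path[start:] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[host] = BLACK
        return None

    for host in parents:
        if colour[host] == WHITE:
            cycle = visit(host)
            if cycle:
                return cycle
    return None
```

On the mapping above, find_parent_cycle(PARENTS) reports Floor Switch 1 -> Floor Switch 2 -> Floor Switch 1. A strict tree (each floor switch pointing only "upwards" towards a core switch) returns None.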

Hope someone (other than me) can learn from this.