Boom Crash Memory gone!

  • Boom Crash Memory gone!

    Wow, have I stuffed something up!

    I have been adding hosts (slowly) - now up to about 50

    I have now applied a set of changes (the last 10 hosts I have added) and the system has gone nuts!

    I have a Debian box running Sarge, with 500 MB of RAM and 1.5 GB of virtual memory, a 2.8 GHz processor and an 80 GB HDD (no problems with capacity).

    After using Oreon to generate the config files (which took about 4 minutes), I tried to restart the Nagios service (again from the Oreon interface). That took 5 minutes and then reported "Running configuration check...failed - aborting restart".

    OK says me, I have stuffed something up (typical). I went back through the files in detail (both the web interface and looking at the files on the server itself). No joy. Everything looked right.

    I restarted the box. No joy.

    I then noticed some system messages like "order allocation failed (gfp=0x1d2/0)". Chasing this on Google indicates that I have used ALL of the memory (both physical and virtual). WOW, that's a big chunk of memory!

    I confirmed this while restarting Nagios and watching "free" as the memory dropped to nothing and the system aborted.

    Any ideas?


  • #2

    I have just worked out the answer and it relates to how you set up parent relationships. See my other post on hierarchies. You cannot set up a relationship like:
    Core Switch 1 === Floor Switch 1 === Floor Switch 2 === Floor Switch 3 === Core Switch 2

    Floor Switch 1 parents configured as Core Switch 1 AND Floor Switch 2
    Floor Switch 2 parents configured as Floor Switch 1 AND Floor Switch 3
    Floor Switch 3 parents configured as Floor Switch 2 AND Core Switch 2

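    That is the logical structure we have deployed (so that we can have redundancy on the links).

    Configured this way, each floor switch is both a parent and a child of its neighbour, so the parent relationships loop back on themselves. This appears to give Nagios a brain strain.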
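
For anyone wanting to check their own setup before restarting, here is a rough sketch of how a loop like this could be spotted. It is plain Python, not part of Nagios or Oreon; the function name `find_parent_cycle` and the hand-typed dictionaries are made up for illustration, and the second layout is only one possible loop-free arrangement, not a recommendation.

```python
def find_parent_cycle(parents):
    """Return one loop in a host -> list-of-parents mapping, or None.

    Depth-first search: a host that is reached again while it is still
    on the current path means the parent chain loops back on itself.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}
    path = []

    def visit(host):
        state[host] = GREY
        path.append(host)
        for parent in parents.get(host, ()):
            seen = state.get(parent, WHITE)
            if seen == GREY:
                # Loop found: slice the path from where the parent first appears.
                return path[path.index(parent):] + [parent]
            if seen == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[host] = BLACK
        return None

    for host in parents:
        if state.get(host, WHITE) == WHITE:
            cycle = visit(host)
            if cycle:
                return cycle
    return None


# The parent settings described in the post above.
ring = {
    "Floor Switch 1": ["Core Switch 1", "Floor Switch 2"],
    "Floor Switch 2": ["Floor Switch 1", "Floor Switch 3"],
    "Floor Switch 3": ["Floor Switch 2", "Core Switch 2"],
}

# Hypothetical loop-free alternative: parents only ever point towards a core switch.
no_loop = {
    "Floor Switch 1": ["Core Switch 1"],
    "Floor Switch 2": ["Floor Switch 1", "Floor Switch 3"],
    "Floor Switch 3": ["Core Switch 2"],
}

print(find_parent_cycle(ring))     # ['Floor Switch 1', 'Floor Switch 2', 'Floor Switch 1']
print(find_parent_cycle(no_loop))  # None
```

The first result shows Floor Switch 1 and Floor Switch 2 naming each other as parents, which is the loop in the layout above. Whether that loop alone explains the memory exhaustion is my reading of the symptoms, not something this sketch proves.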

    Hope someone (other than me) can learn from this.