
Distributed monitoring - some issues


  • Distributed monitoring - some issues

Hello all,

I have set up distributed monitoring for my Nagios production environment (my dev environment keeps Centreon/NDO/Nagios all on one host).
Here is my design:
• "poller01" is a Nagios host running nagios + ndomod + ndo2db
• "ndo" is a MySQL host holding the ndo database. This MySQL server is set up as the master for the ndo database
• "centreon" is my host with Apache, PHP, Centreon, MySQL (the Centreon DBs plus ndo as a replication slave) and Nagios, but with its poller instance disabled.

All are VMs on an ESXi cluster.
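For reference, the replication part of this design comes down to a few my.cnf directives, limiting the master/slave stream to just the ndo database. A minimal sketch assuming MySQL defaults; the server IDs are placeholders, not the actual config:

Code:
# on "ndo" (master) - my.cnf
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog-do-db = ndo

# on "centreon" (slave) - my.cnf
[mysqld]
server-id       = 2
replicate-do-db = ndo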

I think it is a good design, as my aim was to be able to add as many poller nodes as I need/want, and also to minimize the load on each Nagios poller.

What is working:
• MySQL ndo master/slave replication
• the nagios/ndomod/ndo2db stream: no NDO-related errors in the nagios.log file
• centreon node -> poller: scp of the Nagios config files, and a Nagios restart through SSH with sudo


What is not working / seems strange to me:
• the nagios.log file on my poller01 host only logs the initial states after a start/restart; there are no other notifications. I don't see any difference from my working centreondev setup. Any advice would help
• i tried to separate ndomod/ndo2db, with ndomod on my poller01 node and ndo2db on my ndo MySQL node, but it does not work. When I change the ndomod conf to contact ndo2db on 127.0.0.1, it works. Is there some "special" parameter to be aware of? I would like to remove the ndo2db role from my poller to lighten it as much as possible.
• the ndo DBs (master and slave) only populate some tables, with a few rows, when I restart my Nagios process. There is no error in the sink; it just seems that ndomod has nothing to send. Is that normal?
• for the moment I have a simple setup with only 6 hosts and 6 services (1 ping check per host). In the Centreon interface I can see my 6 services in the top bar, but 0 hosts, so nothing appears on the monitoring tab and the view tab.
• when I disable the local Centreon Nagios poller in Centreon, I get a sudo error when generating/transferring the poller01 conf. When the local poller is enabled and I generate the conf for poller01, there is no error.


I can provide config files as needed, but for now I would prefer not to send (all) the files...

If you have any advice, or some direction on where to look, I would appreciate it.
    Sysadmin
    OS: Ubuntu / Debian / RHEL
Nagios env: 1 Centreon 2.2.1, 5 Nagios on remote sites, NDOutils v1.4.9
Own development: status map based on NDO, service status by categories, misc reports on the Nagios conf, mediawiki linked to Centreon and vice versa

  • #2
Originally posted by clifden
    Hello all
    Hi

• the nagios.log file on my poller01 host only logs the initial states after a start/restart; there are no other notifications. I don't see any difference from my working centreondev setup. Any advice would help
Look at the nagios.cfg configuration for the poller, under "Log Options" -> "Logging Options".
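For comparison, these are the nagios.cfg logging directives that control what ends up in nagios.log. A minimal sketch with everything switched on; the values are illustrative, not taken from your files:

Code:
# nagios.cfg - logging options
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
log_passive_checks=1

If only log_initial_states is 1 and the rest are 0, nagios.log will look exactly as you describe: initial states after a restart and nothing else.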

• i tried to separate ndomod/ndo2db, with ndomod on my poller01 node and ndo2db on my ndo MySQL node, but it does not work. When I change the ndomod conf to contact ndo2db on 127.0.0.1, it works. Is there some "special" parameter to be aware of? I would like to remove the ndo2db role from my poller to lighten it as much as possible.
You have to change the ndomod configuration on the poller to the IP address of your ndo2db instance. But as a piece of advice: use stunnel to encrypt the communication, and restrict access to the ndo2db TCP/IP port (typically 5668).
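A minimal sketch of the directives involved, assuming the default port 5668 and following the stunnel advice (ndomod talks to a local stunnel client, which forwards the encrypted stream to the ndo host; <ndo.ip> and the port numbers are placeholders):

Code:
# ndomod.cfg on poller01 - switch from the unix socket to TCP
output_type=tcpsocket
output=127.0.0.1          # local stunnel entry point
tcp_port=5668

# ndo2db.cfg on the ndo host - listen on TCP
socket_type=tcp
tcp_port=5668

# stunnel.conf on poller01 (client side)
[ndo2db]
client  = yes
accept  = 127.0.0.1:5668
connect = <ndo.ip>:5669

# stunnel.conf on the ndo host (server side)
[ndo2db]
accept  = 5669            # encrypted port exposed to the poller
connect = 127.0.0.1:5668  # local ndo2db

Without stunnel, setting output=<ndo.ip> directly in ndomod.cfg should also work, as long as port 5668 on the ndo host is reachable and not firewalled.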

• the ndo DBs (master and slave) only populate some tables, with a few rows, when I restart my Nagios process. There is no error in the sink; it just seems that ndomod has nothing to send. Is that normal?
Yes, there are some tables that are only populated at startup.

• for the moment I have a simple setup with only 6 hosts and 6 services (1 ping check per host). In the Centreon interface I can see my 6 services in the top bar, but 0 hosts, so nothing appears on the monitoring tab and the view tab.
Well, that's not OK; it looks like an error in the database entries or in NDO.

• when I disable the local Centreon Nagios poller in Centreon, I get a sudo error when generating/transferring the poller01 conf. When the local poller is enabled and I generate the conf for poller01, there is no error.
Which error exactly?

    Bye
    Frank



    • #3
Originally posted by frank_enser
Look at the nagios.cfg configuration for the poller, under "Log Options" -> "Logging Options".
I have Notification, Service check retry, Host retry, Event handler, Initial states and External command checked. In fact, my poller01 nagios.cfg log options are the same as in the nagios.cfg of the centreon node (which I would like to disable afterwards).

You have to change the ndomod configuration on the poller to the IP address of your ndo2db instance. But as a piece of advice: use stunnel to encrypt the communication, and restrict access to the ndo2db TCP/IP port (typically 5668).
In fact, Centreon is not designed to push the ndo2db conf to a node that is not a poller (hosting Nagios, etc.), so I think I will stay with ndomod + ndo2db on the host that runs my #1 Nagios poller.

Yes, there are some tables that are only populated at startup.
The tables updated are: nagios_processevents, nagios_objects, nagios_conninfo, nagios_runtimevariables, nagios_customvariablestatus.
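To see whether any runtime data arrives between restarts, the status tables can be checked directly. A quick sketch against the ndo database (table and column names from the standard NDOUtils schema):

Code:
-- these should refresh continuously while checks run,
-- not only after a nagios restart
SELECT COUNT(*) FROM nagios_hoststatus;
SELECT COUNT(*) FROM nagios_servicestatus;
SELECT MAX(status_update_time) FROM nagios_servicestatus;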

Well, that's not OK; it looks like an error in the database entries or in NDO.
I will have a look.
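One starting point: check whether the 6 hosts were registered in ndo at all, and under which instance. A hedged sketch (the instance_id value is an assumption; adjust it to whatever the first query returns for poller01):

Code:
-- which instances does ndo know about?
SELECT instance_id, instance_name FROM nagios_instances;

-- are the 6 hosts present as objects for that instance?
SELECT name1 FROM nagios_objects
WHERE objecttype_id = 1     -- 1 = host in the NDOUtils schema
  AND instance_id = 1;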

Which error exactly?
In the web interface I have this:

      Code:
      usage: sudo -v [-AknS] [-p prompt]
      usage: sudo -l[l] [-AknS] [-g groupname|#gid] [-p prompt] [-U username] [-u
      username|#uid] [-g groupname|#gid] [command]
      usage: sudo [-AbEHknPS] [-C fd] [-g groupname|#gid] [-p prompt] [-u
      username|#uid] [-g groupname|#gid] [VAR=value] [-i|-s] [<command>]
      usage: sudo -e [-AknS] [-C fd] [-g groupname|#gid] [-p prompt] [-u
      username|#uid] file ...
In my centcore.log, I have this:

      Code:
      1/7/2011 12:33:50 - Start: Send config files on poller 2 (<poller01.ip>:/datas/NAGIOS/etc/)
      1/7/2011 12:33:50 -   Command line is '/usr/bin/scp -o ConnectTimeout=5  -P 22 /datas/CENTREON/filesGeneration/nagiosCFG/2/* <poller01.ip>:/datas/NAGIOS/etc/'
      1/7/2011 12:33:51 - End: Send config files on poller 2 (<poller01.ip>:/datas/NAGIOS/etc/)
      1/7/2011 12:33:51 - Init Script : /usr/bin/ssh -t -o ConnectTimeout=5  -p 22 <poller01.ip> sudo /etc/init.d/nagios restart
      1/7/2011 12:33:51 - NAGIOS : Running configuration check...done.
      1/7/2011 12:33:51 - NAGIOS : Stopping nagios: done.
      1/7/2011 12:33:51 - NAGIOS : Starting nagios: done.
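That usage output is what sudo prints when it is invoked with missing or invalid arguments, so it may be worth checking the init script path Centreon has configured for the disabled local poller. For completeness, a sketch of the sudoers entry that lets the centreon user restart Nagios non-interactively over SSH (the user name and init script path are assumptions about this setup):

Code:
# /etc/sudoers on poller01 (edit with visudo)
Defaults:centreon !requiretty
centreon ALL=(ALL) NOPASSWD: /etc/init.d/nagios restart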
      Sysadmin
      OS: Ubuntu / Debian / RHEL
Nagios env: 1 Centreon 2.2.1, 5 Nagios on remote sites, NDOutils v1.4.9
Own development: status map based on NDO, service status by categories, misc reports on the Nagios conf, mediawiki linked to Centreon and vice versa



      • #4
I have some improvements over my original patch that fix trends/history pushes and multiple child node issues, and add some speed improvements. It requires some new tables with SQL triggers and some indexes. I am pushing about 972 hosts with 125,000 items to my master node, and the master node database is running 75% to 80% idle with less than a 1.0 load average (it used to be at a 4.0+ load average before the improvements). It took me about 3 weeks of tuning, but I was able to achieve very good results.
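The patch itself is not shown here; purely as an illustration of the kind of change described (an index to speed up history pushes, plus a trigger-maintained summary table so the master avoids full history scans), with all table and column names hypothetical:

Code:
-- illustrative only, not the actual patch
CREATE INDEX idx_history_item_clock ON history (itemid, clock);

CREATE TRIGGER trg_history_summary AFTER INSERT ON history
FOR EACH ROW
  UPDATE history_summary
     SET last_clock = NEW.clock,
         last_value = NEW.value
   WHERE itemid = NEW.itemid;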



        • #5
Originally posted by hilton
I have some improvements over my original patch that fix trends/history pushes and multiple child node issues, and add some speed improvements. It requires some new tables with SQL triggers and some indexes. I am pushing about 972 hosts with 125,000 items to my master node, and the master node database is running 75% to 80% idle with less than a 1.0 load average (it used to be at a 4.0+ load average before the improvements). It took me about 3 weeks of tuning, but I was able to achieve very good results.
          Hi,

          What did you do specifically to get these results?
