Host/Service Still "Down" after "Up"


anomaly0617
30th December 2008, 17:56
Hi all,

Firstly, if this question has been asked before, if I missed the answer in a FAQ, or if I didn't search the forum well enough before posting, please direct me to the correct post and accept my apologies. :-)

I've been testing out Centreon using a Debian-based VM image at a few sites, and I've built Centreon from scratch using the Ubuntu-HOWTO available here. With both I get the same result, so I'm thinking the problem is not a problem at all, and perhaps it's just my perception that is wrong.

Here's what I get:


1. I set up Hosts and/or Services for Centreon to monitor. I set the initial notification time to 1 minute and subsequent ones to once an hour (60).
2. I export my configuration and restart the service/daemon, and everything is happy. I see the correct number of hosts/services being monitored, and I see results "OK" appearing in the All Services and/or All Hosts monitoring screen.
3. A week or so later a service or host goes down for a few minutes. Centreon/Nagios catches it and sends a notification email to me. (HURRAY FOR CENTREON! - that's exactly what I wanted!)
4. By the time I remote in to fix the problem, the problem has resolved itself; for instance, the device in question has rebooted and come up, so there is no longer a problem... BUT
5. Until I acknowledge the problem in Centreon, it continues to alert me to the problem hourly, even though the problem has resolved itself... AND
6. After I acknowledge the problem so it will stop emailing me constantly, the DOWN Host/Service still appears in the list as being DOWN... Even though it's up. I get an email notifying me that the problem has been acknowledged, and occasionally I get a 2nd email indicating that the service state is back UP. But in either case, under Service Problems I still see that service or host entry indicating that it is or was DOWN.


Is there a way to get Centreon to delete the DOWN events after the service is back up? I have services listed there that went down and came back up weeks ago, but they still list out as DOWN in certain monitoring screens and UP in another, which (to the untrained eye) makes people wonder whether the service being monitored is really down or up.

Update: Looks like this post (http://forum.centreon.com/showthread.php?t=7188) has the same question, just with a slightly different scenario. There are no responses, though, so... Thoughts?

Thanks in Advance!
--
Anomaly0617

plight
30th December 2008, 18:43
Yep, exact same problem here as in my post. Glad to see I'm not the only one!

plight
31st December 2008, 18:19
I tried deleting the hosts, yet the error message still stays and shows even after a refresh. So the hosts (that are no longer configured in Nagios) are still stuck in "DOWN" status.

Someone has to have a clue about what's going on. I would think it's a configuration error, but why would Nagios successfully detect a non-OK state but not be able to detect when it's fixed?

netzi01
31st December 2008, 19:22
Hi,

I also have the same problem: non-existent hosts are shown as down in the Monitoring section, and services that are still up are shown as down in Centreon Monitoring. (Centreon 2.0 stable, Nagios 3.06)
And there is no way to delete these entries.
In Nagios directly, these events do not exist.
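
One way to double-check what Nagios itself believes, independent of the Centreon screens, is to read its status file. The sketch below is a rough Python example, not something taken from the Centreon documentation. It assumes the Nagios 3 `status.dat` layout (`hoststatus { ... }` and `servicestatus { ... }` blocks containing `key=value` lines) and an assumed file path; the real location is whatever `status_file` in nagios.cfg points to. The function names are made up for illustration.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Assumed path; it differs between installs (see status_file in nagios.cfg).
STATUS_FILE = "/usr/local/nagios/var/status.dat"

# Matches a block opener such as "hoststatus {".
BLOCK_RE = re.compile(r"^(\w+)\s*\{$")


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_type, fields) for each 'type { key=value ... }' block."""
    block_type = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            match = BLOCK_RE.match(line)
            if match:
                block_type, fields = match.group(1), {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def problems(text: str) -> List[Tuple[str, int]]:
    """Return (name, current_state) for every host/service not in state 0."""
    found = []
    for block_type, fields in parse_status(text):
        if block_type not in ("hoststatus", "servicestatus"):
            continue
        state = fields.get("current_state", "0")
        if state == "0":  # 0 means UP for hosts and OK for services
            continue
        name = fields.get("host_name", "?")
        if block_type == "servicestatus":
            name += "/" + fields.get("service_description", "?")
        found.append((name, int(state)))
    return found


def load_problems(path: str = STATUS_FILE) -> List[Tuple[str, int]]:
    """Read the status file from disk and list the current problems."""
    with open(path) as handle:
        return problems(handle.read())
```

If `load_problems()` returns an empty list while Centreon still shows entries as DOWN, that would suggest the stale data lives on the Centreon side rather than in Nagios.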

Greetings from Bavaria.
Hartmut

plight
2nd January 2009, 18:05
Made a little headway I think.

On New Year's Eve, I stopped the Nagios service and left it off until today (off from work New Year's Day). When I started Nagios this morning, the down hosts disappeared.

Don't know what this means exactly... could Nagios be unable to append to the file it writes to create the downed host entries?

anomaly0617
5th January 2009, 15:34
It sure sounds to me like there's a maintenance script that needs to be run daily as a cron job to clean these items up. Perhaps when you stopped the nagios daemon and restarted it days later, it triggered the maintenance script? All of the above is speculation, of course. I think it would be odd if we all missed the installation of a cron script in the instructions, unless the installation instructions forgot to mention it somehow.

If we can pinpoint what happens when you stop the nagios daemon and restart it, I'd be willing to attempt writing a quick-and-dirty php or bash script that could be executed daily by cron to clean this stuff up. I'm guessing this is just a MySQL UPDATE statement.
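
Before any UPDATE is written, the reporting half of such a cron script could be sketched roughly as below (in Python rather than PHP or bash, purely for illustration). It only classifies entries and changes nothing. Everything here is an assumption: `classify_stale` is a hypothetical helper, `ui_down` stands for whatever names the monitoring screen lists as DOWN, and `nagios_states` stands for the current state per host or service as reported by Nagios itself. The actual table and columns to update are not known from this thread, so they are left out.

```python
from typing import Dict, Iterable


def classify_stale(ui_down: Iterable[str],
                   nagios_states: Dict[str, int]) -> Dict[str, str]:
    """Map each stale UI entry to the reason it looks stale.

    ui_down:       names the UI currently shows as DOWN (hypothetical input)
    nagios_states: name -> current state from Nagios, where 0 means UP/OK
    """
    stale: Dict[str, str] = {}
    for name in sorted(set(ui_down)):
        if name not in nagios_states:
            # Matches the case of hosts deleted from the configuration.
            stale[name] = "no longer configured in Nagios"
        elif nagios_states[name] == 0:
            # Matches the case of a host that rebooted and came back.
            stale[name] = "recovered (Nagios reports UP/OK)"
        # Anything else is still a real problem and is left alone.
    return stale
```

For example, `classify_stale(["web1", "old-host"], {"web1": 0})` flags both names, one as recovered and one as no longer configured, which covers the two scenarios described in this thread.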

This is really the only problem I've had with Centreon, other than the lack of really good documentation to walk me through the initial setup and the order in which it has to happen. I ended up piecing that together from various websites. Otherwise I think it's a stellar product, and I'd love to see it work for our needs.

anomaly0617
19th January 2009, 01:07
*Bump* Any ideas on this, guys?