I'm using the new ODS function, which logs to RRD only. The problem is that whenever I restart Nagios, I get a break in my graphs. Does anyone else have a similar problem?
break in ods graphs on server restart
-
The problem is that RRD databases are configured to hold a fixed number of data points at a fixed interval (based on your service configuration, i.e. the normal check interval...).
So if Nagios doesn't get a value back for your service in time (restart, check latency...), that slot in the RRD database is not filled, and when data arrives for the next slot while the previous one is empty, you get a break.
So tune your Nagios configuration to retain the state of each service and the scheduling info, and to spread the service & host checks...
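A small sketch for checking the two nagios.cfg options that keep service state and check scheduling across a restart. `retain_state_information` and `use_retained_scheduling_info` are standard Nagios main-config options; the function name `check_retention` and the config path in the usage line are assumptions, so adjust them for your install.

```shell
# Hypothetical helper: report whether the nagios.cfg options that preserve
# service state and check scheduling across a restart are set to 1.
# $1 = path to nagios.cfg. The last occurrence of an option wins.
check_retention() {
    cfg=$1
    for opt in retain_state_information use_retained_scheduling_info; do
        val=$(sed -n "s/^[[:space:]]*$opt[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "$cfg" | tail -n 1)
        if [ "$val" = "1" ]; then
            echo "$opt=1 ok"
        else
            echo "$opt=${val:-unset} (consider 1)"
        fi
    done
}
```

Usage (path assumed): `check_retention /usr/local/nagios/etc/nagios.cfg`. Check spreading is tuned separately and depends on your setup.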
-
ODS bug?
There are a lot of breaks in my ODS graphs too, and I haven't restarted Nagios in the last few days.
I think there is a bug in ODS.
Last night one of my Internet routers was down (CRITICAL) from 0:15 to 8:15, and to my surprise I'm missing all the ODS graph data from exactly 0:15 to 8:15 for ALL my services.
And these services have nothing to do with this Internet router.
I'm going to keep an eye on this.
Regards
Menno van Bennekom
-
Originally posted by DonKiShoot: What is your check frequency?
300 sec (5 min) seems to be the best choice for getting good graphs.
Hope it helps. :wink:
Regards
Menno
-
still breaks
These breaks in the ODS graphs are still a problem for me. I can't find a relationship with any other occurrences, except that they seem to happen more often when some service is critical or down. The breaks don't happen at the same time in all graphs; each graph has different breaks at different times.
Is nobody else having this problem?
Random example attached.
Regards
Menno
-
ODS RRD files incorrect heartbeat
I finally found the cause of the holes in my ODS graphs.
The measurements were stored correctly in ODS-mysql, in the perfdata file, and in the check_graph_traffic RRD file, but sometimes not in the ODS RRD file.
With a dump of the ODS RRD file (rrdtool dump), I saw some measurements appear with the value 'NaN' (not a number).
Then, with rrdtool info, you can see that the RRD is created with a step of 300 (5 minutes), but the metric is also created with a heartbeat of 300.
That is normally too strict: if a measurement comes in after 301 seconds, it already gets a 'NaN' value. You HAVE to respond within 300 seconds.
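A simplified model of the heartbeat rule described above: RRDtool treats the interval between two consecutive updates as UNKNOWN when the updates are more than `heartbeat` seconds apart. `unknown_gaps` is a hypothetical helper that models only this one rule (no step alignment or consolidation), and the timestamps in the usage line are made up.

```shell
# Hypothetical helper: $1 = heartbeat (s), remaining args = update timestamps (s).
# Prints "prev-next" for every gap strictly longer than the heartbeat,
# i.e. every stretch RRDtool would record as UNKNOWN (NaN).
unknown_gaps() {
    hb=$1
    shift
    prev=
    for t in "$@"; do
        if [ -n "$prev" ] && [ $((t - prev)) -gt "$hb" ]; then
            echo "$prev-$t"
        fi
        prev=$t
    done
}
```

With a 300 s heartbeat, `unknown_gaps 300 0 300 601 900` prints `300-601`: one check arriving a single second late is enough to leave a hole. With a 600 s heartbeat, the same timestamps print nothing.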
So I changed all the RRD files to a heartbeat of 600:
Code:
cd /usr/local/oreon/OreonDataStorage
for f in *.rrd; do rrdtool tune "$f" --heartbeat metric:600; done
But it can be different for each file!!
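Since the DS name and the step may differ per file, here is a sketch that reads both from `rrdtool info` instead of hard-coding `metric:600`, and sets each heartbeat to twice the step. It assumes `rrdtool info` prints lines of the form `step = 300` and `ds[metric].minimal_heartbeat = 300`, and uses `rrdtool tune --heartbeat ds-name:heartbeat`. The function names are made up, and "heartbeat = 2 × step" is the convention discussed in this thread, not something RRDtool requires.

```shell
# Print the data-source names found in `rrdtool info` output read on stdin.
ds_names() {
    sed -n 's/^ds\[\([^]]*\)]\.minimal_heartbeat = .*/\1/p'
}

# Print the step (seconds) found in `rrdtool info` output read on stdin.
rrd_step() {
    sed -n 's/^step = \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: for every .rrd in $1, set each DS heartbeat to 2 x step.
fix_heartbeats() {
    dir=$1
    for f in "$dir"/*.rrd; do
        [ -e "$f" ] || continue              # no .rrd files in the directory
        info=$(rrdtool info "$f") || continue
        step=$(printf '%s\n' "$info" | rrd_step)
        [ -n "$step" ] || continue           # could not read the step
        hb=$((step * 2))
        for ds in $(printf '%s\n' "$info" | ds_names); do
            echo "tuning $f: $ds heartbeat -> $hb"
            rrdtool tune "$f" --heartbeat "$ds:$hb"
        done
    done
}
```

Usage: `fix_heartbeats /usr/local/oreon/OreonDataStorage`. Because it derives the values from each file, it can simply be rerun after an RRD is regenerated with the heartbeat reset to the step.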
Since that moment the holes/breaks in the graphs are gone.
I think this should be fixed in ODS/lib/updateFunctions.pm.
In this program, the RRDs are created with step and heartbeat both taken from the same parameter ($interval).
Update history:
The value of $interval has been changed by the patches to updateFunctions.pm:
original release: $interval = $interval * $data->{'interval_length'} ;
first patch: $interval = $interval * $data->{'interval_length'} * 2;
fourth patch: $interval = $interval * $data->{'interval_length'} + 10;
But I think the $interval value of the original release is the right one; only the line where the RRD is created should be changed, because step and heartbeat should not be the same:
Code:
# was:
RRDs::create($_[0]."/".$_[1].".rrd", "-b ".$begin, "-s ".$interval,
             "DS:metric:GAUGE:".$interval.":U:U",
             "RRA:AVERAGE:0.5:1:".$_[5], "RRA:MIN:0.5:12:".$_[5], "RRA:MAX:0.5:12:".$_[5]);
# my suggestion (heartbeat = twice the step):
RRDs::create($_[0]."/".$_[1].".rrd", "-b ".$begin, "-s ".$interval,
             "DS:metric:GAUGE:".($interval * 2).":U:U",
             "RRA:AVERAGE:0.5:1:".$_[5], "RRA:MIN:0.5:12:".$_[5], "RRA:MAX:0.5:12:".$_[5]);
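A worked example of the arithmetic behind the update history, assuming a normal check interval of 5 and the Nagios default interval_length of 60 (so the base $interval is 300 s). `rrd_params` is a hypothetical helper that only reproduces the formulas quoted in this thread, including the $interval_hb variant posted later.

```shell
# Hypothetical helper: print step/heartbeat (seconds) for each variant.
# $1 = normal check interval (in interval_length units), $2 = interval_length (s).
rrd_params() {
    base=$(($1 * $2))
    plus10=$((base + 10))
    echo "original: step=$base heartbeat=$base"
    echo "first_patch: step=$((base * 2)) heartbeat=$((base * 2))"
    echo "fourth_patch: step=$plus10 heartbeat=$plus10"
    echo "suggested: step=$base heartbeat=$((base * 2))"
    echo "interval_hb: step=$plus10 heartbeat=$((plus10 * 2))"
}
```

`rrd_params 5 60` gives step/heartbeat pairs of 300/300, 600/600, 310/310, 300/600 and 310/620. With the fourth patch, a check more than 10 s late already exceeds the heartbeat, whereas a heartbeat of twice the step leaves a full step of slack.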
Menno van Bennekom
-
Julio, if he's right, that's one hell of a blunder. The heartbeat is always twice the step on principle, isn't it?
At least, that's how we've always done it for the check_graph plugins, I think?
That would explain why non-optimized Nagios setups have such lousy graphs: their latency is too high and the heartbeat too short.
Intel(R) Xeon(TM) CPU 3.4GHz - MemTotal : 1034476 kB
Centreon 2.4.1 - Nagios 3.2.1 - Nagios Plugins 1.4.15 - Manubulon Plugins (tuned)
Fedora Core 5 - 2.6.20-1.2320
-
I confirm!
After modifying the updatefunctions.pm file, I no longer have any holes in my graphs.
What I set up:
I didn't touch the $interval variable (lines 49 & 99):
Code:
$interval = $interval * $data->{'interval_length'} + 10;
$interval_hb = $interval * 2;   # heartbeat = twice the step
RRDs::create($_[0].$_[1].".rrd", "-b ".$begin, "-s ".$interval,
             "DS:".substr($_[6], 0, 19).":GAUGE:".$interval_hb.":U:U",
             "RRA:AVERAGE:0.5:1:".$nb_value, "RRA:MIN:0.5:12:".$nb_value, "RRA:MAX:0.5:12:".$nb_value);
writeLogFile("Creating $_[0]$_[1].rrd -b $begin, -s $interval, DS:".substr($_[6], 0, 19).":GAUGE:$interval_hb:U:U RRA:AVERAGE:0.5:1:$nb_value RRA:MIN:0.5:12:$nb_value RRA:MAX:0.5:12:$nb_value\n");
Code:
rrdtool tune [rrd_name].rrd --heartbeat [metric_name]:[heartbeat_value]
One problem remains: when the RRD is regenerated (by Centreon), the heartbeat is reset to the same value as the step. If anyone knows which file to modify... I'm all ears.
-
Strange, though, that tobi didn't build in the ability to set the heartbeat at creation time...
-
So, no news?
Because the bug is pretty annoying... especially when you have a lot of graphs, they start breaking overnight without anyone having changed anything, and you have to regenerate a good fifty of them (I like the rrdtool command, but still...)!
After regenerating the RRDs for the graphs in question, I'm left with "lovely" week-long holes...
I really have serious doubts about the reliability of the Oreon views...
Last edited by rzd; 2 October 2007, 14:18.