Issue with satellite (ssh)


froggy
11th December 2008, 03:41
I am having issues setting up a satellite monitor and am hoping someone can assist.
The issue, from what I can tell, starts with the ssh connection and ends with various other symptoms. The best hint so far is that whenever I try to generate the configuration for the satellite server, I get the following error in the central server's /var/log/messages:

sudo: wwwrun : pam_authenticate: User not known to the underlying authentication module ; TTY=unknown ; PWD=/usr/local/centreon/www ; USER=root ; COMMAND=/usr/local/nagios/bin/nagios -v /usr/local/centreon/filesGeneration/nagiosCFG/2/nagiosCFG.DEBUG

and in the Apache error log I get:

root's password:
sudo: pam_authenticate: User not known to the underlying authentication module
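
For anyone reading along, the sudo line from /var/log/messages already says who called sudo, as whom, and for which command. A minimal sketch that splits it into fields follows; the helper name parse_sudo_line is made up for illustration and the sample line is the one quoted above.

```python
# Sketch: split a sudo syslog line into its fields.
# parse_sudo_line is a hypothetical helper, not part of sudo or Centreon.

SAMPLE = (
    "sudo: wwwrun : pam_authenticate: User not known to the underlying "
    "authentication module ; TTY=unknown ; PWD=/usr/local/centreon/www ; "
    "USER=root ; COMMAND=/usr/local/nagios/bin/nagios -v "
    "/usr/local/centreon/filesGeneration/nagiosCFG/2/nagiosCFG.DEBUG"
)


def parse_sudo_line(line):
    """Return (invoking_user, fields) for a 'sudo: user : msg ; K=V ; ...' line."""
    _, _, rest = line.partition("sudo:")
    parts = [part.strip() for part in rest.split(" ; ")]
    # The first part looks like "wwwrun : pam_authenticate: ...".
    user, _, message = parts[0].partition(" : ")
    fields = {"MESSAGE": message.strip()}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return user.strip(), fields


user, fields = parse_sudo_line(SAMPLE)
print("invoked by:", user)
print("run as:", fields["USER"])
print("binary:", fields["COMMAND"].split()[0])
```

Read this way, the line shows wwwrun (the web server account) asking sudo to run the nagios binary under /usr/local/nagios/bin as root, which is the path to compare against whatever is listed in /etc/sudoers.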




On the satellite I don't get anything in the logs at all when I generate the changes, so it doesn't get that far. It does, however, show the nagios user making connections that fail, cycling every few minutes or so.
The key phase passes, but then it still receives a PAM error (as if I were being prompted for a password and failing when I didn't enter one). The error says:

"sshd[29910]: Accepted publickey for nagios from 10.1.0.121 port 5830 ssh2
sshd[29912]: PAM audit_log_acct_message() failed: Operation not permitted"

Not everything seems to be completely broken, though, so I think it is just a small issue that is causing my grief. To give a bit of background info:

Things I can do as expected:
tail -f /var/lib/centreon/log/2/nagios.log on the central server updates with errors and notifications from the satellite as expected
I can su - nagios and log into the satellite as nagios without a password
I can log into the central MySQL server as the configured user from both server locations as expected

The thing that I can't do, and really want to sort out, is the sshd error I get from communication between the servers. I believe that once I get this ironed out and have copied my configuration files to the new server, the other small problems will likely fall into place, or not be far off.

Does anyone have any clues as to where I can continue my troubleshooting (perhaps permissions)? Is there a way to change Centreon to use the nagios user and not the wwwrun user?

naparuba
11th December 2008, 09:03
I think your problem is on the main server, not the satellite. Can you check your /etc/sudoers file? Is apache run with the wwwrun user?

froggy
11th December 2008, 12:22
Hi,

Thank you for your response. Yes, I think the issue is coming from the central server end as well. The relevant part of the sudoers file is as follows. I added both nagios and wwwrun until I have this issue sorted, so at least I know I am dealing with one thing at a time:


nagios ALL=NOPASSWD: /etc/init.d/nagios restart
nagios ALL=NOPASSWD: /etc/init.d/nagios stop
nagios ALL=NOPASSWD: /etc/init.d/nagios start
nagios ALL=NOPASSWD: /etc/init.d/nagios reload
nagios ALL=NOPASSWD: /usr/sbin/nagiostats
nagios ALL=NOPASSWD: /usr/sbin/nagios *
wwwrun ALL=NOPASSWD: /etc/init.d/nagios restart
wwwrun ALL=NOPASSWD: /etc/init.d/nagios stop
wwwrun ALL=NOPASSWD: /etc/init.d/nagios start
wwwrun ALL=NOPASSWD: /etc/init.d/nagios reload
wwwrun ALL=NOPASSWD: /usr/sbin/nagiostats
wwwrun ALL=NOPASSWD: /usr/sbin/nagios *
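
One thing that may be worth checking against the entries above: the sudo error in the first post shows COMMAND=/usr/local/nagios/bin/nagios, while the entries here list /usr/sbin/nagios. The sketch below is only a rough approximation of sudoers matching (it uses Python's fnmatch and handles just the simple "user ALL=NOPASSWD: command" form), and the helper names are made up for illustration. Whether this mismatch is the actual cause is not confirmed anywhere in this thread.

```python
# Sketch: check whether a command is covered by simple NOPASSWD entries.
# This approximates sudoers matching with fnmatch; real sudo has more rules.
from fnmatch import fnmatch

SUDOERS = """\
wwwrun ALL=NOPASSWD: /etc/init.d/nagios restart
wwwrun ALL=NOPASSWD: /etc/init.d/nagios stop
wwwrun ALL=NOPASSWD: /etc/init.d/nagios start
wwwrun ALL=NOPASSWD: /etc/init.d/nagios reload
wwwrun ALL=NOPASSWD: /usr/sbin/nagiostats
wwwrun ALL=NOPASSWD: /usr/sbin/nagios *
"""

# Command taken from the sudo error quoted in the first post.
LOGGED = ("/usr/local/nagios/bin/nagios -v "
          "/usr/local/centreon/filesGeneration/nagiosCFG/2/nagiosCFG.DEBUG")


def allowed_patterns(sudoers_text, user):
    """Collect the command patterns listed for one user (hypothetical helper)."""
    patterns = []
    for line in sudoers_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        who, _, spec = line.partition(" ")
        if who != user:
            continue
        _, _, command = spec.partition("NOPASSWD:")
        patterns.append(command.strip())
    return patterns


def is_covered(command, patterns):
    """True if any listed pattern matches the full command line."""
    return any(fnmatch(command, pattern) for pattern in patterns)


patterns = allowed_patterns(SUDOERS, "wwwrun")
print(is_covered("/etc/init.d/nagios restart", patterns))
print(is_covered(LOGGED, patterns))
```

If the logged command is not covered by any NOPASSWD entry, sudo would be expected to ask for a password, which would fit the "root's password:" prompt seen in the Apache error log.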


A bit of an update since my first post as well: I can now monitor the satellite system from my Centreon WebUI (not the Nagios WebUI though, which just has the central server items listed, but I expect that's normal). However, I am still unable to copy over the config files using the WebUI, and I am still receiving the ssh errors on both systems.

At the moment I can use scp to copy over all the configs from the command line, and that works as the nagios user with the key and no password.

The important thing for me is that I can at least get started with my other VLANs whilst I iron out what I hope ends up being a little bug or a needed config change.

froggy
15th December 2008, 03:39
Still having some issues with the communication between my central server and satellite servers.

I have narrowed it down to a Centreon-specific issue (I believe) and am hoping someone can help push me in the right direction to get this sorted out.

First the good news: I have set up 8 satellites which are all reporting back correctly to the main server and pretty much work as expected, with the exception of not being able to copy files to the satellites or restart their services. If I look at the messages log or the sshd log on any of the satellites, I can see that the nagios user is being authenticated successfully without a password many times over, indicating to me that the nagios user can ssh between the machines fine during general communication. I am also receiving all of the status and alert information from the satellites, so that's good news as well :) However, once I try to use the WebUI to copy the satellite configs or restart the service, I get a pam_authenticate error in both the apache2 and messages logs.

Localhost works fine after the sudoers edit (adding wwwrun); the missing entry was stopping services from starting and stopping as expected from the WebUI. When it comes to the satellites, though, it creates all the files to copy over and then seems to try to connect to the satellites as wwwrun instead of as nagios, which of course fails due to a missing password.
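
To illustrate why the calling account matters here: when no remote user is given, ssh and scp log in under the name of the local account that runs them, so a copy started by wwwrun would try wwwrun on the satellite. The sketch below only builds the scp argument list and runs nothing. The host name, destination directory and key path are placeholders, and the helper names are made up; only the source directory comes from the log in the first post.

```python
# Sketch: build an scp command line with and without an explicit remote user.
# Nothing is executed; host, destination and key path are placeholders.


def remote_login(local_user, remote_user=None):
    """ssh/scp fall back to the local account name when no remote user is given."""
    return remote_user or local_user


def scp_command(local_user, src, host, dest, remote_user=None, identity_file=None):
    """Return the argv list for a non-interactive recursive scp copy."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    cmd = ["scp", "-r", "-o", "BatchMode=yes"]
    if identity_file:
        cmd += ["-i", identity_file]
    login = remote_login(local_user, remote_user)
    cmd += [src, f"{login}@{host}:{dest}"]
    return cmd


SRC = "/usr/local/centreon/filesGeneration/nagiosCFG/2/"

# What an implicit login from the web server account would amount to.
implicit = scp_command("wwwrun", SRC, "satellite.example", "/tmp/nagiosCFG/")

# Explicit nagios login with a key file (placeholder path).
explicit = scp_command(
    "wwwrun", SRC, "satellite.example", "/tmp/nagiosCFG/",
    remote_user="nagios", identity_file="/path/to/nagios_key",
)

print(" ".join(implicit))
print(" ".join(explicit))
```

Even with an explicit nagios@ login, the local account running scp still needs read access to a key the satellite accepts, so the account that actually starts the copy is the thing to pin down.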

It's not a show stopper by any means, as I have what amounts to a working monitoring system with a few quirks. However, in my experience comms issues that start off tiny can very easily end up being big issues if not dealt with. It will also make any issues I come across in the future, related or not, harder to troubleshoot.
If anyone could shed some light on what I may have missed, I would really appreciate it. I am sure it's something others have come across, as I only followed the instructions in the wiki, like I imagine most other people did, and ended up with 8 systems with the same symptoms.

naparuba
15th December 2008, 09:11
The connection between Centreon and centcore is done with a pipe. Centcore reads the pipe and the command in it, and then sends the files everywhere. Check the centcore configuration; the error must be in it.
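
The pattern described above can be sketched roughly as follows. This is not centcore's code: os.pipe() stands in for the real pipe, and the command names SEND_FILES and RESTART and the "ACTION:poller_id" format are invented for the illustration.

```python
# Sketch: one side writes commands into a pipe, the other reads and dispatches.
# Command names and format are invented; this is not centcore's protocol.
import os


def handle(command, log):
    """Split 'ACTION:target' and record what the reader would act on."""
    action, _, target = command.partition(":")
    log.append((action, target))


def drain(read_fd, log):
    """Read the pipe line by line until the writer closes it."""
    with os.fdopen(read_fd) as reader:
        for line in reader:
            line = line.strip()
            if line:
                handle(line, log)


read_fd, write_fd = os.pipe()

# Web UI side: queue the requests and close the write end.
with os.fdopen(write_fd, "w") as writer:
    writer.write("SEND_FILES:2\n")
    writer.write("RESTART:2\n")

# Daemon side: pick the commands up and act on them.
log = []
drain(read_fd, log)
print(log)
```

If the real setup works along these lines, the copy to the satellites is carried out by the process reading the pipe, so the account that process runs under, and its ssh key, would be what the satellite sees.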