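Nagios installation and deployment
1. Pre-installation preparation
(1) Create the nagios user and group
[root@localhost ~]# groupadd nagios
[root@localhost ~]# useradd -g nagios nagios
# Add the apache user to the nagios group (run this once the apache user exists, i.e. after Apache is installed in step (3))
[root@localhost ~]# usermod -a -G nagios apache
[root@localhost ~]# mkdir /usr/local/nagios
[root@localhost ~]# chown -R nagios:nagios /usr/local/nagios
(2) Enable the system sendmail service
sendmail is enabled on the Nagios monitoring server so that Nagios can send alert emails when it detects a fault.
Most Linux distributions ship with sendmail by default, so it only needs to be enabled when the system is installed;
no further sendmail configuration is required.
(3) Install Apache + PHP
yum install httpd httpd-devel php php-mysql php-common php-gd php-mbstring php-mcrypt php-devel php-xml gcc glibc glibc-common gd gd-devel openssl openssl-devel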
-----------------------------------------------------------------
2. Compile and install Nagios
Download the Nagios source tarball (nagios-3.2.0.tar.gz is used here) from: http://sourceforge.net/projects/nagios/?source=directory
[root@localhost ~]# tar -zxvf nagios-3.2.0.tar.gz
[root@localhost ~]# cd nagios-3.2.0
[root@localhost nagios-3.2.0]# ./configure --prefix=/usr/local/nagios --with-command-group=nagios
# --prefix sets the installation directory; here Nagios is installed into /usr/local/nagios
[root@localhost nagios-3.2.0]# make all
[root@localhost nagios-3.2.0]# make install
# make install installs the main Nagios program, the CGIs, and the HTML files
[root@localhost nagios-3.2.0]# make install-init
# make install-init creates the nagios startup script in /etc/rc.d/init.d
[root@localhost nagios-3.2.0]# make install-commandmode
# make install-commandmode sets the directory permissions for the external command file
[root@localhost nagios-3.2.0]# make install-config
# make install-config installs the sample Nagios configuration files, here into /usr/local/nagios/etc
[root@localhost nagios-3.2.0]# make install-webconf
# make install-webconf installs the Apache configuration file for the Nagios web interface
--------------------------------------------------------------------
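3. Nagios directory layout
After installation, the directories under /usr/local/nagios and their purposes are as follows:
## bin -- Nagios executables
## etc -- Nagios configuration files
## sbin -- Nagios CGI files, i.e. the files needed to execute external commands
## share -- Nagios web page files
## libexec -- Nagios external plugins
## var -- Nagios log files, lock files, etc.
## var/archives -- automatically archived Nagios logs
## var/rw -- holds the external command file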
-----------------------------------------------------------------------
4. Install the Nagios plugins
The version downloaded here is nagios-plugins-1.4.14.
Note: the plugin version is largely independent of the Nagios version.
[root@localhost nagios]# tar -zxvf nagios-plugins-1.4.14.tar.gz
[root@localhost nagios]# cd nagios-plugins-1.4.14
[root@localhost nagios-plugins-1.4.14]# ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
[root@localhost nagios-plugins-1.4.14]# make
[root@localhost nagios-plugins-1.4.14]# make install
After installation, many executables appear in the libexec directory under /usr/local/nagios;
these are the plugins Nagios needs.
————————————————————————————————————————————————————
--------------------------------------------------------------------------
(Note: sections 5 and 6 below are for reference only and are not required for the configuration. Using the Chinese-localized version is not recommended.)

5. Install the Nagios Chinese localization package
Download address for the Chinese package:
http://sourceforge.net/projects/nagios-cn/files/
Download the package that matches your Nagios version, then install it:
[root@localhost ~]# tar xvfj nagios-cn-3.2.0.tar.bz2
[root@localhost ~]# cd nagios-cn-3.2.0
[root@localhost nagios-cn-3.2.0]# ./configure
[root@localhost nagios-cn-3.2.0]# make all
[root@localhost nagios-cn-3.2.0]# make install

-------------------------------------------------------------------

6. Install and configure Apache and PHP
Apache and PHP are not required to install Nagios, but Nagios provides a web monitoring interface
that shows the status of monitored hosts and resources clearly, so installing a web server is well worth it.
Note that from Nagios 3.1.x onward, the web monitoring interface requires PHP support.
The Nagios version used here is nagios-3.2.0, so after compiling and installing Apache the PHP module
must be compiled as well. The PHP version chosen here is php 5.3.2.

(1) Install Apache and PHP
Install Apache first:
[root@nagiosserver ~]# tar zxvf httpd-2.0.63.tar.gz
[root@nagiosserver ~]# cd httpd-2.0.63
[root@nagiosserver httpd-2.0.63]# ./configure --prefix=/usr/local/apache2
[root@nagiosserver httpd-2.0.63]# make
[root@nagiosserver httpd-2.0.63]# make install
Then install PHP:
[root@nagiosserver ~]# tar zxvf php-5.3.2.tar.gz
[root@nagiosserver ~]# cd php-5.3.2
[root@nagiosserver php-5.3.2]# ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
[root@nagiosserver php-5.3.2]# make
[root@nagiosserver php-5.3.2]# make install
As these steps show, Apache is installed in /usr/local/apache2 and PHP in /usr/local/php.
(2) Configure Apache
Open the Apache configuration file /usr/local/apache2/conf/httpd.conf
Find:
User nobody
Group #-1
Change to:
User nagios
Group nagios
Then find:
DirectoryIndex index.html index.html.var
Change to:
DirectoryIndex index.html index.php
Then add the following:
AddType application/x-httpd-php .php

###### For security, access to the Nagios web interface should normally require authentication.
This requires the following authentication settings, appended to the end of httpd.conf:

#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>


Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
#############
————————————————————————————————————————————————————————
————————————————————————————————————————————————————————
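(3) Create the Apache authentication file
The Apache configuration for Nagios references an authentication file (the AuthUserFile setting), which must now be created. The path used here must match that setting:
[root@localhost nagios]# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password: (enter the password)
Re-type new password: (enter the password again)
Adding password for user nagiosadmin
This creates the authentication file htpasswd.users in /usr/local/nagios/etc. From now on, accessing
http://ip/nagios/ requires a user name and password.
Finally, restart the web server:
[root@nagiosserver ~]# service httpd restart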
----------------------------------------------------------------------------------
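1. Default Nagios configuration files
After installation, the default configuration files are in /usr/local/nagios/etc. The meaning of each file
or directory is as follows:
cgi.cfg -- configuration file that controls CGI access
nagios.cfg -- main Nagios configuration file
resource.cfg -- variable definition (resource) file; macros such as $USER1$ are defined here so that other configuration files can reference them
objects -- a directory containing many configuration file templates used to define Nagios objects
objects/commands.cfg -- command definitions, which other configuration files can reference
objects/contacts.cfg -- contacts and contact groups
objects/localhost.cfg -- monitoring of the local host
objects/printer.cfg -- template for monitoring printers; not enabled by default
objects/switch.cfg -- template for monitoring routers and switches; not enabled by default
objects/templates.cfg -- host and service templates that other configuration files can reference
objects/timeperiods.cfg -- Nagios monitoring time periods
objects/windows.cfg -- template for monitoring Windows hosts; not enabled by default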
---------------------------------------------------------------------------------
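2. Relationships between the configuration files
Configuring Nagios involves several kinds of definitions: hosts and host groups, services and service groups,
contacts and contact groups, monitoring time periods, monitoring commands, and so on. As these definitions
suggest, the Nagios configuration files are interrelated and reference one another.
To set up a working Nagios monitoring system, you must understand how the configuration files depend on each other.
Four points matter most:
First: define which hosts, host groups, services, and service groups to monitor.
Second: define which command performs each check.
Third: define the time periods during which monitoring takes place.
Fourth: define the contacts and contact groups to notify when a host or service has a problem.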
----------------------------------------------------------------------------------
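3. Start configuring Nagios
For clarity and easier maintenance, it is recommended to give each kind of object definition its own configuration file:
|- Create hosts.cfg to define hosts and host groups
|- Create services.cfg to define services
   (these two files may also be combined into one)
|- Use the default contacts.cfg to define contacts and contact groups
|- Use the default commands.cfg to define commands
|- Use the default timeperiods.cfg to define monitoring time periods
|- Use the default templates.cfg as the template (resource reference) file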
——————————————————————————————————————————
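Example: the templates.cfg file
Nagios mainly monitors host resources and services, which are called objects in the Nagios configuration.
To avoid defining the same settings repeatedly, Nagios provides a template configuration file in which common
attributes are defined once as templates and then referenced many times. That is the purpose of templates.cfg.
The parameters in templates.cfg are explained below:
define contact{
name generic-contact ; name of this contact template
service_notification_period 24x7 ; time period during which service notifications are sent; "24x7" is defined in timeperiods.cfg
host_notification_period 24x7 ; time period during which host notifications are sent
service_notification_options w,u,c,r,f,s ; service states that trigger a notification: w = warning, u = unknown, c = critical, r = recovery, f = flapping start/stop, s = scheduled downtime start/end
host_notification_options d,u,r,f,s ; host states that trigger a notification: d = down, u = unreachable, r = recovery, f = flapping start/stop, s = scheduled downtime start/end
service_notification_commands notify-service-by-email ; how service notifications are sent (email or SMS are possible); email is used here. "notify-service-by-email" is defined in commands.cfg
host_notification_commands notify-host-by-email ; how host notifications are sent (email or SMS are possible)
register 0
}
define host{
name generic-host ; name of this host template; it is not the real hostname of a machine but the name that host definitions reference
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7 ; time period during which notifications may be sent to contacts
register 0
}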
——————————————————————————————————————————————
Add the NRPE plugin on the monitoring server
Download: http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
tar zxvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin
Edit /usr/local/nagios/etc/objects/commands.cfg and add:
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Check that the configuration is valid:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
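Since a broken configuration stops Nagios from starting, it can help to make the restart conditional on the check above. The following is only a sketch: restart_if_valid is a hypothetical helper, not part of Nagios, and it assumes that nagios -v exits with a non-zero status when the configuration has errors. NAGIOS_BIN and NAGIOS_CFG default to the paths used in this document.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Nagios): run a command only if
# the configuration check passes.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

restart_if_valid() {
    # Assumption: "nagios -v" returns non-zero when it finds errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$@"
    else
        echo "Configuration check failed; not running: $*" >&2
        return 1
    fi
}
```

A possible call would be: restart_if_valid service nagios restart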
#############################################################
Add a monitored host:
Perform the following on the monitored host:
Add the user and group:
useradd -s /sbin/nologin nagios
Install the Nagios plugins:
Download: http://7.down.119g.com:7766/7/52DB48B15572B98C6FCD8AAEC2EF4D2AAD7640D3/nagios-plugins-1.4.16.tar.gz
tar zxvf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
Install NRPE:
Download: http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
tar zxvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
Create a script that (re)starts nrpe:
vim nrpe.sh
killall -9 nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Make it executable: chmod 755 nrpe.sh
Edit nrpe.cfg:
vim /usr/local/nagios/etc/nrpe.cfg
Append the monitoring server's IP address to allowed_hosts=, for example 192.168.34.105 (separate multiple addresses with commas and no spaces).
server_port=5666 is the port the nrpe service listens on.
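The allowed_hosts edit can also be scripted, which avoids stray spaces in the comma-separated list. This is a minimal sketch under stated assumptions: set_allowed_hosts is a hypothetical helper name, the file is assumed to use the plain key=value form shown above, and sed -i assumes GNU sed.

```shell
#!/bin/bash
# Hypothetical helper: write "allowed_hosts=a,b,c" into an nrpe.cfg-style file.
# Usage: set_allowed_hosts <config file> <ip> [<ip>...]
set_allowed_hosts() {
    local cfg="$1"
    shift
    local hosts
    # Join the remaining arguments with commas and no spaces.
    hosts=$(IFS=,; echo "$*")
    if grep -q '^allowed_hosts=' "$cfg"; then
        # Replace the existing line in place (GNU sed assumed).
        sed -i "s/^allowed_hosts=.*/allowed_hosts=${hosts}/" "$cfg"
    else
        echo "allowed_hosts=${hosts}" >> "$cfg"
    fi
}
```

For example: set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg 127.0.0.1 192.168.34.105, followed by a restart of nrpe so the change takes effect.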
Add the monitored items (command definitions in nrpe.cfg):
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sd1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda6
command[check_sd2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 50 -c 80
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 50 -c 80
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 300 50
command[check_server]=/usr/local/nagios/libexec/check_server.sh
command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh
command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev/shm -w 15% -c 5%
# Monitoring RAID status requires downloading the check_megaraid_sas script; the steps are described below
command[check_megaraid_sas]=/usr/local/nagios/libexec/check_megaraid_sas
————————————————————————————————————————————
Perform the following on the host to be monitored:
1. a. Check the server model: dmidecode -s system-product-name
b. Check the RAID controller information: dmesg | grep RAID
c. Confirm whether the tools are already installed:
[root@localhost ~]# rpm -qa | egrep 'Lib_Utils|MegaCli'
Lib_Utils-1.00-09.noarch
MegaCli-8.02.21-1.noarch
If they are not installed, download and install the latest MegaCli, which supports monitoring more types of SAS disks. If the installation succeeded, running MegaCli without arguments prints:
[root@localhost ~]# MegaCli
Fatal error - Command Tool invoked with wrong parameters
Exit Code: 0x01
d. Use MegaCli to view related information (optional)
# MegaCli -help (show command help)
# MegaCli -adpCount (show the number of adapters)
# MegaCli -LdGetNum -aALL (show the number of logical drives)
# MegaCli -LdInfo -LALL -aAll (show information for all logical drives)
2. Install the script
Download the check_megaraid_sas script, a Nagios plugin written in Perl that uses the MegaCli command to collect monitoring information.
Download address: http://www.techno-obscura.com/~delgado/code/check_megaraid_sas
Edit the script:
# vi check_megaraid_sas
a. Find line 35:
use lib qw(/usr/lib/nagios/plugins /usr/lib64/nagios/plugins); # possible pathes to your Nagios plugins and utils.pm
Change it to:
use lib qw(/usr/local/nagios/libexec); # possible pathes to your Nagios plugins and utils.pm
Note: /usr/local/nagios/libexec is the Nagios plugin directory on the monitored host.
b. Find lines 52-53:
my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
my $megacli = "sudo $megaclibin"; # how we actually call MegaCli
Change them to:
my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
my $megacli = "$megaclibin"; # how we actually call MegaCli
Note: /usr/sbin/MegaCli is the absolute path to MegaCli.
c. Copy the script into place and make it executable:
# cp check_megaraid_sas /usr/local/nagios/libexec/check_megaraid_sas
# chmod 755 /usr/local/nagios/libexec/check_megaraid_sas
# /usr/local/nagios/libexec/check_megaraid_sas -h (show usage help)
# /usr/local/nagios/libexec/check_megaraid_sas (check status)
OK: 0:0:RAID-10:4 drives:1.089TB:Optimal Drives:4
————————————————————————————————————————————
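Check that the service port is listening:
netstat -anpt | grep nrpe
Check that the configuration works:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 (when testing from the monitoring server, use the monitored host's IP address instead)
A reply such as "NRPE v2.13" (the version number) shows that check_nrpe can communicate with the NRPE daemon on the monitored host.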
#######################################################################################
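Monitoring server configuration:
1. Define how to monitor remote hosts and services
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin, whose syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Example:
define command{
command_name check_swap_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "check_swap"
}
To pass the remote command name as a parameter instead of hard-coding it, use a definition like the following (skip this if check_nrpe has already been added to commands.cfg, because a command may only be defined once):
# cd /usr/local/nagios/etc/objects/
# vi commands.cfg    (add the following)
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
2. Create the host and service configuration file for the monitored host:
For example: vim RHEL.cfg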
define host{
use rhel-name
host_name RHEL-97
alias RedHat-97 (192.168.34.97)
address 192.168.34.97
}
define service{
use rhel-sys
host_name RHEL-97
service_description Disk space
check_command check_nrpe!check_df
}
define service{
use rhel-sys
host_name RHEL-97
service_description System load
check_command check_nrpe!check_load
}
define service{
use rhel-raid
host_name RHEL-97
service_description RAID info
check_command check_nrpe!check_megaraid_sas
}
define service{
use generic-service
host_name RHEL-97
service_description CHECK USERS
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name RHEL-97
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name RHEL-97
service_description SDA1
check_command check_nrpe!check_sd1
}
define service{
use generic-service
host_name RHEL-97
service_description SDA2
check_command check_nrpe!check_sd2
}
define service{
use generic-service
host_name RHEL-97
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name RHEL-97
service_description total procs
check_command check_nrpe!check_total_procs
}
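The service blocks above differ only in their description and NRPE command name, so they can be generated rather than typed by hand. The sketch below assumes the check_nrpe command definition from this document; gen_nrpe_services is a hypothetical helper name, and each item is passed in an invented "description=nrpe_command" form.

```shell
#!/bin/bash
# Hypothetical helper: print one "define service" block per item.
# Usage: gen_nrpe_services <host_name> <template> <description=nrpe_command>...
gen_nrpe_services() {
    local host="$1" template="$2" pair
    shift 2
    for pair in "$@"; do
        printf 'define service{\n'
        printf '    use                     %s\n' "$template"
        printf '    host_name               %s\n' "$host"
        # Text before the first "=" is the description, the rest is the command.
        printf '    service_description     %s\n' "${pair%%=*}"
        printf '    check_command           check_nrpe!%s\n' "${pair#*=}"
        printf '}\n'
    done
}
```

For example, gen_nrpe_services RHEL-97 generic-service "CHECK USERS=check_users" "Zombie=check_zombie_procs" prints two blocks that could be appended to RHEL.cfg and then verified with nagios -v.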
3. Add custom monitoring scripts
Checks for things such as CPU, memory, and LVS require writing your own scripts. Two points matter: control the input (parameters and so on) and format the output. Any script whose output and exit code follow the Nagios plugin conventions will work.
For example:
Memory monitoring:
vi check_mem.sh
#!/bin/bash
# check memory script
# sunny 2008.2.15
# Total memory
TOTAL=`free -m | head -2 |tail -1 |gawk '{print $2}'`
# Free memory
FREE=`free -m | head -2 |tail -1 |gawk '{print $4}'`
# to calculate free percent
# use the expression free * 100 / total
FREETMP=`expr $FREE \* 100 / $TOTAL`
# Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL
if [ $FREETMP -ge 15 ]
then
echo "OK: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 0
elif [ $FREETMP -ge 6 ]
then
echo "WARNING: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 1
else
echo "CRITICAL: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 2
fi
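The script above hard-codes its thresholds, while the nrpe.cfg entry earlier calls check_mem.sh with -w and -c. A possible way to honor those options is sketched below. It is not the author's script: evaluate_free_pct and check_mem are hypothetical function names, the thresholds are interpreted as free-memory percentages (critical when free <= crit, warning when free < warn), and the defaults 15 and 5 mirror the script above.

```shell
#!/bin/bash
# Hypothetical helper: decide the state from a free-memory percentage.
# Usage: evaluate_free_pct <free%> <warn%> <crit%>
# Returns the Nagios exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
evaluate_free_pct() {
    local free_pct="$1" warn="$2" crit="$3" v
    for v in "$free_pct" "$warn" "$crit"; do
        case "$v" in
            ''|*[!0-9]*) echo "UNKNOWN: thresholds and value must be integers"; return 3 ;;
        esac
    done
    # The part after "|" is performance data: label=value;warn;crit
    if [ "$free_pct" -le "$crit" ]; then
        echo "CRITICAL: free memory ${free_pct}% | free_pct=${free_pct}%;${warn};${crit}"
        return 2
    elif [ "$free_pct" -lt "$warn" ]; then
        echo "WARNING: free memory ${free_pct}% | free_pct=${free_pct}%;${warn};${crit}"
        return 1
    fi
    echo "OK: free memory ${free_pct}% | free_pct=${free_pct}%;${warn};${crit}"
    return 0
}

# Hypothetical entry point: parse -w/-c, read `free -m`, and evaluate.
check_mem() {
    local warn=15 crit=5 opt OPTIND=1 total="" free_mb=""
    while getopts "w:c:" opt; do
        case "$opt" in
            w) warn="$OPTARG" ;;
            c) crit="$OPTARG" ;;
            *) echo "UNKNOWN: usage: check_mem [-w warn%] [-c crit%]"; return 3 ;;
        esac
    done
    # Line 2 of `free -m` is the "Mem:" row: column 2 = total, column 4 = free.
    read -r total free_mb < <(free -m | awk 'NR==2 {print $2, $4}')
    case "$total" in
        ''|0|*[!0-9]*) echo "UNKNOWN: could not read memory totals"; return 3 ;;
    esac
    evaluate_free_pct $(( free_mb * 100 / total )) "$warn" "$crit"
}
```

In a standalone plugin file, the last lines would be check_mem "$@" followed by exit $?, so that NRPE sees the exit code.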
LVS monitoring:
vi check_lvs.sh
MySQL monitoring:
On the MySQL server to be monitored, create a database dedicated to Nagios:
mysql> create database nagdb default CHARSET=utf8;
mysql> grant select on nagdb.* to 'nagios'@'192.168.1.100';
mysql> update mysql.user set Password = PASSWORD('nagios') where user='nagios';
mysql> flush privileges;
# /usr/local/nagios/libexec/check_mysql -H 192.168.1.101 -u nagios -d nagdb -p nagios -w 10 -c 30
memcached monitoring:
This uses a plugin written in Perl, which requires several dependency packages and takes some effort to install.
(1) Install the Perl modules
# yum -y install perl-Carp-Clan perl-Cache-Memcached perl-Nagios-Plugin
-- If they cannot be installed:
# wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.5.2-2.rf.src.rpm
# rpm -ivh rpmforge-release-0.5.2-2.rf.src.rpm
# yum -y install perl-Nagios-Plugin.noarch perl-Carp-Clan.noarch perl-Cache-Memcached.noarch
-- If perl-Nagios-Plugin still cannot be installed:
wget http://packages.sw.be/perl-Nagios-Plugin/perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm
rpm -ivh perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm --force --nodeps
(2) Install the plugin
Download Nagios-Plugins-Memcached-0.02.tar.gz and install it (it has many dependencies; pay attention to where the .pm files are placed):
# tar xzvf Nagios-Plugins-Memcached-0.02.tar.gz
# cd Nagios-Plugins-Memcached-0.02
# yum -y install perl-CPAN
# perl Makefile.PL
-- This prompts for several choices; pick what you prefer or just press Enter throughout
# make
-- This step downloads some things needed at run time
# make install
-- By default check_memcached is installed as /usr/local/bin/check_memcached
-- Copy it into the Nagios libexec directory
# cp /usr/local/bin/check_memcached /usr/local/nagios/libexec/
# chown nagios:nagios /usr/local/nagios/libexec/check_memcached
Add the following entries to commands.cfg. (Here check_memcached is not installed on the memcached server; the check_memcached on the Nagios server connects directly to port 11211 on the memcached server. Alternatively, install it on the memcached server and fetch its values through check_nrpe.)
define command {
command_name check_memcached_11211
command_line $USER1$/check_memcached -H 192.168.1.101:11211 --size-warning 80 --size-critical 90
}
The command above monitors memcached's memory usage ratio.
define command {
command_name memcached_response_11211
command_line /usr/local/bin/check_memcached -H 192.168.1.101 -w 300 -c 500
}
This one checks whether memcached still responds.
define command {
command_name check_memcached_hit
command_line /usr/local/bin/check_memcached -H 192.168.1.101 --hit-warning 10 --hit-critical 5
}
Manual test from the command line:
./check_memcached -H 192.168.108.96 -w 300 -c 500
——————————————————————————————————————————
Because RHEL.cfg references templates such as rhel-raid, rhel-sys, rhel-name, and rhel-service, they must be defined in templates.cfg.
vim /usr/local/nagios/etc/objects/templates.cfg
define host{
name rhel-name ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each Linux host 3 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
####
define service{
name rhel-sys ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 5 ; Re-check the service up to 5 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options u,c,r ; Send notifications about unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
####
define service{
name rhel-raid ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 5 ; Re-check the service up to 5 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,c ; Send notifications about warning and critical events
notification_interval 0 ; Notify only once; do not re-send notifications for the same problem
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
#####
define service{
name rhel-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 3 ; Check the service every 3 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 10 ; Re-notify about service problems every 10 minutes
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
#####
______________________________________________________________
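Add email alert configuration:
vim contacts.cfg
Change the address after email to the mailbox that should receive alerts.
To add a contact to a contact group, add the group name after contact_groups in the relevant definition in contacts.cfg (separate multiple group names with commas).
Add the monitored host's configuration file to nagios.cfg:
vim /usr/local/nagios/etc/nagios.cfg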
cfg_file=/usr/local/nagios/etc/objects/RHEL/RHEL.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
_________________________________________________________
Add MySQL monitoring
(1) Download
#yum install perl-Class-DBI-mysql
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=30
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=36
(2)
# cp check_mysqld.pl /usr/local/nagios/libexec
# chmod 755 /usr/local/nagios/libexec/check_mysqld.pl
# chown nagios.nagios /usr/local/nagios/libexec/check_mysqld.pl
# cp check_mysqld.php /usr/local/pnp4nagios/share/templates.dist
# chown nagios.nagios /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php
# chmod 755 /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php
(3)
# vi commands.cfg
define command{
command_name check_mysqld
command_line $USER1$/check_mysqld.pl -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -D $ARG3$ -a uptime,threads_connected,questions,slow_queries,open_tables -w ',,,,' -c ',,,,' -A $USER21$
}
(4) (This step is optional)
# vi resource.cfg
$USER7$=nagios
$USER21$='com_select,com_update,com_insert,com_insert_select,com_commit,com_delete,com_rollback,aborted_clients,aborted_connects,binlog_cache_disk_use,binlog_cache_use,bytes_received,bytes_sent,connections,created_tmp_disk_tables,created_tmp_files,created_tmp_tables,delayed_errors,delayed_insert_threads,delayed_writes,handler_update,handler_write,handler_delete,handler_read_first,handler_read_key,handler_read_next,handler_read_prev,handler_read_rnd,handler_read_rnd_next,key_blocks_not_flushed,key_blocks_unused,key_blocks_used,key_read_requests,key_reads,key_write_requests,key_writes,max_used_connections,not_flushed_delayed_rows,open_files,open_streams,open_tables,opened_tables,prepared_stmt_count,qcache_free_blocks,qcache_free_memory,qcache_hits,qcache_inserts,qcache_lowmem_prunes,qcache_not_cached,qcache_queries_in_cache,qcache_total_blocks,questions,select_full_join,select_range_check,slow_launch_threads,slow_queries,table_locks_immediate,table_locks_waited,threads_cached,threads_connected,threads_created,threads_running'
(5)
vim templates.cfg
define host{
name mysql-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each Linux host 3 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
#vi mysql.cfg
define service{
use generic-service,mysql-server
host_name mysql
service_description Mysqld_pnp
check_command check_mysqld!nagios!nagios!nagdb
}
Here is a complete monitoring configuration for reference:
vi mysql.cfg
define host{
use linux-server,mysql-server
host_name mysql
alias My mysql Host
address 192.168.34.101
}
define service{
use generic-service,mysql-server
host_name mysql
service_description Mysqld
check_command check_mysql!nagios!nagios!10!60
}
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description Mysqld_pnp
# check_command check_mysqld!nagios!nagios!nagdb
#}
define service{
use generic-service,mysql-server
host_name mysql
service_description CHECK USERS
check_command check_nrpe!check_users
}
# Create a service for monitoring system load
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,mysql-server
host_name mysql
service_description Load
check_command check_nrpe!check_load
}
# Create a service for monitoring disk usage (NRPE command check_sd1)
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,mysql-server
host_name mysql
service_description SDA1
check_command check_nrpe!check_sd1
}
# Create a service for monitoring disk usage (NRPE command check_sd2)
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,mysql-server
host_name mysql
service_description SDA2
check_command check_nrpe!check_sd2
}
# Create a service for monitoring zombie processes
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,mysql-server
host_name mysql
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
# Create a service for monitoring the total number of processes
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,mysql-server
host_name mysql
service_description total procs
check_command check_nrpe!check_total_procs
}
define service{
use generic-service,mysql-server
host_name mysql
service_description Cpu
check_command check_nrpe!check_cpu
}
define service{
use generic-service,mysql-server
host_name mysql
service_description Mem
check_command check_nrpe!check_mem
}
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description Http
# check_command check_http!/
# }
define service{
use generic-service,mysql-server
host_name mysql
service_description Ping
check_command check_ping!100.0,20%!500.0,60%
}
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_11211
# check_command check_memcached_11211!80!100
# }
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_response_11211
# check_command check_memcached_response_11211!300!500
# }
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_hit
# check_command check_memcached_hit!10!5
# }
(All configuration is now complete.)
############################################################
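Testing:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (verify the configuration files)
Test a monitored host's NRPE command from the monitoring server:
# /usr/local/nagios/libexec/check_nrpe -H 192.168.34.97 -c check_megaraid_sas (or any other check, such as check_df)
If the command returns the expected information for that check, the setup is working.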
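When testing checks by hand, the exit code matters as much as the text, because Nagios derives the state from it (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below wraps any check command and prints the state alongside its output. state_name and run_check are hypothetical helper names, and mapping every other exit code to UNKNOWN is a simplification made here.

```shell
#!/bin/bash
# Hypothetical helper: translate a plugin exit code into a state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: run a check command, then print its state,
# exit code, and output on one line. Returns the command's exit code.
run_check() {
    local output rc=0
    output=$("$@" 2>&1) || rc=$?
    printf '%s (exit %d): %s\n' "$(state_name "$rc")" "$rc" "$output"
    return "$rc"
}
```

For example: run_check /usr/local/nagios/libexec/check_nrpe -H 192.168.34.97 -c check_df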
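If everything is normal, restart the services:
service nagios restart
service httpd restart
/usr/local/nagios/bin/nagios -s /usr/local/nagios/etc/nagios.cfg (shows projected check scheduling and startup timing information)
To restart NRPE on the monitored host, use the nrpe.sh script created earlier:
vim nrpe.sh
killall -9 nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d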
____________________________________________________________________
Nagios errors and solutions
Error 1: Clicking the Map link in the Nagios web interface fails with:
The requested URL /nagios/cgi-bin/statusmap.cgi was not found on this server
-- Solution:
statusmap.cgi depends on the gd development package.
Install the gd development package with yum, then re-run configure and rebuild the Nagios CGIs:
yum -y install gd gd-devel
./configure --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
# make all
# make install
# make install-init
# make install-config
# make install-commandmode
Error 2: For ordinary users (anyone other than nagiosadmin), clicking links such as Services in the Nagios web interface shows:
It appears as though you do not have permission to view information for any of the hosts you requested...
If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI
and check the authorization options in your CGI configuration file.
--- Cause:
The authenticated user is not authorized. Edit etc/cgi.cfg: the default there is nagiosadmin only, so any newly created user who needs access must be added; separate multiple users with commas.
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
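For a user other than nagiosadmin, append the name, for example: authorized_for_system_information=nagiosadmin,admin
Another possible cause is Chinese characters in the host_name or service_description of a monitored host or service.
Error 3: If "Whoops! Error: Could not read object configuration data!" appears, the Nagios daemon is not running.
Solution: /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Error 4: Installing NRPE fails with: configure: error: cannot find ssl headers
During configure, the message "checking for SSL headers... configure: error: Cannot find ssl headers" appears because the openssl-devel package is missing.
yum -y install openssl-devel solves the problem.
Error 5: In the web interface, hosts or services fail to display until the page is refreshed several times, and clicking a service shows:
Error: Service Status Not Found!
Solution: reboot the monitoring server (Apache, Nagios, and so on must be set to start at boot).
Error 6: After defining a new command in nrpe.cfg on the monitored host, such as:
command[check_df]=/usr/local/nagios/libexec/check_disk -x / -w 15% -c 5%
testing it with /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_df fails with:
"DISK CRITICAL - /root/.gvfs is not accessible: Permission denied"
If the command after the equals sign is itself correct, append -A -i '.gvfs' to its arguments, so that the command becomes:
command[check_df]=/usr/local/nagios/libexec/check_disk -x / -w 15% -c 5% -A -i '.gvfs'
Then restart nrpe to resolve the problem.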
Nagios server and client monitoring: installation, detailed configuration, and an explanation of each configuration file
Original: http://blog.51cto.com/13017250/2059102