Introduction to Nagios for novice users who need to monitor sensor outputs (such as temperature/humidity in a server room) but have only limited experience with Nagios.

Nagios [http://www.nagios.org/] is the most popular open-source monitoring tool. Using plugins, it can monitor any devices (called “hosts”) and services using any protocol. The most common scenarios involve monitoring devices over SNMP or monitoring servers using NRPE (Nagios Remote Plugin Executor). Nagios is configured with text configuration files and controlled over a web interface.

Nagios works with 3 primary objects (object):

service - services to be monitored (e.g. CPU load, toner level in a printer, temperature in the monitored room)
host - the devices (hosts) where the monitored services are running (e.g. servers, printers, thermometers, ...)
contact - people who receive status notifications about services and hosts (admins, technicians, operators, ...)

These objects can be grouped (group) in order to simplify configuration and get a clearer overview of related objects in the web interface.

Nagios also uses other object types. The timeperiod objects define when to monitor hosts/services and when to notify contact persons. The actual plugins are defined using the command objects. Further objects for escalating issues and specifying dependencies exceed the scope of this article (for more details, see the Nagios Core Documentation, chapter Object Configuration Overview, available at http://nagios.sourceforge.net/docs/nagios-3.pdf).

Installing Nagios

This description is intended for users with only a basic knowledge of the GNU/Linux operating system. Therefore, installation from the source code is not covered here (if interested, the source package is available at http://www.nagios.org/download/core/, and is installed with the usual ./configure, make all, make install triad). For novice users, we recommend the Ubuntu Linux distribution [http://www.ubuntu.com/].
The examples in this text assume a standard installation of Ubuntu Server Edition 9.10 [http://www.ubuntu.com/getubuntu/download-server]. Install Nagios using the following command:

    helpdesk@monitoring:~$ sudo aptitude install nagios3

The installer automatically selects additional packages that are required by Nagios (Apache web server, SNMP libraries, mail server, and so on). Confirm the installation of these packages. After installation, you will probably be asked to set up Postfix (the mail server). Select Internet Site and enter the server name and domain (fully qualified domain name, such as monitoring.company.com). Then, enter a password for accessing Nagios over the web.

After installation, you can use your web browser to verify that Nagios is running. Open http://192.168.1.1/nagios3/, where 192.168.1.1 is the IP address of the server where Nagios has been installed. The login name is nagiosadmin, the password is the one you specified during installation. If you forget your password, enter the sudo htpasswd /etc/nagios3/htpasswd.users nagiosadmin command and set a new password.

Tip: You can use the sudo ifconfig command to find out your IP address:

    eth0      Link encap:Ethernet  HWaddr 08:00:27:3d:d9:f1
              inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe3d:d9f1/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:4300 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2946 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:444119 (444.1 KB)  TX bytes:1304450 (1.3 MB)
              Interrupt:11 Base address:0xc020

If you installed the server with the IP address configured over DHCP, and you now need to specify the IP address by hand, modify the /etc/network/interfaces file:

    auto eth0
    iface eth0 inet static
        address 192.168.1.1
        netmask 255.255.255.0
        gateway 192.168.1.254

Similarly, to set up automatic configuration over DHCP, modify the file as follows:

    auto eth0
    iface eth0 inet dhcp

To activate changes in the network settings, enter the sudo /etc/init.d/networking restart command.

For easier orientation, we recommend installing Midnight Commander:

    helpdesk@monitoring:~$ sudo aptitude install mc

To start Midnight Commander, enter the mc command. By default, the Ubuntu distribution does not have the internal editor selected. To activate it, press F9 → Options → Configuration... → check use internal edIt → Save. To edit a file, press F4. The ESC key works as a command prefix. To end the edit mode, press ESC twice (this behavior comes from the “dumb terminal” support; when a terminal does not support e.g. function keys, the F4 key can be replaced by pressing ESC and 4 in sequence). Press SHIFT+F4 to create a new file. To avoid automatic indenting when pasting a file from the clipboard, press F9 → Options → General... → uncheck Return does autoindent → OK.

Install SSH to allow remote access to the server's terminal:

    helpdesk@monitoring:~$ sudo aptitude install openssh-server

Remember that by installing SSH, you enable remote access to the server; therefore, you should use sufficiently strong passwords. To access the server from Windows computers, we recommend the PuTTY program [http://www.chiark.greenend.org.uk/~sgtatham/putty/].

Tip: In Unix-like systems, operations such as installing and running system utilities, changing system settings, etc. can only be performed by the super-user (the root user account). To avoid typing sudo in front of every command when installing or configuring Nagios, you can switch to the super-user mode by entering sudo su -.

    helpdesk@monitoring:~$ whoami
    helpdesk                         (logged in as the helpdesk user)
    helpdesk@monitoring:~$ sudo su -
    [sudo] password for helpdesk:    (enter the password of the user that is currently logged in)
    root@monitoring:~# whoami
    root                             (now we're logged in as root, commands will be executed with root privileges)
    root@monitoring:~# exit
    logout
    helpdesk@monitoring:~$

Notice that the super-user prompt ends with the pound (#) sign, while a regular user prompt ends with the dollar ($) sign.

Configuring Nagios

Nagios configuration files are located in the /etc/nagios3 directory. The monitored infrastructure is defined using files in the /etc/nagios3/conf.d directory. To get acquainted with Nagios configuration, let us back up the preconfigured infrastructure settings and create our own configuration.

    helpdesk@monitoring:~$ sudo su -
    [sudo] password for helpdesk:
    root@monitoring:~# mkdir /root/nagios_backup
    root@monitoring:~# mv /etc/nagios3/conf.d/* /root/nagios_backup

Time periods - timeperiod

Time periods, during which the monitoring is performed and contact people are notified, are specified in the /etc/nagios3/conf.d/timeperiods.cfg file. Every object is defined with define timeperiod { … }.
The timeperiod_name parameter specifies a name that is used to refer to this time period in the hosts, services and contacts configuration. The alias parameter specifies a display name for this time period. Then follows a list of time intervals that specify when the given time period is active. Multiple time intervals can be specified for each day. (For instance, to specify Mondays outside of office hours, enter: monday 00:00-08:00,18:00-24:00).

    define timeperiod {
        timeperiod_name 24x7
        alias           Nonstop 24x7
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        sunday          00:00-24:00
        saturday        00:00-24:00
    }

    define timeperiod {
        timeperiod_name 10x5
        alias           Work hours 10x5
        monday          08:00-18:00
        tuesday         08:00-18:00
        wednesday       08:00-18:00
        thursday        08:00-18:00
        friday          08:00-18:00
    }

    define timeperiod {
        timeperiod_name never
        alias           Never
    }

This example defines three time periods. The first time period, 24x7, is for non-stop monitoring. The second time period, 10x5, is active on workdays from 8:00 to 18:00. The third time period, never, is never active (use it if you do not want to send any notifications).

Contact persons - contact

People to be notified are configured in the /etc/nagios3/conf.d/contacts.cfg file. The host_ and service_ notification_period parameters specify when the contact should receive notifications.

The host_notification_options parameter specifies what types of notifications are sent to that contact:

d - the host is DOWN
u - the host is UNREACHABLE
r - the host has recovered (UP)
n - no host notifications

The service_notification_options parameter specifies what types of messages are sent to this contact:

w - the service is in the WARNING state
u - the service is in the UNKNOWN state
c - the service is in the CRITICAL state
r - the service has recovered (OK)
n - no service notifications

The host_ and service_ notification_commands specify the command to be run in order to send a notification of a host or service event. The email and address1 parameters are passed as arguments to the commands.

    define contact {
        contact_name                  helpdesk
        alias                         Company Helpdesk
        host_notification_period      24x7
        service_notification_period   24x7
        host_notification_options     d,u,r
        service_notification_options  u,w,c,r
        service_notification_commands notify-service-by-email
        host_notification_commands    notify-host-by-email
        email                         helpdesk@company.com
    }

    define contact {
        contact_name                  technician1_mail
        alias                         John Doe
        host_notification_period      10x5
        service_notification_period   10x5
        host_notification_options     n
        service_notification_options  w,c,r
        service_notification_commands notify-service-by-email
        host_notification_commands    notify-host-by-email
        email                         john.doe@company.com
    }

This defines two contacts. The first contact definition, helpdesk, e-mails state change notifications for hosts and services to helpdesk@company.com at all times. The second contact definition, technician1_mail, e-mails state change notifications for services during work hours (see timeperiod 10x5) to john.doe@company.com.

Contact groups - contactgroup

To avoid listing all contact persons for every host and service, you can define contact groups in the /etc/nagios3/conf.d/contactgroups.cfg file.

    define contactgroup {
        contactgroup_name support
        alias             Company Support
        members           helpdesk, technician1_mail
    }

Notifications for hosts and services with the support contact group specified will be sent to helpdesk and technician1_mail.

Hosts - host

In order to monitor a service or a sensor reading, we need to define the host (device) where it is available. Usually, a host is identified by the IP address of the device. Before adding individual hosts to Nagios, let us prepare a template with common monitoring parameters to avoid repetitive typing.
The name is used to refer to this particular host (or template, in this case) in other parts of the configuration. The notification_interval parameter defines how often notifications are sent if a host remains unavailable. The notification_period and check_period parameters specify the timing of notifications or availability checks. The normal_check_interval parameter defines how frequently to check a host. If the host becomes unavailable, it is checked every retry_check_interval minutes and up to max_check_attempts times. The method of checking host availability is defined by the check_command parameter. The contact_groups parameter defines the contact groups to notify. The register 0 parameter indicates that this is a template and not an actual host. Store the template in the /etc/nagios3/conf.d/templates.cfg file.

    define host {
        name                     standard-host
        notifications_enabled    1
        notification_interval    0
        notification_period      24x7
        notification_options     d,u,r
        check_period             24x7
        normal_check_interval    5
        retry_check_interval     1
        max_check_attempts       10
        first_notification_delay 10
        check_command            check-host-alive
        contact_groups           support
        register                 0
    }

This defines the standard-host template. Unavailable host notifications are sent only once. Notifications and checks are active at all times. Host availability is checked every 5 minutes. If a host becomes unavailable, its availability is checked every minute for 10 minutes, and then it is again checked every 5 minutes. If the host becomes available again within 10 minutes after the failure (first_notification_delay parameter), the unavailability notification is not sent. The host is checked using the check-host-alive command, that is, using the ICMP protocol (ping). Notifications are sent to the support group.

Now we can define the hosts that we want to be monitored. To begin, let us specify the Nagios server itself. The use parameter specifies which template to use for common host parameters.
Then, specify a name for the host using the host_name parameter. The name is used to refer to this host in the rest of this configuration. The address parameter defines the IP address for accessing the host. The host icon is defined with the icon_image parameter (icons are located in the /usr/share/nagios/htdocs/images/logos/ directory). Store the configuration in the /etc/nagios3/conf.d/localhost.cfg file.

    define host {
        use        standard-host
        host_name  localhost
        alias      Nagios server
        address    127.0.0.1
        icon_image base/linux40.png
    }

The Nagios server uses the parameters in the standard-host template. Services running on this server will refer to the localhost name. The server is available at the loopback interface (IP 127.0.0.1). The Linux logo is used as the icon.

Services - service

Now we need to define individual services or values to be monitored at the host (device). Services have similar parameters to hosts (host). Therefore, we set up a template first to avoid typing the same general parameters for every service. The service template should be added to the /etc/nagios3/conf.d/templates.cfg file.

    define service {
        name                     standard-service
        notifications_enabled    1
        notification_interval    0
        notification_period      24x7
        notification_options     u,w,c,r
        check_period             24x7
        normal_check_interval    5
        retry_check_interval     1
        max_check_attempts       4
        first_notification_delay 5
        contact_groups           support
        register                 0
    }

The only difference between the host template and the service template is the shorter interval for testing a failing service. Again, remember to add the register 0 parameter to avoid registering the template as a regular service.

Now we can add services that should be monitored at the server. Again, specify the use parameter to use a template with common parameters. Every service is tied to a host using the host_name parameter. The key parameter is the check_command that defines the plugin for monitoring the service and its arguments.
Individual arguments are separated with the exclamation mark (“!”). Add the following two services to the /etc/nagios3/conf.d/templates.cfg file:

    define service {
        use                 standard-service
        host_name           localhost
        service_description Disk free
        check_command       check_all_disks!10%!5%
    }

    define service {
        use                 standard-service
        host_name           localhost
        service_description System load
        check_command       check_load!5.0!4.0!3.0!10.0!6.0!4.0
    }

These definitions monitor free disk space (Disk free) and server load (System load). Free disk space is monitored with the check_all_disks command that takes two arguments. When the disk is 90 % full (10 % free space), a warning event is issued. Upon reaching 95 % of the disk capacity, a critical event is issued. These events are sent to the members of the support group (specified in the standard-service template that is included by the use parameter).

Activating configuration changes in Nagios

Nagios reads its configuration files upon startup. To activate the changes, Nagios needs to be reloaded with the following command:

    root@monitoring:~# /etc/init.d/nagios3 reload

If there is an error in the configuration files, you will receive a warning:

    Reading configuration data...
    Error: Invalid host object directive 'registe'.
    Error: Could not add object property in file '/etc/nagios3/conf.d/templates.cfg' on line 14.
    ***> One or more problems was encountered while processing the config files...

In this case, an incorrect “registe” parameter instead of the correct “register” was specified at line 14 in the /etc/nagios3/conf.d/templates.cfg file. After fixing the error, reload Nagios again. If the configuration is correct, the following message is displayed:

    * Reloading nagios3 monitoring daemon configuration files nagios3    [ OK ]

Nagios Web Interface

The main monitoring page is Tactical Overview. Click the number of hosts or services in a given state to display a page with hosts or services in that state.
Click Host Detail to get an overview of the monitored hosts (devices). Colors indicate the states of individual hosts. Click a host name to get detailed information about this host, including an option to suspend monitoring, set a planned outage, and so on. Click the semaphore icon to get an overview of the services at a host.

In the sample configuration, Nagios monitors one host. The host was last checked on 2nd November 2009 at 6:22 pm. The host has been “UP” for 1 day and 18 hours. The test result is OK, packet loss 0 %, latency 120μs.

The list of services is similar to the list of hosts. It is available under Service Detail. Services are grouped by the hosts on which they run.

Monitoring your own SNMP services

The list of plugins (command) that enable Nagios to monitor services is available in the web interface under View Config, or directly in the files in the /etc/nagios-plugins/config/ directory. If a plugin is not available, a new command to monitor the service needs to be created. The definition of a plugin contains a command_name that will be used to refer to that plugin, and a command (command_line) that checks the state of the service.

Toner level

As an example, we can monitor the toner level in a printer over SNMP. The OID with the toner level is .1.3.6.1.2.1.43.11.1.1.9.1.1. To avoid defining a separate plugin for each printer, the IP address, the SNMP community, and the warning and critical thresholds are passed in variables from the configuration of individual services. Add the command definition to the /etc/nagios3/conf.d/templates.cfg file.

The plugin for monitoring the toner level is named printer_toner. Standard Nagios plugins include a program for retrieving values over SNMP (check_snmp).
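
The command definition itself is not reproduced in this text. A minimal sketch, reconstructed from the command line that Nagios executes for this service (quoted later in this section), could look as follows. Only the command name, the plugin path, the OID, and the order of the arguments come from this article; the exact layout is an assumption.

```cfg
# Assumed reconstruction of the printer_toner command definition.
# $HOSTADDRESS$ comes from the host definition; $ARG1$..$ARG3$ come from
# the "!"-separated values in the check_command of each service.
define command {
    command_name printer_toner
    command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o .1.3.6.1.2.1.43.11.1.1.9.1.1 -w '$ARG2$': -c '$ARG3$': -l 'Toner level' -u '%'
}
```

Because the SNMP community and both thresholds are passed as arguments, the same command can serve any number of printers.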

When the plugin is activated to check the current state, $HOSTADDRESS$ is replaced with the actual address indicated in the definition of the printer host. Other variables ($ARG#$) are initialized according to the check_command in the definition of the particular service. The printer and toner level monitoring is configured in the /etc/nagios3/conf.d/helpdesk_printer.cfg file.

    define host {
        use        standard-host
        host_name  helpdesk_printer
        alias      Helpdesk printer
        address    192.168.12.200
        icon_image base/hp-printer40.png
    }

    define service {
        use                 standard-service
        service_description Toner level
        host                helpdesk_printer
        check_command       printer_toner!public!10%!5%
    }

The plugin to use is specified in the first part of the check_command parameter in the service definition. The $HOSTADDRESS$ variable will contain 192.168.12.200, as specified by the address parameter in the host definition. The $ARG#$ variables contain the values specified in the check_command parameter. Individual values are separated with the exclamation mark (“!”). Therefore, $ARG1$ = public, $ARG2$ = 10%, $ARG3$ = 5%. When checking this particular service, Nagios executes the following command:

    /usr/lib/nagios/plugins/check_snmp -H '192.168.12.200' -C 'public' -o .1.3.6.1.2.1.43.11.1.1.9.1.1 -w '10%': -c '5%': -l 'Toner level' -u '%'

You can run this command in a terminal to verify correct settings. Remember to reload Nagios configuration using the following command:

    /etc/init.d/nagios3 reload

Monitoring sensor readings

Monitoring of sensor readings is similar to monitoring services at a host. In this example, let us add devices that support Nagios monitoring of their sensor readings. Download the plugin (command) for monitoring Poseidon (or HWg-STE / Damocles) devices from the HW group website.
Follow the instructions to unpack the downloaded files and place the .pl files in the /opt/hwg/ directory, place the directory with images in /usr/share/nagios3/htdocs/images/logos/, and place the hwg.cfg file in /etc/nagios-plugins/config/. Now, let us find the IDs of the sensors that we want to monitor.

Poseidon

For Poseidon, look at the http://poseidon.hwg.cz/values.xml address. Create the /etc/nagios3/conf.d/poseidon_demo.cfg file using these values.

    define host {
        use        standard-host
        host_name  poseidon.hwg.cz
        alias      Poseidon demo
        address    poseidon.hwg.cz
        icon_image hwg/poseidon40.png
    }

    define service {
        use                 standard-service
        service_description Office temp.
        host                poseidon.hwg.cz
        check_command       check_hwg_poseidon!public!20408
    }

    define service {
        use                 standard-service
        service_description Office humidity
        host                poseidon.hwg.cz
        check_command       check_hwg_poseidon!public!57356
    }

    define service {
        use                 standard-service
        service_description Prague temp.
        host                poseidon.hwg.cz
        check_command       check_hwg_poseidon!public!66
    }

    define service {
        use                 standard-service
        service_description Prague humidity
        host                poseidon.hwg.cz
        check_command       check_hwg_poseidon!public!78
    }

Notice that the warning and critical values are not specified in the configuration. They are loaded automatically from the device (host). It is important to set the host (device) address and the sensor IDs in the check_command parameters of the respective services.

HWg-STE (SNMP Web Thermometer)

In a similar way, we can find out the sensor IDs of an STE device at http://ste.hwg.cz/. Create the /etc/nagios3/conf.d/ste_demo.cfg file using these values.

    define host {
        use        standard-host
        host_name  ste.hwg.cz
        alias      STE demo
        address    ste.hwg.cz
        icon_image hwg/ste40.png
    }

    define service {
        use                 standard-service
        service_description Office temp.
        host                ste.hwg.cz
        check_command       check_hwg_ste!public!215
    }

    define service {
        use                 standard-service
        service_description Office humidity
        host                ste.hwg.cz
        check_command       check_hwg_ste!public!216
    }

Remember to reload Nagios configuration using the following command:

    /etc/init.d/nagios3 reload
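
The introduction notes that objects can be grouped to get a clearer overview in the web interface, but only a contact group is shown. As an optional sketch, the two demo devices could be placed in a host group. The group name sensors, its alias, and the choice of file are illustrative assumptions and not part of the original setup.

```cfg
# Hypothetical host group for the sensor devices defined in this article.
# It could be stored, for example, in a new file in /etc/nagios3/conf.d/.
define hostgroup {
    hostgroup_name sensors
    alias          Environment sensors
    members        poseidon.hwg.cz, ste.hwg.cz
}
```

As with any other configuration change, Nagios has to be reloaded before the group appears in the web interface.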