How to prevent EMS monitor requests for a device
Abstract
Issue Description
EMS/STM is a set of utilities and monitors available to handle hardware failure events on the system.
When a hardware failure occurs, the system sends EMS messages to the syslog file and, if configured, e-mails to the system administrator. In some situations the hardware replacement has to be scheduled for a few days later, and the EMS messages continue to appear in the meantime.
Disabling the whole monitor is not a good workaround, because it also stops monitoring of the other devices of the same class, for example all other disk devices.
Resolution
The EMS version of September 2000 added functionality to allow the user to disable monitoring of a particular instance.
The startmon_client binary now checks a list of instances in the file /var/stm/data/tools/monitor/disabled_instances and does not create monitoring requests from the *.sapcfg files for those instances.
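The effect of this check can be illustrated with a small shell sketch. It is not the startmon_client implementation: the function name is_disabled is hypothetical, and the sketch assumes one entry per line and shell-style wildcard matching, which may differ from the rules the real binary applies.

```shell
# Hypothetical helper, for illustration only.
# Returns 0 if INSTANCE matches an entry in the disabled_instances list,
# 1 otherwise. Assumes one entry per line and shell-style wildcards;
# the matching rules used by startmon_client itself may differ.
is_disabled() {
    instance=$1
    list_file=${2:-/var/stm/data/tools/monitor/disabled_instances}

    # No readable list means nothing is disabled.
    [ -r "$list_file" ] || return 1

    while IFS= read -r pattern || [ -n "$pattern" ]; do
        # Skip blank lines.
        [ -n "$pattern" ] || continue
        # An unquoted variable in a case pattern is matched as a glob,
        # so an entry ending in "*" covers every instance below it.
        case $instance in
            $pattern) return 0 ;;
        esac
    done < "$list_file"

    return 1
}
```

With an entry such as /storage/events/disks/default/* in the list, calling is_disabled /storage/events/disks/default/1_0_4_1_0.1.0 succeeds under these assumptions, while an instance of another class does not match.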
For example:
1. Customer receives the following error:
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Thu Aug 7 10:01:50 2008
eiger-p1 sent Event Monitor notification information:
/storage/events/disks/default/1_0_4_1_0.1.0 is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Thu Aug 7 10:01:50 2008
Severity............: MAJORWARNING
Monitor.............: disk_em
Event #.............: 100091
System..............: eiger-p1.rose.hp.com
...
2. Get the name of the monitored instance (in the message above, /storage/events/disks/default/1_0_4_1_0.1.0) from the EMS /var/opt/resmon/log/event.log file or from the EMS e-mail message. If you do not know the name, run the following command to list the names of the instances that should be disabled:
# /etc/opt/resmon/lbin/moncheck
3. Edit the /var/stm/data/tools/monitor/disabled_instances file and add the instance you want to exclude from monitoring, either a single device or, with a wildcard, all devices of the class:
/storage/events/disks/default/1_0_4_1_0.1.0
or
/storage/events/disks/default/*
4. Re-enable the monitor. Start the monconfig interface:
# /etc/opt/resmon/lbin/monconfig
5. Select the E)nable Monitoring option. EMS will continue to poll the device, but no further EMS messages will be generated for it.
6. Run /etc/opt/resmon/lbin/moncheck again to verify the changes.
- Note: There is no need to use the "K)ill monitoring" option in monconfig. If MC/ServiceGuard is running and packages with PSM (Peripheral Status Monitor) dependencies on the hardware resources are configured on this system, killing or disabling monitoring on this system may cause a package to fail over to another node.
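Steps 2 and 3 can be scripted along the following lines. This is a minimal sketch, not a supported tool: the function names extract_instance and add_disabled_instance are hypothetical, and the parsing assumes that the notification text has the same layout as the sample above (the instance path followed by the word "is").

```shell
# Hypothetical helpers, for illustration only.

# Print the first instance path found in an EMS notification read from
# stdin. Assumes the layout of the sample message: a token starting with
# "/" immediately followed by the word "is".
extract_instance() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^\// && $(i + 1) == "is") { print $i; exit }
    }'
}

# Append INSTANCE to the disabled_instances file unless an identical
# line is already present.
add_disabled_instance() {
    instance=$1
    list_file=${2:-/var/stm/data/tools/monitor/disabled_instances}

    # -F -x: compare as a fixed string against whole lines, so a "*" in
    # the entry is not interpreted as a regular expression.
    if [ -f "$list_file" ] && grep -F -x -q -e "$instance" "$list_file"; then
        return 0
    fi
    printf '%s\n' "$instance" >> "$list_file"
}
```

For example, with the notification saved in a file named event.txt (an assumed name), add_disabled_instance "$(extract_instance < event.txt)" adds the instance to the default list. The monitor still has to be re-enabled through monconfig and the result verified with moncheck, as described in steps 4 to 6.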
Reference
- SAW: EMS: How to prevent (disable) EMS monitor requests for a device
- SAW: emr_na-c01027104 EMS dm_stape polling errors while running a backup application