How to prevent EMS monitor requests for a device

From Wiki-UX.info

Abstract

Issue Description

EMS/STM is a set of utilities and monitors available to handle hardware failure events on the system.

When a hardware failure occurs, the system writes EMS messages to the syslog file and, if configured, sends e-mail to the system administrator. Sometimes the hardware replacement has to be scheduled for a few days later, and the EMS messages keep appearing in the meantime.

Disabling the whole monitor is not a good workaround, because it also stops monitoring of the other devices of the same class, for example all other disk devices.

Resolution

The September 2000 release of EMS added the ability to disable monitoring of a particular resource instance.

The startmon_client binary now checks a list of instances in the file /var/stm/data/tools/monitor/disabled_instances and does not create monitoring requests from the *.sapcfg files for those instances.

For example:

1. The customer receives the following event notification:

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Thu Aug  7 10:01:50 2008

eiger-p1 sent Event Monitor notification information:

/storage/events/disks/default/1_0_4_1_0.1.0 is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Thu Aug  7 10:01:50 2008
Severity............: MAJORWARNING
Monitor.............: disk_em
Event #.............: 100091
System..............: eiger-p1.rose.hp.com
...

2. Gather the resource instance name from the EMS /var/opt/resmon/log/event.log file or from the EMS e-mail message. If you don't know the name, run the following command to determine the names of the instances that should be disabled:

# /etc/opt/resmon/lbin/moncheck

3. Edit the /var/stm/data/tools/monitor/disabled_instances file and add the instance you want to prevent from being monitored, either a single device or a wildcard entry:

/storage/events/disks/default/1_0_4_1_0.1.0

or

/storage/events/disks/default/*

4. Re-enable the monitor. Start the monconfig interface:

# /etc/opt/resmon/lbin/monconfig

5. Select the E)nable Monitoring option. EMS will continue to poll the device, but it will no longer generate EMS messages for the disabled instance.

6. Run /etc/opt/resmon/lbin/moncheck again to verify the changes.

  • Note: There is no need to use the option "K)ill monitoring" in monconfig. If MC/ServiceGuard is running and packages with PSM (Peripheral Status Monitor) dependencies for the hardware resources are configured on this system, killing/disabling monitoring on this system may cause the package to fail over to another node.
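
Step 3 can also be scripted. The following is only a minimal sketch, assuming a POSIX shell: the disable_instance function and the DISABLED_FILE variable are hypothetical names introduced here for illustration and are not part of EMS. It appends an instance to the disabled_instances file only if that exact line is not already there, so running it repeatedly does not create duplicate entries.

```shell
#!/bin/sh
# Sketch only: disable_instance and DISABLED_FILE are illustrative names,
# not EMS commands or EMS settings.
# DISABLED_FILE can be pointed at a scratch file for a dry run.
DISABLED_FILE="${DISABLED_FILE:-/var/stm/data/tools/monitor/disabled_instances}"

disable_instance() {
    instance="$1"

    # Accept only absolute resource paths such as /storage/events/...
    case "$instance" in
        /*) ;;
        *)  echo "not a resource path: $instance" >&2
            return 1 ;;
    esac

    # -F: treat the entry as a fixed string, so '*' and '.' are literal.
    # -x: match whole lines only.
    if [ -f "$DISABLED_FILE" ] && grep -Fx "$instance" "$DISABLED_FILE" >/dev/null; then
        return 0    # already listed, nothing to do
    fi

    printf '%s\n' "$instance" >> "$DISABLED_FILE"
}

# Example call (run as root):
#   disable_instance /storage/events/disks/default/1_0_4_1_0.1.0
```

The function only edits the file. Steps 4 to 6 (E)nable Monitoring in monconfig, then moncheck) are still needed for the change to take effect.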

Reference

Authors