Sitworld: Situations Caused Domain Name Server Overload

[Image: spider1]

John Alvord, IBM Corporation

jalvord@us.ibm.com

Inspiration

A customer reported an intense workload on their Domain Name Server [DNS] coming from systems running Windows/Linux/Unix OS Agents. The workload was heavy enough to delay response times for normal business processes.

Diagnosis

There were 3500+ situations and 1500+ had email action commands. Some 230 of the action commands were set up to send the email on every interval, not just the first instance. Every email command would require 2-4 DNS lookups.

The particular condition was traced to a single Windows system that had situations running which evaluated True. Several of those situations had action commands set to run on every interval. The situations had a DisplayItem configured, and 8-10 emails were sent each interval. The result was a steady state of about 60 emails a minute. That was enough to disrupt the DNS server performance.
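As a rough back-of-the-envelope check, the figures quoted above can be combined to estimate the DNS query rate coming from that one system. This is only a sketch in Python: the numbers are the ones given in this post (about 60 emails a minute, 2-4 lookups per email), and the function name is made up for illustration.

```python
def dns_lookups_per_minute(emails_per_minute, lookups_per_email):
    """Estimate the DNS query rate generated by situation email commands."""
    return emails_per_minute * lookups_per_email


# Figures taken from the diagnosis above.
EMAILS_PER_MINUTE = 60

low = dns_lookups_per_minute(EMAILS_PER_MINUTE, 2)
high = dns_lookups_per_minute(EMAILS_PER_MINUTE, 4)
print(f"Estimated DNS lookups per minute: {low} to {high}")
```

The estimate covers only the email-driven lookups from the single system described here, not any other DNS traffic.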

Identification and recovery

The “every interval” situations were identified using the following technique. The general procedure for creating a TEP workspace viewing TEMS objects is documented here.

Review that first and slot in the changes.

The Custom SQL used for this case is:

SELECT
  SITNAME,
  AUTOSOPT,
  CMD
FROM O4SRV.TSITDESC;

In this case we need to add a query filter:

[Screenshot: filter_email]

The last check tests for the presence of “mail” in the Command attribute. The Command title doesn’t show in this capture, but it can be seen by sliding the scroll bar.

In the properties, select the “return all rows” option.

The end result is a list of all situations that have a Y in the second character of the AUTOSOPT attribute, which means the action command is performed on every interval. The filter also limits the list to action commands that have “mail” in the command line.

The resulting table was exported to a CSV file [right-click in the table and select export…]. Each situation was evaluated, and for most of them the action command setting was changed to “Don’t take action twice in a row”, which is the default.
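If the unfiltered rows are exported instead, the same two tests can be applied offline. The Python sketch below is one way that might look. It assumes the CSV header uses the column names SITNAME, AUTOSOPT and CMD from the Custom SQL (the actual export may label them differently), the sample rows are invented for illustration, and the case-insensitive match on “mail” is a choice made here rather than something the workspace filter is known to do.

```python
import csv
import io


def every_interval_mail_situations(csv_file):
    """Yield (SITNAME, CMD) for rows whose AUTOSOPT has Y as its second
    character and whose command contains "mail" (case-insensitive)."""
    for row in csv.DictReader(csv_file):
        autosopt = (row.get("AUTOSOPT") or "").strip()
        cmd = row.get("CMD") or ""
        if len(autosopt) >= 2 and autosopt[1] == "Y" and "mail" in cmd.lower():
            yield row["SITNAME"], cmd


# Hypothetical sample data standing in for the exported table.
sample = io.StringIO(
    "SITNAME,AUTOSOPT,CMD\n"
    "demo_disk_full,NYN,mailx -s alert ops@example.com\n"
    "demo_cpu_high,NNN,mailx -s alert ops@example.com\n"
    "demo_proc_down,NYN,/opt/scripts/restart.sh\n"
)

hits = list(every_interval_mail_situations(sample))
for sitname, cmd in hits:
    print(sitname, "->", cmd)
```

For a real export, replace the sample with an opened file, for example `open("situations.csv", newline="")`, where the file name is again only a placeholder.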

Results

With this change the number of emails was sharply reduced and the immediate crisis was over.

For a long-term solution I suggested they stop using emails to transmit events. Email has a lot more overhead than an event receiver like Netcool/Omnibus. Email cannot transmit the fact that a condition has been resolved. Email is not manageable.

Summary

This post documented how a high volume of emails from situations caused a DNS server overload, and how the problem cases were identified and resolved.


Note: Enterprising Spider after a Foggy Night

 
