Sitworld: Running TEMS without SITMON


John Alvord, IBM Corporation

jalvord@us.ibm.com


Introduction

A customer hub TEMS was crashing on startup. This is a painful moment, because while the hub TEMS is down, monitoring is impossible.

A review of the logs showed one particular situation starting, followed by a large number of action commands recorded in the hub TEMS operations log, and then the failure.

Background

At startup the TEMS does many things, including starting situations. If a situation causes an immediate failure, it must be prevented from starting up. This particular situation was an MS_Offline type. The product-provided situation formula looks like this:

( Status == *OFFLINE AND Reason != 'FA' )

The formula checks the node status table rows for offline status. It is not true during startup or other periods when offline status is not meaningful, which is when the Reason attribute is set to FA [Framework Architecture].

In this case the customer created situation formula was

(  Status == *OFFLINE AND PRODUCT == UX )

The added test selects a particular type of agent: UX, the Unix OS Agent. The customer had almost 2000 Unix OS Agents. The Reason != 'FA' test was missing.

In addition, the customer created an action command to send an email whenever that condition was true. That was the root cause of the failure.

See the post Sitworld: MS_Offline: Myth and Reality for a more in-depth discussion of MS_Offline situations.

The new project Sitworld: ITM Situation Audit detects this case, a missing Reason NE FA test, during a static analysis of all situations.
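To illustrate the kind of static check involved, here is a minimal Python sketch. It is not the Situation Audit tool's code: the function name missing_fa_guard is hypothetical, and it only recognizes the formula text shown in this post, plus an assumed NE spelling of the operator. It flags any formula that tests Status == *OFFLINE but never tests Reason != 'FA'.

```python
import re

# Hypothetical, simplified patterns for the display form of a situation formula.
OFFLINE_TEST = re.compile(r"\bStatus\s*(?:==|\*?EQ)\s*\*OFFLINE\b", re.IGNORECASE)
FA_GUARD = re.compile(r"\bReason\s*(?:!=|\*?NE)\s*'?FA\b", re.IGNORECASE)


def missing_fa_guard(formula: str) -> bool:
    """True if the formula tests Status == *OFFLINE but has no Reason != 'FA' test."""
    # Accept typographic quotes, since formulas are often pasted from documents
    text = formula.replace("\u2018", "'").replace("\u2019", "'")
    return bool(OFFLINE_TEST.search(text)) and not FA_GUARD.search(text)


print(missing_fa_guard("( Status == *OFFLINE AND PRODUCT == UX )"))
```

The customer formula is flagged; the product-provided formula is not. A real audit would also need to parse the stored situation definitions rather than display text.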

How the TEMS failed

At TEMS startup, SITMON began working and started the customer's MS_Offline type situation. Most agents had not yet connected and were recorded as offline, so about 2000 situation events occurred during startup. Each event triggered an action command to send an email. All these line mode commands ran in the same process space as the TEMS, at the “same time”.

This exceeded the process size limits and the TEMS failed. In other cases the system paging space was exhausted, which caused a different and even worse failure, because every process on the system was affected.
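The arithmetic of the startup burst can be modeled with a small Python sketch. The NodeRow class and the startup snapshot are hypothetical simplifications of the node status table: a round 2000 UX agents are all assumed to be offline with Reason set to FA, as during TEMS startup. Each matching row counts as one situation event, and therefore one email action command.

```python
from dataclasses import dataclass


@dataclass
class NodeRow:
    """Hypothetical, simplified model of one node status table row."""
    node: str
    product: str
    status: str  # "*OFFLINE" or "*ONLINE"
    reason: str  # "FA" while offline status is not meaningful (e.g. TEMS startup)


def product_formula(row: NodeRow) -> bool:
    # ( Status == *OFFLINE AND Reason != 'FA' )
    return row.status == "*OFFLINE" and row.reason != "FA"


def customer_formula(row: NodeRow) -> bool:
    # ( Status == *OFFLINE AND PRODUCT == UX )
    return row.status == "*OFFLINE" and row.product == "UX"


def guarded_customer_formula(row: NodeRow) -> bool:
    # Customer formula with the missing Reason != 'FA' test added back
    return customer_formula(row) and row.reason != "FA"


def startup_rows(count: int) -> list[NodeRow]:
    # Assumed startup snapshot: no UX agent has reconnected yet, all flagged FA
    return [NodeRow(f"host{i}:KUX", "UX", "*OFFLINE", "FA") for i in range(count)]


def count_events(rows: list[NodeRow], formula) -> int:
    # One situation event, and so one email action command, per matching row
    return sum(1 for row in rows if formula(row))


if __name__ == "__main__":
    rows = startup_rows(2000)
    for name, formula in [("product", product_formula),
                          ("customer", customer_formula),
                          ("customer + FA test", guarded_customer_formula)]:
        print(f"{name}: {count_events(rows, formula)} events at startup")
```

Run as a script, it reports 2000 events for the customer formula and 0 for both the product formula and the customer formula with the FA test added.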

Recovery Action Part 1 – run the TEMS without SITMON

The file to change is KBBENV, which is in this directory:

Linux/Unix: <installdir>/tables/<temsname>/

Windows: <installdir>\cms\

z/OS: RKANPARU – KDSENV member

1) Stop the TEMS – if not already stopped

2) Copy the KBBENV file to KBBENVS as a backup

3) Edit the KBBENV file

4) Locate the KDS_RUN= line and remove the “KSMOMS.KSMOMS;” entry.

5) Add the following line to the end of KBBENV: CMS_SKIP_SITMON=Y

6) Start the hub TEMS
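Steps 2, 4 and 5 can also be scripted. This Python sketch assumes a Linux/Unix or Windows KBBENV text file (not the z/OS KDSENV member) and a KDS_RUN= line that contains the KSMOMS.KSMOMS; entry exactly as described in step 4. The helper names disable_sitmon_text and disable_sitmon are hypothetical. Stop the TEMS first, as in step 1.

```python
import shutil
from pathlib import Path

SITMON_ENTRY = "KSMOMS.KSMOMS;"
SKIP_LINE = "CMS_SKIP_SITMON=Y"


def disable_sitmon_text(kbbenv: str) -> str:
    """Return KBBENV contents with steps 4 and 5 applied."""
    lines = []
    for line in kbbenv.splitlines():
        if line.lstrip().startswith("KDS_RUN="):
            line = line.replace(SITMON_ENTRY, "")  # step 4: drop the SITMON entry
        lines.append(line)
    if SKIP_LINE not in (l.strip() for l in lines):
        lines.append(SKIP_LINE)                    # step 5: append the flag only once
    return "\n".join(lines) + "\n"


def disable_sitmon(env_dir: Path) -> None:
    """Back up KBBENV to KBBENVS (step 2), then rewrite KBBENV (steps 4 and 5)."""
    kbbenv, backup = env_dir / "KBBENV", env_dir / "KBBENVS"
    if backup.exists():
        # Never overwrite an existing backup: it may be the only clean copy
        raise FileExistsError(f"{backup} already exists")
    shutil.copyfile(kbbenv, backup)
    kbbenv.write_text(disable_sitmon_text(kbbenv.read_text()))
```

Refusing to overwrite KBBENVS means an accidental second run cannot replace the clean copy with the edited file, and the text transform is idempotent.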

Recovery Action Part 2 – Change the problem situation to not run at startup

In most cases you will be working with IBM Support to determine the problem situations.

In some cases you may be able to start the TEMS and the TEPS and open a Portal Client session. If so, you can simply delete the situation, or at least change it so Run at Startup is unchecked. If that works, continue to Recovery Action Part 3.

Otherwise, the general instructions for Linux/Unix are:

1) Log in to the system running the hub TEMS

2) cd <installdir>/tmp

3) Make a new directory:

    mkdir sqllib

4) In the new sqllib directory, create a file runoff.sql with the following contents:

    UPDATE O4SRV.TSITDESC
    SET AUTOSTART = '*NO'
    WHERE SITNAME = '<sitname>';

5) In the <installdir>/tmp directory, create a new file runoff.txt with these contents:

    ip.pipe
    a.b.c.d
    1918
    runoff.sql
    end

Tailor this with the correct protocol, IP address, port and SQL file name for your environment.

6) That is the end of the preparation.

To run the update, first locate the TEMS config file:

7) ls <installdir>/config/*.config

In the test case it was /opt/IBM/ITM/config/nmp182_ms_HUB_NMP182.config

8) Source (include) that file. In my test case that is:

    . /opt/IBM/ITM/config/nmp182_ms_HUB_NMP182.config

That is a period, then a blank, then the path of the TEMS config file. This prepares the environment to run kdstsns.

9) Run the SQL with these two commands:

    export SQLLIB=sqllib
    kdstsns < runoff.txt
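If the preparation has to be repeated for several situations, the two files from steps 3 to 5 can be generated. This is a hedged Python sketch: the helper names are hypothetical, it assumes one value per line in runoff.txt in the order shown above, and it defaults to the ip.pipe protocol and port 1918 from the example. It rejects situation names containing a single quote rather than guessing at quoting rules for kdstsns.

```python
from pathlib import Path


def runoff_sql(sitname: str) -> str:
    """Contents of sqllib/runoff.sql (step 4)."""
    if "'" in sitname:
        # Situation names are not expected to contain quotes; refuse rather than guess
        raise ValueError("situation name must not contain a single quote")
    return ("UPDATE O4SRV.TSITDESC\n"
            "SET AUTOSTART = '*NO'\n"
            f"WHERE SITNAME = '{sitname}';\n")


def runoff_txt(address: str, protocol: str = "ip.pipe", port: int = 1918,
               sql_file: str = "runoff.sql") -> str:
    """Contents of runoff.txt (step 5): one value per line, then 'end'."""
    return "\n".join([protocol, address, str(port), sql_file, "end"]) + "\n"


def write_runoff_files(tmp_dir: Path, sitname: str, address: str) -> None:
    """Create the files from steps 3 to 5 under <installdir>/tmp."""
    sqllib = tmp_dir / "sqllib"
    sqllib.mkdir(exist_ok=True)                               # step 3
    (sqllib / "runoff.sql").write_text(runoff_sql(sitname))   # step 4
    (tmp_dir / "runoff.txt").write_text(runoff_txt(address))  # step 5
```

For example, write_runoff_files(Path("/opt/IBM/ITM/tmp"), "<sitname>", "a.b.c.d") with your own values. Sourcing the config file and running kdstsns are still done by hand as described above.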

For Windows the instructions are much the same. Do the work in a newly created <installdir>\tmp directory. kdstsns runs more simply, with:

    ..\cms\kdstsns

Recovery Action Part 3 – Restore Normal Activity

Stop the TEMS, restore the saved KBBENV by copying KBBENVS back to KBBENV, and start the TEMS. It should then run normally, with SITMON active. At this point the problem situation needs to be rethought.
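A quick way to confirm the restore is to check the restored text before starting the TEMS. In this hypothetical Python sketch, restore_kbbenv copies KBBENVS back over KBBENV, and sitmon_enabled reports True only when the KDS_RUN= line again lists KSMOMS.KSMOMS and no CMS_SKIP_SITMON=Y line remains.

```python
import shutil
from pathlib import Path


def restore_kbbenv(env_dir: Path) -> None:
    """Copy the KBBENVS backup back over KBBENV."""
    shutil.copyfile(env_dir / "KBBENVS", env_dir / "KBBENV")


def sitmon_enabled(kbbenv: str) -> bool:
    """True if KDS_RUN lists KSMOMS.KSMOMS and CMS_SKIP_SITMON=Y is absent."""
    lines = [line.strip() for line in kbbenv.splitlines()]
    runs_sitmon = any(l.startswith("KDS_RUN=") and "KSMOMS.KSMOMS" in l for l in lines)
    skips_sitmon = "CMS_SKIP_SITMON=Y" in lines
    return runs_sitmon and not skips_sitmon
```

Checking the text, rather than trusting the copy, catches the case where KBBENVS was itself taken after the edit.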

Summary

This shows how to make manual configuration changes so that TEMS will temporarily start up without SITMON.

Sitworld: Table of Contents

Photo Note: New Bedroom Under Construction in Carmel Highlands – February 2013