Sitworld: AOA Critical Issue – High Virtual Hub Table Updates


Version 1.00000 – 8 October 2018

John Alvord, IBM Corporation

jalvord@us.ibm.com


Inspiration

In August 2014, the Database Health Checker began running at IBM ECUREP as an Analysis On Arrival (AOA) task on each incoming hub and remote TEMS pdcollect. Since then, TEMS Audit and Event History Audit reports have been added. The reports are very useful for identifying known error conditions and thus speeding ITM diagnosis of issues. Each of the tools can be run by any customer, but the AOA reports are not immediately visible. Any customer could ask for them, but because they are not visible, no one ever asks. At the same time, the reports have become more complex and challenging to digest.

With a recent change, the process has been extended to create a short list of critical issues which will automatically be added to the S/F Case or PMR as a short email text. That creates visibility for critical issues. This document presents one specific critical issue – High Virtual Hub Table Updates.

Please note that the conditions identified may not be the issue the problem case was opened for. For example, one recent case was an unexpected FTO hub TEMS switch to backup. After close study, the major issues were misconfigured agents, including duplicate name cases, Virtual Hub Table Update floods and several other items. There are also rare cases where a report will be produced concerning an obsolete TEMS that is definitely installed but not in active use. In that case the report can be ignored, although uninstalling the TEMS would be a good idea.

Getting more information

If you are viewing this document as a customer working with IBM Support, you are welcome to request copies of the Analysis On Arrival reports if they are available. Be sure to mention the unpack directory from the AOA Critical Issue report.

TEMS Audit – temsaud.csv [any hub or remote TEMS]

Database Health Checker – datahealth.csv [any hub TEMS]

Event History Audit – eventaud.csv [any hub or remote TEMS]

There are cases when no report is generated. Sometimes that means there were no advisories. TEMS Audit is not produced when the relevant log files cannot be identified. Database Health Checker is started but skips its analysis if the pdcollect appears to be from a remote TEMS. Event History Audit and Database Health Checker are not run if errors are detected in the table extract process.

Visit the links above if you want to run the AOA programs on your own schedule.

Virtual Hub Table Updates

datahealth.crit: Virtual Hub Table updates peak $peak_rate per second more then nominal $opt_peak_rate –  per hour [$vtnode_tot_hr] – total agents $vtnode_tot_ct – See DATAREPORT020

This is a relatively common condition where certain agents stress the remote and hub TEMS by sending updates to hub TEMS in-storage tables that are not used for anything useful. This critical issue is recorded if more than 32 updates arrive in the peak second (the bursts recur every 1, 2 or 3 minutes). At the 32 level, a burst consumes only 32 of the 64 ITM communication pipes. For context, peak seconds of more than 3000 arrivals have been calculated, and the higher the rate, the worse the problem. The issue and background are documented here:
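As a rough illustration of the threshold described above, the check can be sketched in shell. This is a hypothetical sketch, not code from the Database Health Checker. The function names vt_peak_status and vt_pipe_share are invented here. The only inputs taken from this post are the nominal level of 32 arrivals in the peak second and the 64 ITM communication pipes.

```shell
#!/bin/sh
# Hypothetical sketch: not taken from the Database Health Checker source.
# Both constants come from the description in this post.
NOMINAL_PEAK=32   # arrivals in the peak second above which the issue is critical
COMM_PIPES=64     # ITM communication pipes available at one time

# Print "critical" if the peak-second arrival count exceeds the nominal level.
vt_peak_status() {
  if [ "$1" -gt "$NOMINAL_PEAK" ]; then
    echo "critical"
  else
    echo "ok"
  fi
}

# Print the whole-number percentage of the 64 pipes a burst would occupy,
# capped at 100 because any larger burst has to wait for free pipes.
vt_pipe_share() {
  pct=$(( $1 * 100 / COMM_PIPES ))
  [ "$pct" -gt 100 ] && pct=100
  echo "$pct"
}
```

For example, a peak second of 32 arrivals reports ok and occupies half of the pipes. Any count above 32 reports critical.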

Sitworld: ITM Virtual Table Termite Control Project

https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitworld_itm_virtual_table_termite_control_project?lang=en

Only a relatively small number of agents are involved:

HV – Monitoring Agent for Microsoft Hyper-V Server

OQ – Monitoring Agent for Microsoft SQL Server

OR – Monitoring Agent for Oracle

OY – Monitoring Agent for Sybase Server

Q5 – Monitoring Agent for Microsoft Cluster Server

One agent distribution was altered in the ITM 623 GA time frame to avoid the issue:

UX – Monitoring Agent for UNIX OS
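If you have a list of product codes from your own environment, a small lookup like the following can flag the ones named above. It is a hypothetical helper (vt_agent_desc is an invented name) that only encodes the list in this post. Treating UX as a concern only before the ITM 623 change is an inference from the sentence above, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: print the description of an ITM product code if it is
# on this post's list of agents that send virtual hub table updates.
# Returns non-zero for any code that is not on the list.
vt_agent_desc() {
  case "$1" in
    HV) echo "Monitoring Agent for Microsoft Hyper-V Server" ;;
    OQ) echo "Monitoring Agent for Microsoft SQL Server" ;;
    OR) echo "Monitoring Agent for Oracle" ;;
    OY) echo "Monitoring Agent for Sybase Server" ;;
    Q5) echo "Monitoring Agent for Microsoft Cluster Server" ;;
    # UX was altered around ITM 623 GA; presumably only earlier levels matter
    UX) echo "Monitoring Agent for UNIX OS (before the ITM 623 change)" ;;
    *)  return 1 ;;
  esac
}
```

Because the helper returns a non-zero status for unlisted codes, it can be used directly in an if test or in a loop over product codes.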

Traditionally IBM Support creates the recovery action plan and the needed files; however, you are welcome to use the above tool.

Example Recovery Action plan – Unix/Linux Style

This environment is being hit very hard with virtual hub table updates.

This ITM area is not that well known, and I documented it here:

Sitworld: ITM Virtual Table Termite Control Project

https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitworld_itm_virtual_table_termite_control_project?lang=en

In your case, every 2 minutes there is a burst of 2074 incoming updates from the 513 OQ [Agent for Microsoft MS-SQL] agents and the 11 OY [Agent for Sybase] agents. These updates go to in-storage tables that are not, in fact, used for anything.

The data volume is not that high, but sudden bursts arriving in the same second can cause delays and timeouts in communication. At the very latest levels, ITM communications are limited to only 64 at a time.
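To put the 64 limit in perspective, here is a rough worked estimate. If all 2074 updates landed in the same second and at most 64 could be handled at a time, they would need at least 33 successive rounds (2074 / 64 = 32.4, rounded up). The sketch below is illustrative arithmetic only. It assumes every update occupies one pipe for one round and ignores retries and timeouts, so it does not model how ITM actually schedules work. The function name is hypothetical.

```shell
#!/bin/sh
# Rough illustration only: count how many successive groups of 64 are needed
# to work through one burst, assuming at most 64 are handled at a time.
COMM_PIPES=64

vt_burst_rounds() {
  burst=$1
  # Integer ceiling division: (burst + COMM_PIPES - 1) / COMM_PIPES
  echo $(( (burst + COMM_PIPES - 1) / COMM_PIPES ))
}
```

With these assumptions, vt_burst_rounds 2074 prints 33, while a burst at the nominal level of 32 fits in a single round.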

This recovery action plan eliminates the objects and then recycles all the affected agents. This might have to be repeated if some remote TEMS or agents are offline. The files are available for access here.

The following are the usual files needed to implement the recovery plan. You can create them yourself, or IBM Support can create the files from a hub TEMS pdcollect.

delete.sql

recycle.sh

recycle.cmd

show.sql
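Before starting, it can help to confirm that all four files are in place. This is a hypothetical pre-flight check: vt_check_files and its directory argument are assumptions. It reports each missing file and returns non-zero if any are absent. On a Unix/Linux hub, the steps here use only recycle.sh. Because recycle.cmd is presumably the Windows counterpart, you may want to drop it from the list.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm the recovery files named in this
# plan are present in one directory (default: current directory).
# Prints one "missing:" line per absent file; returns 1 if any are missing.
vt_check_files() {
  dir=${1:-.}
  missing=0
  for f in delete.sql recycle.sh recycle.cmd show.sql; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```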

1) Copy the recycle.sh file to the hub TEMS /opt/IBM/ITM/bin directory.

2) Log in to the system running the TEPS, copy the delete.sql file to /tmp, and then use it to update the TEMS database files. The following assumes you use the same install directory as the hub TEMS; otherwise use that bin directory.

cd /opt/IBM/ITM/bin

./itmcmd execute cq "KfwSQLClient /v /f /tmp/delete.sql"

Some people like to run show.sql the same way before and after the delete.
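Step 2, including the optional show.sql runs, can be wrapped in a small script. This is a sketch under stated assumptions. The function names, the ITM_BIN default and the DRY_RUN switch are invented for illustration. The SQL files are assumed to be in /tmp as described above, and the itmcmd command line is the one given in this plan. By default the script only prints the commands so you can review them. Set DRY_RUN=0 to run them on the TEPS system.

```shell
#!/bin/sh
# Hypothetical wrapper for step 2: show.sql, then delete.sql, then show.sql,
# using the itmcmd command line from this plan.
# ITM_BIN and DRY_RUN are assumptions; DRY_RUN=1 (default) only prints.
ITM_BIN=${ITM_BIN:-/opt/IBM/ITM/bin}
DRY_RUN=${DRY_RUN:-1}

# Run (or print) one KfwSQLClient invocation for the given SQL file.
vt_run_sql() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "./itmcmd execute cq \"KfwSQLClient /v /f $1\""
  else
    (cd "$ITM_BIN" && ./itmcmd execute cq "KfwSQLClient /v /f $1")
  fi
}

# Stop at the first failure so the delete is not attempted after a bad show.
vt_step2() {
  vt_run_sql /tmp/show.sql &&
  vt_run_sql /tmp/delete.sql &&
  vt_run_sql /tmp/show.sql
}
```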

This will delete all the potentially problematic objects from all the TEMSes, including some objects that might not be installed. This can be done at any time and has no immediate effect on any running agents.

There are two ways to complete the work. Each works well, and you only need to do one.

3) Recycle all the agents involved:

On the hub TEMS

cd /opt/IBM/ITM/bin

./tacmd login -s …. [to hub TEMS]

sh recycle.sh

This will recycle all the OQ and OY agents involved. They will get new instructions that do NOT include the problem/unuseful ones. This process will, of course, recycle all 513 Agents for Microsoft MS-SQL and the 11 Agents for Sybase, so it might need some scheduling.

4) As an alternative, you can recycle all TEMSes, both hub and remote.

Either (3) or (4) will work just fine – you only need to do one. 

Summary

This document describes the High Virtual Hub Table Updates condition and how to cure it.

History

1.00000

Initial release

Note: 2018 – Home Grown Meyer Lemons

 
