Sitworld: Attribute and Catalog Health Survey


John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #5 – 12 February 2016 – Level 0.83000


Inspiration

Recently I worked with a customer who ran into a rarely seen ITM limit. ITM uses catalog and attribute files to define the data that agents can process from their monitored environments. The TEMS reads the catalog files into a combined catalog table and the attribute files into an in-storage attribute collection. These are used in Situations, Historical Data, Real Time data displays and more. This customer had added a 513th catalog file, and the TEMS failed during startup. Internally, .cat files are known as package files, and there is an absolute limit of 512 packages. With IBM Support help, the customer removed a few .cat and .atr files, reset the combined catalog file to empty, and the TEMS started up just fine.

However, this meant the customer was unable to install certain types of maintenance or new applications. There was an urgent need for a reliable way to identify unused catalog and attribute files.

The result is this package, which identifies the unused catalog and attribute files. It also produces a health report that lists error cases, such as an attribute group used in a situation but missing from every attribute file.

Data Sources

The Attribute files are taken from the hub TEMS environment:

Linux/Unix: <installdir>/tables/<temsname>/ATTRLIB

Windows: <installdir>\cms\ATTRLIB

The Catalog files are taken from the hub TEMS environment:

Linux/Unix: <installdir>/tables/<temsname>/RKDSCATL

Windows: <installdir>\cms\RKDSCATL
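The startup failure described above comes from a hard limit on the number of .cat (package) files. Because of that, it can be worth checking how much headroom a hub TEMS has before installing new application support. The following Python sketch is not part of the atrhealth package. It assumes every file ending in .cat in the RKDSCATL directory counts toward the 512-package limit, and the function names are hypothetical.

```python
import os

# Limit on catalog (package) files described in the Inspiration section.
PACKAGE_LIMIT = 512


def count_catalog_files(names):
    """Count file names ending in .cat, ignoring case (e.g. KLZ.CAT)."""
    return sum(1 for name in names if name.lower().endswith(".cat"))


def catalog_headroom(rkdscatl_dir, limit=PACKAGE_LIMIT):
    """Return (catalog file count, remaining slots) for an RKDSCATL directory.

    Hypothetical helper: it assumes every *.cat file in the directory
    counts as one package toward the limit.
    """
    count = count_catalog_files(os.listdir(rkdscatl_dir))
    return count, limit - count
```

For example, on Windows you might call `catalog_headroom(r"C:\IBM\ITM\cms\RKDSCATL")`. On Linux/Unix, point it at the tables/<temsname>/RKDSCATL directory instead.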

The Situation definitions are taken either directly from the TEMS database tables TSITDESC and TNAME, or indirectly from the Situation Audit project run with the -a option introduced at level 1.25000. Situation Audit data produces a report with fewer false advisories. When the EIB tables are used directly, strings that merely have approximately the correct form are sometimes misidentified as attribute group names. Situation Audit is more precise because it performs a full syntax analysis. In actual usage either will do.

The first step is to remove unused catalog and attribute files. After that, the number of advisory messages in the report will be sharply reduced.

The three data sources do not have to be used in place. You can capture the data and afterwards copy it to another location for processing. You do not have to achieve perfection, although removing the high-impact advisories will definitely improve ITM processing reliability. Performance is not expected to change much.

Package Installation

This document uses the default install directory; however, you can use any directory you want.

Linux/Unix systems come with Perl installed. Windows may need it installed; I use ActiveState Perl from http://www.activestate.com/activeperl, community edition 5.20. No CPAN modules are needed for this package. It will likely work on many different Perl levels. Over time the project will be upgraded to modern levels about once a year, in the late fall.

The package is atrhealth.0.83000. It contains:

1) A Perl program atrhealth.pl and control file atrhealth.ini – standing for Attribute and Catalog Health Survey.

2) If you use the Situation Audit capture of the sit_atr.txt file, the following files can be ignored.

3) A Windows atrsql.cmd file to run the SQL statements

4) A Linux/Unix atrsql.tar file that contains the atrsql.sh file. This avoids problems with line endings. To use it, untar atrsql.tar into the target directory.

5) The cmd and shell files require manual updating if the install directory is not the default.

I suggest that all these program objects be placed in a single directory. For Windows you can create the tmp directory and sql subdirectory. For Linux/Unix create the sql directory.

Linux/Unix: /opt/IBM/ITM/tmp/atrhealth

Windows: c:\IBM\ITM\tmp\atrhealth

You can run this in any directory, of course.

Configuring the Attribute Health Survey Program – Initialization file

Create the atrhealth.ini file. Here is an example where the sit_atr.txt will be used.

sit_atr qa1\sit_atr.txt

attrlib atr

rkdscatl cat

sit_atr: the data supplied is the filename. In this case there is a sub-directory qa1 and the file is in that directory. This is from Windows and so the backslash character is used.

attrlib: the data supplied is a directory where all the attribute files are stored.

rkdscatl: the data supplied is a directory where all the catalog files are stored.

These can be specified as fully qualified directory names to use the existing files, like this:

attrlib C:\IBM\ITM\cms\ATTRLIB

If the Situation data is supplied by the EIB capture, the atrhealth.ini looks like this [# is a comment character]

#sit_atr qa1\sit_atr.txt

attrlib atr

rkdscatl cat
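The initialization file is a simple keyword/value list. To illustrate roughly how such a file can be read, here is a Python sketch that splits each line at the first run of whitespace and skips blank lines and lines starting with #. These format rules are an assumption based on the examples above. This is not the actual parsing code in atrhealth.pl.

```python
def parse_atrhealth_ini(text):
    """Parse 'keyword value' lines into a dict.

    Blank lines and lines beginning with # are ignored. The value is
    everything after the first run of whitespace, so paths that
    contain spaces are kept intact.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# The EIB-capture example from above, with sit_atr commented out.
example = """#sit_atr qa1\\sit_atr.txt
attrlib atr

rkdscatl cat
"""
```

With this example, the commented sit_atr line is skipped, so only the attrlib and rkdscatl entries are returned.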

The two EIB capture files must be in the current directory and have the names

QA1CSITF.DB.LST

QA1DNAME.DB.LST

and they should be identified automatically. If there is any confusion you can invoke atrhealth.pl with the -lst option.
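The choice between the two situation data sources can be pictured as follows. In this hypothetical Python sketch, a sit_atr setting takes priority. Otherwise, both EIB capture files must be present in the working directory. The function name and the settings dictionary are illustrative assumptions, not the logic of atrhealth.pl.

```python
import os

# EIB capture file names produced by the supplied SQL.
EIB_FILES = ("QA1CSITF.DB.LST", "QA1DNAME.DB.LST")


def situation_source(settings, directory="."):
    """Decide where situation data comes from (hypothetical helper).

    Returns ("sit_atr", path) when a sit_atr entry is configured,
    ("lst", directory) when both EIB capture files exist, and
    ("missing", [names]) listing any EIB files that were not found.
    """
    if "sit_atr" in settings:
        return "sit_atr", settings["sit_atr"]
    missing = [name for name in EIB_FILES
               if not os.path.isfile(os.path.join(directory, name))]
    if missing:
        return "missing", missing
    return "lst", directory
```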

Getting the Situation/Attribute Data

For the Situation Audit case install that package and use it with the -a option.

The following shows how to get the data from the EIB with the supplied SQL, using the atrsql.cmd or atrsql.sh files. Here is an example where the work is done in the existing default tmp directory on a Linux/Unix system where the TEPS is running. If the product is not installed in the default directory, set the environment variable as shown in step c.

Linux/Unix

a) copy atrsql.tar to /opt/IBM/ITM/tmp

b) tar -xf atrsql.tar

c) If not using default install directory configure like this: export CANDLEHOME=/opt/IBM/ITM

d) sh atrsql.sh

e) The two files are created and should be moved to where the survey will be done

Here is an example where the work is being done in the existing default tmp directory for Windows where the TEPS is running.

Windows

a) c:

b) cd c:\IBM\ITM

c) md tmp

d) cd tmp

e) move the atrsql.cmd to this directory

f) If not using default install directory configure like this: SET CANDLE_HOME=c:\IBM\ITM

g) atrsql.cmd

h) The two files are created and should be moved to where the survey will be done

Running the Attribute and Catalog Health Survey

Linux/Unix

a) Following the preceding step, the two files QA1CSITF.DB.LST and QA1DNAME.DB.LST are already present in /opt/IBM/ITM/tmp

b) create a file atrhealth.ini like this

attrlib /opt/IBM/ITM/tables/<temsname>/ATTRLIB

rkdscatl /opt/IBM/ITM/tables/<temsname>/RKDSCATL

c) copy the atrhealth.pl program here and run the program

perl atrhealth.pl -lst

Windows

a) Following the preceding step, the two files QA1CSITF.DB.LST and QA1DNAME.DB.LST are already present in C:\IBM\ITM\TMP

b) create a file atrhealth.ini like this

attrlib C:\IBM\ITM\cms\ATTRLIB

rkdscatl C:\IBM\ITM\cms\RKDSCATL

c) copy the atrhealth.pl program here and run the program

perl atrhealth.pl -lst

The result will be three files:

– atrhealth.csv  health survey report

– atrunused.csv  list of atr and cat files unused

– atrused.csv list of atr and cat files which are used

Screen shot of Attribute and Catalog Health Survey Report

The beginning of the report contains the version number and a count of the number of messages. That is followed by the advisory messages.


Following is the advisory message documentation.

Advisory code: ATRHEALTH1000E

Text: Attribute group name in sits[$sits] not found in attribute files

Severity: 100

Check: For every Attribute Group used in a situation, it should be defined in an attribute file.

Meaning: This is sometimes a false positive when using data directly from the EIB. For example, if a Situation Formula contained “12.50”, the first three characters might be misrecognized as an attribute group. This does not occur when the situation/attribute data is obtained from Situation Audit.

If it is not a false positive, the situation will not be processed correctly.

Recovery plan: Install the needed attribute and catalog files and restart the TEMS [needed on all hub/remote TEMSes]. If the situation is no longer needed, delete it. If the situation is not autostarted, it could be ignored.
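Conceptually, the ATRHEALTH1000E check is a set difference between two collections: the attribute groups referenced by situations, and the groups defined in the attribute files. The Python sketch below shows that idea with made-up situation and group names. It assumes the situation-to-group mapping has already been extracted from Situation Audit or the EIB files, which is the hard part the real tool handles.

```python
def missing_attribute_groups(groups_by_situation, defined_groups):
    """Find attribute groups used by situations but defined in no attribute file.

    groups_by_situation: dict of situation name -> iterable of group names.
    defined_groups: iterable of group names found in the attribute files.
    Returns dict of missing group -> sorted list of situations using it.
    """
    defined = set(defined_groups)
    missing = {}
    for situation, groups in groups_by_situation.items():
        for group in groups:
            if group not in defined:
                missing.setdefault(group, set()).add(situation)
    return {group: sorted(sits) for group, sits in missing.items()}
```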

Advisory code: ATRHEALTH1001E

Text: Catalog key from Attribute table $atable in [$pfns] unknown in catalog files.

Severity: 100

Check: For every Attribute Group there should be a related catalog file that defines the application and table name.

Meaning: This strongly suggests the attribute and catalog files are not installed correctly. It could mean that associated situations will not run correctly.

Recovery plan: Review the related attribute file and see what the catalog file should be. If necessary, reinstall the application support.

Advisory code: ATRHEALTH1002W

Text: Attribute group in fn[$pfns] unused in situations

Severity: 50

Check: For every Attribute Group defined in an attribute file, check whether it is used in any situation.

Meaning: This could mean the attribute group and related catalog file are unused and can be deleted. However it might be an attribute group only used in TEP workspace real time views or where situations will be created in the future.

Recovery plan: Review the attribute files and delete attribute and catalog files if not needed.
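ATRHEALTH1002W asks the reverse question: which defined attribute groups are never referenced by a situation? The minimal Python sketch below uses hypothetical file and group names. It also flags attribute files in which no group is used, since those are the natural candidates for review. As the Meaning text notes, a group used only in workspaces still appears here, so treat the output as a review list, not a delete list.

```python
def unused_attribute_groups(groups_by_file, used_groups):
    """Map each attribute file to the groups in it that no situation uses."""
    used = set(used_groups)
    return {fn: sorted(g for g in groups if g not in used)
            for fn, groups in groups_by_file.items()
            if any(g not in used for g in groups)}


def fully_unused_files(groups_by_file, used_groups):
    """List attribute files whose groups are all unused in situations."""
    used = set(used_groups)
    return sorted(fn for fn, groups in groups_by_file.items()
                  if groups and used.isdisjoint(groups))
```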

Advisory code: ATRHEALTH1003W

Text: Catalog table in fn[$pfns] unused in situations.

Severity: 50

Check: For every catalog file determine if the related attributes are used in any situation.

Meaning: This could mean the catalog file and related attribute files are unused and can be deleted. However it might be an attribute group only used in TEP workspace real time views or where situations will be created in the future.

Recovery plan: Review the catalog files and delete attribute and catalog files if not needed.

Advisory code: ATRHEALTH1005W

Text: duplicate Attribute group in files [$pfns]

Severity: 25

Check: For every Attribute Group check for duplicates

Meaning: This is most often a remnant of Universal Agent or Agent Builder catalog files.

Recovery plan: Delete duplicate attribute files which are unused. This will avoid future problems with too many catalog/attribute files.

Advisory code: ATRHEALTH1006W

Text: duplicate Catalog files in files [$pfns]

Severity: 25

Check: For every Catalog file check for duplicates

Meaning: This is most often a remnant of Universal Agent or Agent Builder catalog files.

Recovery plan: Delete duplicate catalog files which are unused. This will avoid future problems with too many catalog/attribute files.
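Both duplicate checks amount to inverting a file-to-names mapping and keeping the names that appear in more than one file. The Python sketch below works the same way for attribute groups (ATRHEALTH1005W) and catalog tables (ATRHEALTH1006W). The file and group names are invented for illustration.

```python
from collections import defaultdict


def find_duplicates(names_by_file):
    """Return name -> sorted files for names defined in more than one file.

    names_by_file: dict of file name -> iterable of attribute group names
    (or catalog table names).
    """
    files_by_name = defaultdict(set)
    for fn, names in names_by_file.items():
        for name in names:
            files_by_name[name].add(fn)
    return {name: sorted(fns)
            for name, fns in files_by_name.items() if len(fns) > 1}
```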

Advisory code: ATRHEALTH1007W

Text: Invalid Attribute run_name at line $ll in attribute file $onefn

Severity: 40

Check: For every attribute entry check for both attribute group name and attribute name

Meaning: This was spotted in one product-provided attribute file [kmc.atr].

Recovery plan: Probably nothing to worry about.
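The ATRHEALTH1007W check can be approximated by splitting each attribute name into a group part and an attribute part. This Python sketch assumes the names have the form Group.Attribute and that the lines passed in contain only those names. The actual .atr file layout is not described here, so treat this as an illustration of the check, not a parser for .atr files.

```python
def split_attribute_name(name):
    """Split 'Group.Attribute' into (group, attribute).

    Returns None when the dot, the group part, or the attribute part
    is missing. Only the first dot is used as the separator.
    """
    group, dot, attribute = name.partition(".")
    if not dot or not group or not attribute:
        return None
    return group, attribute


def invalid_attribute_lines(names):
    """Return (line number, text) for names lacking a group or attribute."""
    return [(ll, name.strip())
            for ll, name in enumerate(names, start=1)
            if split_attribute_name(name.strip()) is None]
```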

Don’t Panic!!!

The first time you run the report you may see many, many advisories. Remember that the higher-impact ones are the most important.

Most of the advisories will be related to leftover duplicates. Eliminating them will avoid future problems.

Rerun the report after making corrections. Then work through the Impact 100 advisories. You do not need to clear up every single issue immediately.
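One way to follow that advice is to sort the advisories by the severities documented above before working through them. The Python sketch assumes the advisories have already been read into (code, text) pairs. It makes no assumption about the column layout of atrhealth.csv.

```python
# Severities taken from the advisory descriptions in this document.
SEVERITY = {
    "ATRHEALTH1000E": 100,
    "ATRHEALTH1001E": 100,
    "ATRHEALTH1002W": 50,
    "ATRHEALTH1003W": 50,
    "ATRHEALTH1005W": 25,
    "ATRHEALTH1006W": 25,
    "ATRHEALTH1007W": 40,
}


def triage(advisories):
    """Order (code, text) pairs from highest to lowest severity.

    Unknown codes sort last; ties keep their original report order
    because sorted() is stable.
    """
    return sorted(advisories, key=lambda adv: -SEVERITY.get(adv[0], 0))


def impact_100(advisories):
    """Keep only the Impact 100 advisories."""
    return [adv for adv in advisories if SEVERITY.get(adv[0], 0) >= 100]
```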

After correcting the hub TEMS, you will need to fix the catalog and attribute files on all the remote TEMS [and FTO backup hub TEMS].

Next Step: Use Portal Client

When you think this process is complete, use a Portal Client session to review the installed catalogs on all the TEMSes, starting from the Enterprise navigation node:

1) right click on Enterprise navigation node

2) select Managed Tivoli Enterprise Management Systems

3) In the bottom left view, right-click on the workspace link [before the hub TEMS entry] and select Installed Catalogs

4) In the new display on right, right click in table, select Properties, click Return all rows and OK out

5) Resolve any missing or out-of-date application data. You can right-click and select Export… to save the data to a local CSV file for easier tracking.

It is not always required to make things perfect. For example, if an agent connects only to some remote TEMSes, then only the hub TEMS and those remote TEMSes need the catalogs. However, cases where the dates differ definitely need correction. In general, correction means installing the correct application support.
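If the Installed Catalogs data from each TEMS is exported to CSV, the date comparison can be scripted. In the Python sketch below, the per-TEMS data is assumed to be already loaded into a dictionary that maps catalog name to timestamp. The TEMS names, catalog names and timestamps are made up, and the sketch makes no assumption about the export column layout. Catalogs missing from some TEMS are deliberately not flagged, because not every remote TEMS needs every catalog.

```python
def catalog_date_mismatches(catalogs_by_tems):
    """Find catalogs whose timestamps differ between TEMSes.

    catalogs_by_tems: dict of TEMS name -> dict of catalog name -> timestamp.
    A catalog absent from some TEMS is not reported; only catalogs present
    with different timestamps are. Returns catalog -> {TEMS: timestamp}.
    """
    seen = {}
    for tems, catalogs in catalogs_by_tems.items():
        for catalog, stamp in catalogs.items():
            seen.setdefault(catalog, {})[tems] = stamp
    return {catalog: stamps for catalog, stamps in seen.items()
            if len(set(stamps.values())) > 1}
```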

When you have corrected all of those, repeating the Attribute and Catalog survey one last time will increase confidence in the environment.

Summary

This report shows problems with Attribute and Catalog files. Correcting them will make the ITM environment work more reliably.

History and Earlier versions

If the current version of the Attribute and Catalog Health Survey tool does not work, you can try previously published binary object zip files. At the same time, please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.

atrhealth.0.83000

Improved parse_lst object and handle pre-built situation_attribute table

Photo Note: Big Sur Sunset – 15 January 2000