Sitworld: Event History #11 Detailed Attribute differences on first two merged results


John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 27 April 2018 – Level 1.00000


Inspiration

The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.

One of the largest difficulties was understanding what happened when two [or more] results were merged into a single event. There are so many attribute values to compare that it can be tedious. This needed a new report section!

This was seen in the Event History Audit report section

EVENTREPORT007: Detailed Attribute differences on first two merged results

Situation,Node,Agent_Time,Reeval,Results,Atom,Atomize,Attribute_Differences

bnc_wasaftergc_gynp_dseiiwas,PWNESTBN:bnc_itmaxpapnestba:KYNS,1180410062604000,300,2,KYNGCCYC.SERVER_NAM,PWNESTBA,KYNGCCYC.AF_NO 1[31178] 2[31128],KYNGCCYC.BYTE_FREED 1[1793354] 2[1786754],KYNGCCYC.BYTE_USED 1[303798] 2[310398],KYNGCCYC.FINAL_REFS 1[826] 2[1165],KYNGCCYC.GC_NO 1[31179] 2[31129],KYNGCCYC.GC_TIME 1[1180410062522599] 2[1180410062451969],KYNGCCYC.HEAP_AVAIL 1[1793354] 2[1786754],KYNGCCYC.SOFT_REFS 1[4] 2[0],KYNGCCYC.TIME_COMP 1[441] 2[569],KYNGCCYC.TIME_MARK 1[172] 2[169],KYNGCCYC.WEAK_REFS 1[340] 2[412],,

This involved the bnc_wasaftergc_gynp_dseiiwas situation, which was delivered from agent PWNESTBN:bnc_itmaxpapnestba:KYNS with an Agent time of 1180410062604000. The sampling interval was 300 seconds [a Sampled situation] and two results were merged. The DisplayItem was KYNGCCYC.SERVER_NAM and the Atomize value was PWNESTBA. Both results carried that same Atomize value, which explains why they were merged.

Attribute by Attribute comparison.

At the end of each report line is a list of every attribute whose value differs between the first and the second result rows, written as the attribute name followed by 1[first value] 2[second value]. If there were more than two results, the comparison still covers only the first two. The idea is to make the two easier to compare. More comments follow the list.

KYNGCCYC.AF_NO 1[31178] 2[31128],

KYNGCCYC.BYTE_FREED 1[1793354] 2[1786754],

KYNGCCYC.BYTE_USED 1[303798] 2[310398],

KYNGCCYC.FINAL_REFS 1[826] 2[1165],

KYNGCCYC.GC_NO 1[31179] 2[31129],

KYNGCCYC.GC_TIME 1[1180410062522599] 2[1180410062451969],

KYNGCCYC.HEAP_AVAIL 1[1793354] 2[1786754],

KYNGCCYC.SOFT_REFS 1[4] 2[0],

KYNGCCYC.TIME_COMP 1[441] 2[569],

KYNGCCYC.TIME_MARK 1[172] 2[169],

KYNGCCYC.WEAK_REFS 1[340] 2[412],,
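If you want to work with these differences outside the report, a small parser helps. The following is only a sketch: the function and variable names are mine, not part of the Event History Audit tool, and it assumes that attribute values never contain commas or a "] 2[" sequence. The sample line is the report line above shortened to three differences.

```python
import re

# Column names taken from the EVENTREPORT007 header, minus the trailing
# Attribute_Differences column, which holds a variable number of fields.
FIXED_COLUMNS = ["Situation", "Node", "Agent_Time", "Reeval",
                 "Results", "Atom", "Atomize"]

# One difference entry looks like: KYNGCCYC.AF_NO 1[31178] 2[31128]
DIFF_RE = re.compile(r"^(?P<attr>\S+) 1\[(?P<first>.*?)\] 2\[(?P<second>.*)\]$")


def parse_report_line(line):
    """Split one report line into fixed columns plus a dict of differences."""
    fields = line.strip().split(",")  # assumption: no commas inside values
    record = dict(zip(FIXED_COLUMNS, fields))
    diffs = {}
    for entry in fields[len(FIXED_COLUMNS):]:
        match = DIFF_RE.match(entry)
        if match:  # the empty trailing fields do not match and are skipped
            diffs[match["attr"]] = (match["first"], match["second"])
    record["Attribute_Differences"] = diffs
    return record


# Shortened copy of the report line shown above (three differences only).
SAMPLE_LINE = (
    "bnc_wasaftergc_gynp_dseiiwas,PWNESTBN:bnc_itmaxpapnestba:KYNS,"
    "1180410062604000,300,2,KYNGCCYC.SERVER_NAM,PWNESTBA,"
    "KYNGCCYC.GC_NO 1[31179] 2[31129],"
    "KYNGCCYC.GC_TIME 1[1180410062522599] 2[1180410062451969],"
    "KYNGCCYC.SOFT_REFS 1[4] 2[0],,"
)

record = parse_report_line(SAMPLE_LINE)
for attr, (first, second) in record["Attribute_Differences"].items():
    print(f"{attr}: first={first} second={second}")
```

The values are kept as strings on purpose, since some attributes are counters and others are timestamps.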

Sometimes you can spot an attribute that would make a better DisplayItem, though not here.

The KYNGCCYC.GC_TIME is really interesting. Selecting out the minute and second, the first is 25:22 and the second is 24:51, about 31 seconds earlier. Since the sampling interval is 300 seconds, these two result sets cannot be from the same agent, even though they have the same server name KYNGCCYC.SERVER_NAM of PWNESTBA. Next notice the agent name PWNESTBN:bnc_itmaxpapnestba:KYNS. The first section is often the hostname and here it is PWNESTBN, just one character away from PWNESTBA.
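The timestamp arithmetic can be checked with a few lines of Python. This is a sketch that assumes the 16-digit values follow the usual ITM layout CYYMMDDHHMMSSmmm, where a leading 1 means the 2000s. The function name is my own invention.

```python
from datetime import datetime


def parse_itm_timestamp(stamp):
    """Convert a 16-digit CYYMMDDHHMMSSmmm value to a datetime.

    Assumption: C is a century digit (0 = 1900s, 1 = 2000s).
    """
    if len(stamp) != 16 or not stamp.isdigit():
        raise ValueError(f"unexpected timestamp: {stamp!r}")
    year = 1900 + 100 * int(stamp[0]) + int(stamp[1:3])
    return datetime(
        year,
        int(stamp[3:5]),            # month
        int(stamp[5:7]),            # day
        int(stamp[7:9]),            # hour
        int(stamp[9:11]),           # minute
        int(stamp[11:13]),          # second
        int(stamp[13:16]) * 1000,   # milliseconds converted to microseconds
    )


first = parse_itm_timestamp("1180410062522599")
second = parse_itm_timestamp("1180410062451969")
gap_seconds = (first - second).total_seconds()

print(first.strftime("%M:%S"), second.strftime("%M:%S"), gap_seconds)
```

With the millisecond parts included the gap is a bit under 31 seconds, far smaller than the 300 second sampling interval.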

What is the problem and how to fix it?

The problem is that there are two results and they are being merged – so one is lost.

From the analysis above, there are two agents which have been accidentally configured with the same name and they are conflicting with each other. They are sending results every 300 seconds. The results arrive in a large collection area identified only by agent name [and situation name and DisplayItem and time etc]. The TEMS dataserver [SQL processor] wakes up every 300 seconds and looks for results for that situation. It finds them [two in this case] and creates a potential situation event package that SITMON [situation monitor logic] then bundles together.
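A toy model may make the collision easier to picture. This is not how the TEMS is implemented. It only assumes, for illustration, that results are grouped by agent name, situation name and Atomize value. The "source" field, the machine labels and the RENAMED_HOST node name are hypothetical.

```python
from collections import defaultdict

SITUATION = "bnc_wasaftergc_gynp_dseiiwas"
SHARED_NODE = "PWNESTBN:bnc_itmaxpapnestba:KYNS"
# Hypothetical name used only to show the effect of a corrected configuration.
RENAMED_NODE = "RENAMED_HOST:bnc_itmaxpapnestba:KYNS"


def make_row(source, node, gc_no):
    """Build one result row; 'source' is a made-up label for the real machine."""
    return {"source": source, "node": node, "situation": SITUATION,
            "atomize": "PWNESTBA", "gc_no": gc_no}


def group_results(rows):
    """Group rows by (node, situation, atomize), the assumed merge key."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["node"], row["situation"], row["atomize"])].append(row)
    return dict(groups)


# Two machines reporting under the same agent name: one group with two rows.
before = group_results([
    make_row("machine-1", SHARED_NODE, "31179"),
    make_row("machine-2", SHARED_NODE, "31129"),
])

# After one agent is given a unique name: two groups with one row each.
after = group_results([
    make_row("machine-1", SHARED_NODE, "31179"),
    make_row("machine-2", RENAMED_NODE, "31129"),
])

print(len(before), "group(s) before,", len(after), "group(s) after")
```

The grouping key cannot tell the two machines apart while they share a name, so their rows land in the same bundle.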

The solution is to review the environment, determine which agents are duplicated, and correct the agent configuration. That way the agent names will be unique, and when this happens again two situation events will be created.

Often the knowledge of a potential duplicate condition and the agent name is enough to lead the agent owners to the correct ones to fix.

Other times these can be detected with a TEPS Audit report. Agents like this often send inconsistent node statuses, like a changing IP address. The TEPS is very sensitive and complains [produces error messages], and the TEPS Audit will summarize such complaints. Other times the hub and remote TEMS need diagnostic tracing and a TEMS Audit report will point the way.

Summary

Tale #11 of using Event History Audit is about reviewing a case where there is evidence of duplicate agent names.

Sitworld: Table of Contents

History and Earlier versions

There are no binary objects associated with this project.

1.000000

initial release

Photo Note: Radar Dome Lift – January 2016

 
