Sitworld: Event History #10 lost events because DisplayItem missing or null Atoms

fittingEngine

John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 24 April 2018 – Level 1.00000

Follow on twitter

Inspiration

The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.

This was seen in the Event Audit History Advisory section:

25,EVENTAUDIT1010W,TEMS,Situations [1] lost events because DisplayItem missing or null Atoms – see EVENTREPORT001

One situations showed evidence that events were lost before DisplayItem was not configured or because the Atomize value was none. This is parallel to Event History #5 but from a different viewpoint.

And in that report section:

EVENTREPORT001: Multiple results in one second but DisplayItem missing or null atoms found

Situation,Type,Agent_Second,Results,Agent,Atomize,Atom,

dow_evtlog_4ntw01_bkpv1,Pure,1180421001913000,3,Primary:roh_dewfs02:NT,,,

There is a pure situation dow_evtlog_4ntw01_bkpv1. At Agent second 1180421001913000 there were 3 total results from agent Primary:roh_dewfs02:NT. There was no DisplayItem or Atomize value

Deep dive Into the report details

Scan or search ahead for Report 999. It is sorted by first node, then situation, then by Time at the TEMS. I will first describe what you see and the guidance from the column description line.

This will show only a single open event and then close event, but there were many listed in the full report.

EVENTREPORT999: Full report sorted by Node/Situation/Time

Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT

Situation – Situation Name, which can be different from the Full Name that you see in situation editor, like too long or other cases.

Node – Managed System Name or Agent Name

Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases

Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds because the Agent can produce data faster than then TEMS can process it at times. Simple cases have a last three digits of 999. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This the UTC [earlier GMT] time at the agent.

TEMS_Time – The time as recorded at the TEMS during processing. This the UTC [earlier GMT] time.

Deltastat – event status. You generally see Y for open and N for close. There are more not recorded here.

Reeval – Sampling interval [re-evaluation] in seconds and 0 means a pure event.

Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.

Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.

DisplayItem – The value of the atomize in this instance. Atomize is just the [up to] first 128 bytes of another string attribute.

LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information

PDT  – The Predicate or Situation Formula as it is stored.

The Descriptor line – before we see the results.

dow_evtlog_4ntw01_bkpv1,Primary:roh_dewfs02:NT,REMOTE_nltnzsdowm018,1180421001912000,1180421001913000,Y,0,4,,,7071,*IF ( ( *VALUE NT_Event_Log.Event_ID *EQ 34080 *AND *VALUE NT_Event_Log.Source *EQ ‘Backup Exec’ *AND *VALUE NT_Event_Log.Log_Name *EQ Application *AND *VALUE NT_Event_Log.Type *NE Information ) *OR ( *VALUE NT_Event_Log.Event_ID *EQ 34113 *AND *VALUE NT_Event_Log.Source *EQ ‘Backup Exec’ *AND *VALUE NT_Event_Log.Log_Name *EQ Application *AND *VALUE NT_Event_Log.Type *NE Information ) *OR ( *VALUE NT_Event_Log.Event_ID *EQ 57476 *AND *VALUE NT_Event_Log.Source *EQ ‘Backup Exec’ *AND *VALUE NT_Event_Log.Log_Name *EQ Application *AND *VALUE NT_Event_Log.Type *NE Information ) *OR ( *VALUE NT_Event_Log.Event_ID *EQ 57477 *AND *VALUE NT_Event_Log.Source *EQ ‘Backup Exec’ *AND *VALUE NT_Event_Log.Log_Name *EQ Application *AND *VALUE NT_Event_Log.Type *NE Information ) ),

Situation was Idow_evtlog_4ntw01_bkpv1, agent was Primary:roh_dewfs02:NT. thrunode was ,REMOTE_nltnzsdowm018. Agent_time was 1180421001912000 and TEMS_time was 1180421001913000, one second later. It was an Open event [Y], sampling interval was 0 meaning a pure situation, there was four results and no DisplayItem or Atomize.. The record came from line number 7071 in the input. The PDT is shown at the end

Following the descriptor line is one or more P [Predicate/formula] lines as used as the Agent logic, followed by the results contributing to the TEMS logic.

,,,,,,,P,*PREDICATE=( ( NTEVTLOG.EVENTID = 34080 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 34113 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 57476 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 57477 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) ),

Following the predicate is three result lines and a second predicate line. These are all in the form of Attribute=value in the Table/Column=raw_data form. There is a leading count of the index of this result line. In this case there was two P lines and three result lines. 

,,,,,,,0,NTEVTLOG.CATEGORY=None;NTEVTLOG.COMPUTER=DEWFS02;NTEVTLOG.DESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.DUPCNT=0;NTEVTLOG.ENTRYTIME=1180420231906000;NTEVTLOG.EVENTDATE=04/20/18;NTEVTLOG.EVENTID=34113;NTEVTLOG.EVENTIDSTR=34113;NTEVTLOG.EVENTTIME=23:19:06;NTEVTLOG.LOGNAME=Application;NTEVTLOG.ORIGINNODE=Primary:roh_dewfs02:NT;NTEVTLOG.RECNUMBER=498366;NTEVTLOG.SOURCE=Backup Exec;NTEVTLOG.TIMESTAMP=1180420231907000;NTEVTLOG.TYPE=Error;NTEVTLOG.UCATEGORY=None;NTEVTLOG.UDESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.ULOGNAME=Application;NTEVTLOG.USERID=N/A;NTEVTLOG.USOURCE=Backup Exec;NTEVTLOG.UUSERID=N/A,

,,,,,,,1,NTEVTLOG.CATEGORY=None;NTEVTLOG.COMPUTER=DEWFS02;NTEVTLOG.DESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.DUPCNT=0;NTEVTLOG.ENTRYTIME=1180420231906000;NTEVTLOG.EVENTDATE=04/20/18;NTEVTLOG.EVENTID=34113;NTEVTLOG.EVENTIDSTR=34113;NTEVTLOG.EVENTTIME=23:19:06;NTEVTLOG.LOGNAME=Application;NTEVTLOG.ORIGINNODE=Primary:roh_dewfs02:NT;NTEVTLOG.RECNUMBER=498366;NTEVTLOG.SOURCE=Backup Exec;NTEVTLOG.TIMESTAMP=1180420231907000;NTEVTLOG.TYPE=Error;NTEVTLOG.UCATEGORY=None;NTEVTLOG.UDESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.ULOGNAME=Application;NTEVTLOG.USERID=N/A;NTEVTLOG.USOURCE=Backup Exec;NTEVTLOG.UUSERID=N/A,

Sometimes there are multiple captures bundled together in a single result row arrival. In that case you see a second P line.

,,,,,,,P,*PREDICATE=( ( NTEVTLOG.EVENTID = 34080 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 34113 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 57476 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) OR ( NTEVTLOG.EVENTID = 57477 AND NTEVTLOG.SOURCE = ‘Backup Exec’ AND NTEVTLOG.LOGNAME = ‘Application’ AND NTEVTLOG.TYPE <> ‘Information’ ) ),

,,,,,,,2,NTEVTLOG.CATEGORY=None;NTEVTLOG.COMPUTER=DEWFS02;NTEVTLOG.DESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.DUPCNT=0;NTEVTLOG.ENTRYTIME=1180420231906000;NTEVTLOG.EVENTDATE=04/20/18;NTEVTLOG.EVENTID=34113;NTEVTLOG.EVENTIDSTR=34113;NTEVTLOG.EVENTTIME=23:19:06;NTEVTLOG.LOGNAME=Application;NTEVTLOG.ORIGINNODE=Primary:roh_dewfs02:NT;NTEVTLOG.RECNUMBER=498366;NTEVTLOG.SOURCE=Backup Exec;NTEVTLOG.TIMESTAMP=1180420231907000;NTEVTLOG.TYPE=Error;NTEVTLOG.UCATEGORY=None;NTEVTLOG.UDESCRIP=Backup Exec Alert: Job Failed (Server: “DEWFS02”) (Job: “DEWFS02_FULL”) DEWFS02_FULL — The job failed with the following error: A communications failure has occurred.      For more information, click the following link:  http://eventlookup.veritas.com/eventlookup/EventLookup.jhtml;NTEVTLOG.ULOGNAME=Application;NTEVTLOG.USERID=N/A;NTEVTLOG.USOURCE=Backup Exec;NTEVTLOG.UUSERID=N/A,

The P lines and the index lines are created by this event history report for easier viewing. In raw data they are all smashed together separated by tildes [~] and often embedded blanks.

What is the problem and How to fix it?

In this case there were three identical results and two were “lost” and only one situation was created. In the full report there were hundreds of identical results. One way to reduce the impact of all the results being created, transmitted and processed would be to configure the agent to suppress duplicate results. Another question that must be asked is whether the condition is going to be fixed. Is someone going to fix this condition or understand how to avoid it in the future. If you are going to get and process alert warnings, that needs to be someone’s responsibility. If not the situation could be suppressed by altering the formula or even stopped and deleted. The best performance gain is not doing work that isn’t needed.

Summary

Tale #10 of using Event Audit History is about reviewing a case where an flood of identical events arrive at a TEMS.

Sitworld: Table of Contents

History and Earlier versions

There are no binary objects associated with this project.

1.000000

initial release

Photo Note: Will that Cruise Ship Engine Fit? 2014

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: