Sitworld: Simple Network Testing


John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 6 August 2020 – Level 0.50000


Inspiration

Benchmarking ITM communication links is a tough job. You can validate many aspects using this document:

ITM Communications Validation

Testing for throughput, capacity and errors is harder. Happily, a former ITM developer implemented a technique that does much of this work. The rest of this document explains the process, which is documented for Linux/Unix environments.

Introduction

These “SimpleNetworkTests” are intended to measure the theoretical memory bandwidth of a single machine and the effective network bandwidth between two machines. Using product distributed binaries, a bulk-data transfer is performed using Tivoli Basic Services. This allows measurement of the customer’s IO subsystem capacity available to Basic Services’ clients: TEMS, TEPS, TEMA, and WHP.

Tivoli Monitoring data moves between TEMA and TEMS using Basic Services’ RPC calls. These simple network tests also use Basic Services’ RPC calls to perform “bulk-data transfer”, but they do so ‘outside’ of the Tivoli Framework Management Server and Agent processes. Similar communication issues discovered in BOTH the simple network tests and the customer’s TEMS and TEMA processes implicate the external I/O subsystem. Communication issues found in the TEMS (for example, a takeSample failure) with no corresponding issues in the “bulk-data transfer” simple network tests are assumed to be rooted in the Tivoli Framework.
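The triage rule in the paragraph above can be expressed as a small decision function. This is a hypothetical illustration only: `locate_fault` and its arguments are names invented here, not part of ITM, and any combination the rule does not cover is reported as undetermined.

```python
def locate_fault(seen_in_simple_tests: bool, seen_in_tems_tema: bool) -> str:
    """Hypothetical encoding of the triage rule for a communication issue."""
    if seen_in_simple_tests and seen_in_tems_tema:
        # The same kind of issue appears inside and outside the ITM processes.
        return "external I/O subsystem"
    if seen_in_tems_tema:
        # TEMS/TEMA show the issue (e.g. a takeSample failure),
        # but the bulk-data transfer tests are clean.
        return "Tivoli Framework"
    # Any other combination is not covered by the rule.
    return "undetermined"


print(locate_fault(True, True))    # external I/O subsystem
print(locate_fault(False, True))   # Tivoli Framework
```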

Issues in the customer’s IO subsystem become visible when the performance of these Simple Network tests varies with the direction of the data flow. The ratio of the network transfer rate to the theoretical bandwidth for a system should be the same in both directions (measured inbound bandwidth should equal measured outbound bandwidth). Asymmetric results usually indicate IP routing malfeasance, mismatched or biased buffer configurations, or MTU inconsistencies.
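As a rough sketch, the symmetry check can be automated by comparing the measured inbound and outbound rates. The function name `is_symmetric` and the 10% default tolerance are assumptions made for illustration; this document does not specify how closely the two figures must agree.

```python
def is_symmetric(inbound_mb_per_s: float, outbound_mb_per_s: float,
                 tolerance: float = 0.10) -> bool:
    """Return True if inbound and outbound rates agree within `tolerance`,
    expressed as a fraction of the larger rate (10% is an assumed default)."""
    if inbound_mb_per_s <= 0 or outbound_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    larger = max(inbound_mb_per_s, outbound_mb_per_s)
    return abs(inbound_mb_per_s - outbound_mb_per_s) / larger <= tolerance


print(is_symmetric(5.0, 5.0))  # True
print(is_symmetric(5.0, 2.0))  # False: check routing, buffers and MTU
```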

Install, Configuration and Use

SimpleNetworkTests v630 are performed using existing Operating System commands, Tivoli Monitoring binaries and shared libraries already resident on the customer platform, to minimize security and failure risks. Running the tests through the “itmcmd execute” command on the customer platform creates a run-time environment identical to, and composed of, the installed Tivoli Monitoring software.

The TESTS and the expected results

kdcexed is the server daemon process. Like the TEMS, kdcexed listens for client connections and receives data. In these simple network tests, kdcexed is always started first and runs as a background process. kdcexer is the client process. Like the TEMA, kdcexer connects to the kdcexed server and sends data. In these simple network tests, kdcexer is started with an integer parameter and runs in the foreground.

LOOPBACK adapter tests. These tests are performed on a local system. Packets are exchanged on the loopback device only.

  1. Base connection test verifies local connectivity and is performed with two commands as root on the TEMS:
    1. $CANDLEHOME/bin/itmcmd execute ux “kdcexed &” to start the server daemon, then
    2. $CANDLEHOME/bin/itmcmd execute ux “kdcexer 0” to send a single client packet to the server. kdcexer launches the simple network tests client with the parameter “0”, telling the client to stop the server. If both the kdcexed and kdcexer commands complete with a zero return code, the run-time environment is correctly established.
  2. Bulk-data transfer tests establish the machine’s theoretical bandwidth and are performed with these two commands as root on the TEMS:
    1. $CANDLEHOME/bin/itmcmd execute ux “kdcexed &” to start the server daemon, then
    2. $CANDLEHOME/bin/itmcmd execute ux “kdcexer [1,2, … , N]” to start “N” client threads, where each thread sends 2 GBytes of data to the server. The SUM of the data transmitted IN and received OUT, divided by the wall-clock time, establishes the theoretical bandwidth of the machine.
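The base connection test above could be scripted as follows. This is a minimal sketch, assuming Python 3.9+, that `itmcmd execute ux "kdcexed"` run without the trailing `&` behaves like a foreground run of the server (so `subprocess.Popen` stands in for the shell's `&`), and that a two-second pause is enough for kdcexed to start listening. The helper names `itmcmd_argv` and `base_connection_test` are hypothetical.

```python
import os
import subprocess
import time


def itmcmd_argv(candlehome: str, command: str) -> list[str]:
    """argv for: $CANDLEHOME/bin/itmcmd execute ux "<command>"."""
    return [os.path.join(candlehome, "bin", "itmcmd"), "execute", "ux", command]


def base_connection_test(candlehome: str, settle_seconds: float = 2.0,
                         timeout: float = 60.0) -> bool:
    """Start kdcexed, stop it with 'kdcexer 0', and report whether both
    commands completed with a zero return code."""
    # Popen returns immediately, standing in for the shell's trailing "&".
    server = subprocess.Popen(itmcmd_argv(candlehome, "kdcexed"))
    try:
        # Assumed pause so the server daemon is listening before the client connects.
        time.sleep(settle_seconds)
        client = subprocess.run(itmcmd_argv(candlehome, "kdcexer 0"), timeout=timeout)
        server_rc = server.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A hung client or server counts as a failed test.
        server.kill()
        server.wait()
        return False
    return client.returncode == 0 and server_rc == 0
```

Run it as root, for example `base_connection_test(os.environ["CANDLEHOME"])`; a `True` result corresponds to both commands completing with a zero return code.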

BULK-DATA transfer tests. These tests are performed across the network, between two machines, using the ITM binaries kdcexed and kdcexer. In this section, we examine the theoretical bandwidth between two machines. We use the tags “sending TEMS” and “receiving TEMS”, with the understanding that kdcexer runs on the “sending TEMS” and kdcexed runs on the “receiving TEMS”.

  1. Base connection test verifies connectivity between the two machines and is performed with two commands as root, kdcexed on the receiving TEMS and kdcexer on the sending TEMS:
    1. $CANDLEHOME/bin/itmcmd execute ux “kdcexed &” to start the server daemon, then
    2. $CANDLEHOME/bin/itmcmd execute ux “kdcexer 0” to send a single client packet to the server. kdcexer launches the simple network tests client with the parameter “0”, telling the client to stop the server. If both the kdcexed and kdcexer commands complete with a zero return code, the run-time environment is correctly established.
  2. Bulk-data transfer tests establish the network’s theoretical bandwidth and are performed with these two commands as root, again with kdcexed on the receiving TEMS and kdcexer on the sending TEMS:
    1. $CANDLEHOME/bin/itmcmd execute ux “kdcexed &” to start the server daemon, then
    2. $CANDLEHOME/bin/itmcmd execute ux “kdcexer [1,2, … , N]” to start “N” client threads, where each thread sends 2 GBytes of data to the server. The data transferred, divided by the wall-clock time, establishes the theoretical bandwidth between the two machines.

Transaction tests. These tests are performed across the network, between two machines, using the ITM binary kdh1. Launched without any parameters, kdh1 acts as an http server.

    An http server daemon can be started with the command:

  • $CANDLEHOME/bin/itmcmd execute ux “kdh1 &”

    An http client can be launched with the following command:

  • $CANDLEHOME/bin/itmcmd execute ux “kdh1 -i http_client_requests.urls”, where the file $CANDLEHOME/http_client_requests.urls contains a list of http client requests.

Interpreting test results

The loopback logs for 519c4lp6 are these:

  • 519c4lp6_ux_kdcexed_5796eaca-01.log
  • 519c4lp6_ux_kdcexed_5796eaca-02.log
  • 519c4lp6_ux_kdcexed_5796eaca-03.log
  • 519c4lp6_ux_kdcexed_5796eaca.bandwidth.txt is aggregate
  • 519c4lp6_ux_kdcexer_5796eae1-01.log
  • 519c4lp6_ux_kdcexer_5796eae1-02.log
  • 519c4lp6_ux_kdcexer_5796eae1.bandwidth.txt is aggregate

The sending process, kdcexer, transmits 21,000 1K blocks (21 MBytes) in 5 seconds. This is seen in 519c4lp6_ux_kdcexer_5796eae1.bandwidth.txt. The receiving process, kdcexed, receives 21,000 1K blocks (21 MBytes) in the same 5 seconds. This is seen in 519c4lp6_ux_kdcexed_5796eaca.bandwidth.txt. [ (21 MBytes in + 21 MBytes out) / 5 seconds ] = 8.4 MBytes/sec

The loopback logs for bl59lp5 are these:

  • bl59lp5_ux_kdcexed_5796c98d-01.log
  • bl59lp5_ux_kdcexed_5796c98d.bandwidth.txt is aggregate
  • bl59lp5_ux_kdcexer_5796c9a4-01.log
  • bl59lp5_ux_kdcexer_5796c9a4.bandwidth.txt is aggregate

This machine is faster. The entire 42 MBytes is moved in 3 seconds: [ (21 MBytes in + 21 MBytes out) / 3 seconds ] = 14 MBytes/sec.
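The loopback arithmetic used for both machines can be checked with a few lines of Python. `loopback_rate` is a hypothetical helper; its inputs are the megabyte counts and wall-clock seconds quoted above.

```python
def loopback_rate(mbytes_in: float, mbytes_out: float, seconds: float) -> float:
    """Loopback bandwidth in MBytes/sec.

    On the loopback adapter one machine both sends and receives every block,
    so the data transmitted in and received out are summed.
    """
    if seconds <= 0:
        raise ValueError("elapsed wall-clock time must be positive")
    return (mbytes_in + mbytes_out) / seconds


print(loopback_rate(21, 21, 5))  # 519c4lp6 -> 8.4
print(loopback_rate(21, 21, 3))  # bl59lp5  -> 14.0
```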

The network logs of bl59lp5 sending to 519c4lp6 are these:

  • bl59lp5_ux_kdcexer_5796f2dd-01.log
  • 519c4lp6_ux_kdcexed_5796eefb-01.log
  • 519c4lp6_ux_kdcexed_5796eefb-02.log
  • 519c4lp6_ux_kdcexed_5796eefb-03.log

bl59lp5_ux_kdcexer_5796f2dd.bandwidth.txt shows that we transferred 21 MBytes in 4 seconds, a rate of about 5 MBytes/sec (21 / 4 = 5.25).

The network logs of 519c4lp6 sending to bl59lp5 are these:

  • 519c4lp6_ux_kdcexer_5796eccb-01.log
  • 519c4lp6_ux_kdcexer_5796eccb-02.log
  • bl59lp5_ux_kdcexed_5796ecad-01.log

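In either direction, the 21 MBytes were transferred in 4 seconds, for an effective machine-to-machine transfer rate of about 5 MBytes/sec. Because the two directions match, these results show no sign of the asymmetry described earlier.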
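For the machine-to-machine tests, the text counts the 21 MBytes once rather than summing both ends as in the loopback case. A minimal sketch of that calculation, with `network_rate` as a hypothetical helper name:

```python
def network_rate(mbytes_sent: float, seconds: float) -> float:
    """Machine-to-machine rate in MBytes/sec: the transferred data is counted once."""
    if seconds <= 0:
        raise ValueError("elapsed wall-clock time must be positive")
    return mbytes_sent / seconds


for direction in ("bl59lp5 -> 519c4lp6", "519c4lp6 -> bl59lp5"):
    print(direction, network_rate(21, 4))  # 5.25 in each direction
```

The unrounded figure is 5.25 MBytes/sec in both directions.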

Notes

  • KBS_DEBUG=N and KDC_DEBUG=Y are the trace settings required for all Simple Network Tests.
  • The aggregate ‘bandwidth’ reports are generated by grepping the RAS1 logs for “(1024)” and redirecting the output from all logs of a specific process instance (“xxx…-01.log”, “xxx…-02.log”, … “xxx…-NN.log”) to the file ‘*.bandwidth.txt’. Insert CR/LF line endings by opening ‘*.bandwidth.txt’ in WordPad and saving it as “Text document – MS DOS format”.
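The aggregation step in the last note could be scripted instead of done by hand with grep and WordPad. This sketch assumes the segment files follow the `<instance>-NN.log` naming shown in the listings above and sit in one directory (`$CANDLEHOME/logs` is a guess at the usual location, not stated in this document); `build_bandwidth_report` is a hypothetical name. Opening the output with `newline="\r\n"` writes CR/LF line endings directly.

```python
import glob
import os


def build_bandwidth_report(log_dir: str, instance: str, marker: str = "(1024)") -> str:
    """Gather every line containing `marker` from <instance>-NN.log segments
    into <instance>.bandwidth.txt with CR/LF line endings; return its path."""
    pattern = os.path.join(glob.escape(log_dir), glob.escape(instance) + "-[0-9]*.log")
    segments = sorted(glob.glob(pattern))  # zero-padded -01, -02, ... sort in order
    report = os.path.join(log_dir, instance + ".bandwidth.txt")
    # newline="\r\n" makes every "\n" written below come out as CR/LF.
    with open(report, "w", newline="\r\n") as out:
        for path in segments:
            with open(path, errors="replace") as log:
                for line in log:
                    if marker in line:
                        out.write(line.rstrip("\r\n") + "\n")
    return report
```

For example, `build_bandwidth_report("/opt/IBM/ITM/logs", "519c4lp6_ux_kdcexed_5796eaca")` would write `519c4lp6_ux_kdcexed_5796eaca.bandwidth.txt` beside the log segments.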

 

Summary

This process makes it easier to measure the throughput of ITM communication links and to decide whether communication issues are rooted in the external I/O subsystem or in the Tivoli Framework. The benefit is well worth the effort.

Sitworld: Table of Contents

History and Earlier versions

0.50000 – Initial publish

Photo Note: Orchids Galore on the Kitchen Counter

 
