
Effective Use of MVS Workload Manager Controls


4.0 General Recommendations for Setting Goals

4.0.1 What to use as a basis for goals?

In preparing an initial set of goals, the performance analyst can exploit several different sources. The following three inputs will be discussed:

Current IPS/ICS setup
Service level objectives based on business needs
Historical data

4.0.2 Current IPS/ICS Setup

An installation could use its current IPS/ICS configuration as a basis for both its MVS Workload Manager service classes and its Workload Manager classification rules. For example, most installations already have a multi-period performance group for production TSO work, with each period defined with a particular duration. These installations should find it easy to set up a service class that mirrors this kind of TSO performance group. Other ways the IPS/ICS may help are:

Service definition coefficients

A service unit is a measure of resources consumed by an address space. Each installation can tailor, or weight, the different components of service by specifying weighting factors. Those weighting factors are the service definition coefficients (CPU, IOC, MSO, and SRB). These coefficients are specified in the IPS, and are needed in a WLM service definition as well.

An installation could simply continue using the same coefficients. That might be wise if the installation uses billing tools based on service units and those tools cannot be changed. Note that such tools may need to change for goal mode anyway, since they are presumably keyed to Performance Group Number (PGN). If so, define corresponding report classes, rather than service classes, so that these tools can still be used in goal mode. Define service classes primarily for SRM management purposes, and report classes primarily for reporting purposes.

If there are no restrictions on the coefficients, this could be a good time to change them. The MSO coefficient has generally been left at its default value of 3.00. This was appropriate when storage was a scarce resource, but for most installations today storage should not contribute so heavily to total service. For several years the recommendation has been to change MSO to either 0.000 or 0.0001, and it is recommended that this be done during a system's migration to goal mode.

Another change to consider now is resetting the CPU and SRB coefficients to 1.0. That way the service units per second defined for resource groups are directly in terms of CPU and SRB service units, since multiplying by a coefficient of 1 leaves the product unchanged.

Many installations use CPU=10.0, IOC=5.0, SRB=10.0 today. If these three coefficients were changed to CPU=1.0, IOC=0.5, SRB=1.0, each type of service would still contribute the same proportion of the total, assuming of course that the MSO coefficient is set to 0.
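To illustrate the proportion argument, the following Python sketch weights one set of raw (unweighted) service components by both coefficient sets. The component values and the function names are hypothetical. The sketch only shows the arithmetic; it does not model how SRM actually accumulates service.

```python
# Minimal sketch: weighted service = sum(coefficient * raw component).
# The raw component values are invented for illustration only.

OLD = {"CPU": 10.0, "IOC": 5.0, "MSO": 0.0, "SRB": 10.0}
NEW = {"CPU": 1.0, "IOC": 0.5, "MSO": 0.0, "SRB": 1.0}


def weighted_service(raw, coeff):
    """Return (per-component weighted service, total service)."""
    parts = {name: coeff[name] * raw[name] for name in coeff}
    return parts, sum(parts.values())


def shares(raw, coeff):
    """Fraction of the total service contributed by each component."""
    parts, total = weighted_service(raw, coeff)
    return {name: value / total for name, value in parts.items()}


raw = {"CPU": 1200, "IOC": 300, "MSO": 50000, "SRB": 80}  # hypothetical

old_parts, old_total = weighted_service(raw, OLD)
new_parts, new_total = weighted_service(raw, NEW)
print(old_total, new_total)  # 14300.0 1430.0
print(shares(raw, OLD))
print(shares(raw, NEW))      # same proportions as the line above
```

Every coefficient is divided by the same factor of 10, so each component's share stays the same and only the total shrinks tenfold. With MSO at 0, the large storage figure contributes nothing under either set.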

Durations

A service class goal is valid for a specified duration, just as performance group periods can specify durations. Because the duration is expressed in service units consumed, any change to the service unit coefficients requires a corresponding change to the durations specified for multi-period service classes. Durations are probably used only in PGNs for TSO and batch work, so those are also the service classes most likely to have durations. Note that changing the meaning of a service unit can significantly alter the meaning of "TSO trivial" unless the corresponding duration is adjusted as well.
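One way to carry the durations across a coefficient change is to scale them so that the same raw work still crosses each period boundary. The Python sketch below does this under a stated assumption: the raw service mix of a typical transaction is known, for example estimated from historical data. The mix, the durations and the function names are all hypothetical.

```python
# Minimal sketch: rescale multi-period durations after a coefficient change.
# Assumption: the raw (unweighted) service mix of a typical transaction is
# known, e.g. estimated from historical data. The numbers are invented.

OLD = {"CPU": 10.0, "IOC": 5.0, "MSO": 0.0, "SRB": 10.0}
NEW = {"CPU": 1.0, "IOC": 0.5, "MSO": 0.0, "SRB": 1.0}


def weighted(raw, coeff):
    """Service units the raw mix accumulates under a coefficient set."""
    return sum(coeff[name] * raw[name] for name in coeff)


def rescale_durations(durations, raw_mix, old_coeff, new_coeff):
    """Scale each period duration so that the same raw work still
    crosses each period boundary after the coefficient change."""
    factor = weighted(raw_mix, new_coeff) / weighted(raw_mix, old_coeff)
    return [round(d * factor) for d in durations]


raw_mix = {"CPU": 40, "IOC": 10, "MSO": 0, "SRB": 4}  # hypothetical
tso_durations = [400, 2000]  # hypothetical durations for periods 1 and 2

print(rescale_durations(tso_durations, raw_mix, OLD, NEW))  # [40, 200]
```

When the whole coefficient set is divided by 10, the factor is simply 0.1. The representative mix only matters when the change is not uniform, for example when MSO drops from 3.00 to 0. In that case the factor depends on how much storage the typical transaction uses.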

Insight from RMF reports

If the installation understands what is running in each domain and each performance group, its RMF reports can help in preparing the WLM policy. RMF Version 5 reports the velocity experienced by each domain and PGN. That is very valuable when selecting the velocity goal for a group of started tasks or long-running batch work.
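Execution velocity is commonly documented as the percentage of samples in which work was using a resource, out of all samples in which it was either using or delayed for a resource. The Python sketch below applies that formula. The sample counts and function names are hypothetical. It also makes one design point: when a group of started tasks will share one service class, pooling their samples gives a different figure than averaging each task's velocity.

```python
# Minimal sketch of execution velocity:
#   velocity = 100 * using / (using + delay)
# Sample counts are invented; idle samples are assumed excluded already.


def execution_velocity(using, delay):
    """Velocity as a percentage of using+delay samples."""
    total = using + delay
    if total == 0:
        raise ValueError("no using or delay samples")
    return 100.0 * using / total


def pooled_velocity(members):
    """Velocity of a group: sum samples first, then apply the formula."""
    using = sum(u for u, _ in members)
    delay = sum(d for _, d in members)
    return execution_velocity(using, delay)


started_tasks = [(5, 5), (20, 80)]  # hypothetical (using, delay) per task

print(pooled_velocity(started_tasks))  # about 22.7
naive = sum(execution_velocity(u, d) for u, d in started_tasks)
print(naive / len(started_tasks))      # 35.0
```

The naive average gives the lightly sampled first task as much weight as the heavily sampled second one. The pooled value reflects all the samples the service class would actually collect.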

4.0.3 Service Level Objectives

Using the current IPS/ICS setup as a basis, as previously discussed, has clear limits, however. For instance, studying the IPS/ICS may give no insight into CICS response times, and it certainly will not provide velocity goals for started tasks or IMS response times.

The installation's existing service objectives provide a good starting point for the goals to be defined to WLM. Possibly some batch classes are already documented as requiring a given turnaround time, and interactive work is probably already documented as requiring a specified response time.

There are three items to remember when looking at service level objectives as a source of information for the WLM policy:

WLM's response time goals describe system response time, without including network delays at the start or the end of the transaction.
The objectives document is probably very concise, specifying just a few goals. Contrast that with the large number of PGNs in the installation's current IPS member. The WLM policies should follow the service level objectives in having only a few goals, but can mirror the IPS by using report classes where extra reporting data is needed. This shift of emphasis from a large number of PGNs to a small number of service class goals is one reason why, in most cases, no tool can generate an effective sample WLM policy from the IPS. Installations should expect to approach WLM with a different mind-set from the one they bring to the IPS.
A service objective frequently defines the worst case. For example, a customer's service agreement may state that turnaround for batch class A is 30 minutes, while today the installation is delivering a 5-minute turnaround. It would probably be unwise to simply pass the 30-minute service objective to WLM, because end users may become irritated by such a drastic change from what they are used to. Instead, use the service objectives as guidelines for setting goals, and place more emphasis on historical data.

4.0.4 Historical Data

RMF Workload Activity reports are a good source of the average response times currently being achieved for each TSO period. If an installation expects the same percentage of TSO transactions to complete in each period in goal mode as completed in each period in compatibility mode, then that historical data is a good starting point for the response time goal of each TSO service class period.

The RMF V5 compatibility mode Workload Activity report is also a good source of the velocity achieved by PGNs and domains. That provides insight into the velocity goals to specify for service classes of started tasks and long-running batch work.

In goal mode, a batch transaction is a batch job. Most installations already have historical data on average batch elapsed time. Realize that if an installation uses PERFORM= on the JCL of individual job steps, and supports that via OPGN specifications in the ICS, then the RMF Workload Activity reports from compatibility mode should not be used to determine average response times for PGNs of batch work. That is because each step with a unique PERFORM value will constitute an individual transaction in compatibility mode, so the historical RMF data is no longer in terms of batch jobs.
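The following Python sketch shows why step-level data misleads. When each step with its own PERFORM value is counted as a transaction, the average describes steps rather than jobs. The record layout, job names and elapsed times are hypothetical.

```python
# Minimal sketch: per-step vs per-job averages for the same batch work.
# Assumption: each record is (job_id, step_elapsed_minutes); the numbers
# are invented. Summing steps ignores JES queue time and gaps between
# steps, both of which a goal-mode batch transaction would include.
from collections import defaultdict


def step_and_job_averages(steps):
    """Return (average per step, average per job)."""
    step_avg = sum(elapsed for _, elapsed in steps) / len(steps)
    per_job = defaultdict(float)
    for job_id, elapsed in steps:
        per_job[job_id] += elapsed
    job_avg = sum(per_job.values()) / len(per_job)
    return step_avg, job_avg


steps = [("JOB1", 2), ("JOB1", 3), ("JOB1", 5), ("JOB2", 10)]
print(step_and_job_averages(steps))  # (5.0, 10.0)
```

A goal derived from the 5-minute step average would be half the job-level figure, even before the JES queue time is added.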

Possibly an installation already has average response time data for interactive OLTP transactions via CICS Performance Monitor or IMSPARS. These could be the source of average response time goals for the WLM service definition. In addition, reference 1 documents a way to use RMF in compatibility mode to find that same data for IMS or CICS transactions.

4.0.5 Understand the definition of a transaction when setting a response time goal

This should be fairly obvious: one cannot choose a goal for a particular type of work without knowing how a transaction is defined for that type of work.

Many customers care most about end-user response time at the terminal, but the base control program can do nothing about network delays. Nor is there any way to synchronize the clocks of intelligent terminals and controllers with the Sysplex Timer, and distributing CPU cycles and storage resources based on unsynchronized clocks would be a very bad idea. Therefore, the goals specified in WLM's policies for interactive work are host response times.

TSO Transactions

A TSO transaction for WLM's policy is identical to a TSO transaction as reported via RMF for the past 20 years. The installation still has a choice via IEAOPTxx whether a CLIST should be counted as a single transaction or a collection of individual transactions.
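The CLIST choice matters when historical TSO data is turned into goals, because it changes both the transaction count and the average response time. The Python sketch below illustrates this with invented timings. The boolean flag is a hypothetical stand-in for the IEAOPTxx setting and is not the actual keyword.

```python
# Minimal sketch: how the CLIST counting choice changes the transaction
# count and average response time derived from the same activity.
# The boolean flag stands in for the IEAOPTxx setting; it is not the
# actual keyword. Times (seconds) are invented.


def tso_transactions(single_commands, clists, count_each_command):
    """Return the list of transaction response times."""
    times = list(single_commands)
    for command_times in clists:
        if count_each_command:
            times.extend(command_times)       # each command is a transaction
        else:
            times.append(sum(command_times))  # whole CLIST is one transaction
    return times


singles = [0.2, 0.3, 0.1]
clists = [[0.4, 0.4, 0.4, 0.4, 0.4]]

for flag in (False, True):
    t = tso_transactions(singles, clists, flag)
    print(flag, len(t), round(sum(t) / len(t), 3))
```

The same activity yields 4 transactions averaging about 0.65 seconds or 8 transactions averaging about 0.325 seconds, so the historical data and the goal must use the same counting rule.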

JES Transactions

A batch job is one transaction. It starts when the job is submitted (when the JES reader processes the job), and completes when the initiator finishes executing the job. That means it does include the time queued by JES waiting for an initiator, but it does not include output processing.
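The Python sketch below marks which timestamps bound a goal-mode batch transaction. The field names are hypothetical labels, not SMF record fields, and the date is arbitrary.

```python
# Minimal sketch: which JES timestamps bound a goal-mode batch transaction.
# Field names are hypothetical labels, not SMF record fields.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BatchJobTimes:
    read_in: datetime     # JES reader processes the job (transaction starts)
    selected: datetime    # an initiator selects the job
    exec_end: datetime    # initiator finishes executing (transaction ends)
    output_end: datetime  # output processing finishes (not counted)

    def queue_time(self) -> timedelta:
        """Time spent waiting for an initiator; part of the response time."""
        return self.selected - self.read_in

    def response_time(self) -> timedelta:
        """Goal-mode batch response time: read-in to end of execution."""
        return self.exec_end - self.read_in


job = BatchJobTimes(  # arbitrary example date and times
    read_in=datetime(1996, 1, 8, 10, 0),
    selected=datetime(1996, 1, 8, 10, 4),
    exec_end=datetime(1996, 1, 8, 10, 12),
    output_end=datetime(1996, 1, 8, 10, 20),
)
print(job.queue_time(), job.response_time())  # 0:04:00 0:12:00
```

The 8 minutes of output processing do not count. The 4 minutes of initiator wait do count, so initiator shortages show up directly in batch response times.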

CICS Transactions

A CICS transaction begins when the initial CICS region receives a message, generally from VTAM, and ends when that region returns the result to VTAM. If the transaction is routed to another CICS region (an application-owning region (AOR), a file-owning region (FOR), and so on), the time spent processing in those other regions is included in the response time of the original transaction.

This transaction information is passed to WLM beginning with CICS/ESA V4.1.

IMS Transactions

An IMS transaction begins when the Control Region receives a message from VTAM, and ends when the control region passes the response back to the network.

A customer can choose whether a message inserted by one program for another (a program-to-program switch) is considered a new transaction or just a continuation of the existing transaction.

IMS transaction information is passed to WLM beginning with IMS/ESA V5.1.

Distributed DB2 Transactions

A goal can be specified for a distributed DB2 transaction. This is a request arriving remotely across the network via Distributed Relational Database Architecture. Once again, the start time is the arrival time from the network and the transaction ends at the commit point when DB2 completes the distributed request.

This transaction information is passed to WLM beginning with DB2 V4.1.

4.0.6 Do not try to force certain resource allocation conditions to occur.

Avoid repetitive experimentation with velocity goals in the hope of forcing a specific dispatching priority, or dispatching priority order. MVS Version 5 has changed the dispatcher significantly to provide equal access for work at the same dispatching priority. Previous concepts of what work can or cannot share the same priority may no longer be valid.

Also, the MVS Workload Manager makes resource allocation decisions dynamically throughout the day. It recognizes workload peaks and valleys over much shorter intervals than installations are used to controlling. Therefore, the MVS Workload Manager may determine that a given workload can meet its goal with a lower dispatching priority during one part of the day than during the rest of the day.

Using RMF Monitor II, one can see the dispatching priority and storage isolation targets change during the day. Remember that the absolute value of the dispatching priority at any given time is not relevant; only the relationship of one priority to the others matters. SRM adjusts that relationship dynamically by continually assessing which resource is needed by the most important work that is missing its goals, and determining which work can donate some of that resource. If the resource that is needed or donated is CPU, the relative dispatching priorities will change.

4.0.7 Do not set unrealistic goals.

One could consider the job of the MVS Workload Manager as 'to work towards goals' rather than 'to meet all goals'. Just because one is able to specify a particular goal does not mean that the resources are available to meet it. The MVS Workload Manager continually monitors how well goals are being met, and it attempts to achieve higher importance goals before lower importance goals. If a goal is set too aggressively, the MVS Workload Manager may repeatedly try to help the work associated with that goal, only to determine that nothing more can be done for it (or that taking the resources it needs from other work is not a good trade, given the other work's goal and importance). In such cases SRM will move on to help other work, but cycles have been wasted trying to help work that will never meet its goal.

4.0.8 Keep it simple. Do not define 'too many' service classes.

One requirement IBM has heard repeatedly is for a simple way for accounts to define goals to MVS, and WLM is designed to provide exactly that. An installation is unlikely to have hundreds of different business goals. Therefore, defining service classes only where different types of work genuinely need them keeps the definitions to MVS simple.

Besides simplicity, there is a good technical reason to keep the number of service classes down. SRM decides how to allocate system resources based on sampling. As more work is combined into a service class, more entities contribute samples, so SRM need not go far back in time to obtain a statistically significant set of samples. This means SRM can make better decisions more quickly, and so be more responsive, rather than depending on sampled data from several minutes ago.

For example, many started tasks spend over 80% of their time in an intentional timer wait. So less than 20% of the samples from those address spaces would give any insight for SRM into how to manage the dispatching priority or storage access of the work.

It may take over 10 minutes to gather sufficient samples from a single address space that is active only 20% of the time. However, if 20 similar started tasks are combined in the same service class, SRM would have a sufficient number of samples within 30 seconds to decide whether a resource adjustment is appropriate. This allows SRM to be much more responsive.
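The arithmetic behind these figures can be sketched in a few lines of Python. Both constants below are hypothetical. They were chosen only so that a single space needs 10 minutes, and they are not SRM's actual sampling rate or significance threshold.

```python
# Minimal sketch of the sampling arithmetic. Both constants are
# hypothetical, chosen only so that one space needs 10 minutes;
# they are not SRM's actual sampling rate or sample threshold.
SAMPLES_PER_SECOND = 1.0  # hypothetical sampling rate per address space
REQUIRED_SAMPLES = 120    # hypothetical count deemed significant


def seconds_to_collect(n_spaces, active_fraction,
                       rate=SAMPLES_PER_SECOND, required=REQUIRED_SAMPLES):
    """Seconds until the class accumulates `required` useful samples,
    counting only samples taken while the spaces were active."""
    useful_per_second = n_spaces * active_fraction * rate
    return required / useful_per_second


print(round(seconds_to_collect(1, 0.2)))   # 600 (10 minutes)
print(round(seconds_to_collect(20, 0.2)))  # 30
```

Whatever the real constants are, the time is inversely proportional to the number of similar spaces. Pooling 20 of them therefore cuts the wait twentyfold.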

What should be a typical number of service classes? Most customers will probably use around 25 service classes. This allows 4 or 5 for started tasks, a handful for different batch classes, and plenty for the interactive work run under IMS, CICS, TSO, APPC, and so on.

4.0.9 If something truly has no business goal, then assign it a discretionary goal.

All work with a discretionary goal will compete for processor access behind all other work with a velocity or response time goal.

Therefore, a discretionary goal works best when a significant percentage of processor cycles is available to work with discretionary goals. This is because of the mean-time-to-wait processor algorithm, which keeps I/O-intensive work at a higher priority than CPU-intensive work.

4.0.10 Avoid having any work classified to internal service class SYSOTHER.

SYSOTHER is one of the three predefined service classes. It is assigned to work that the customer has not cared enough about to classify. For example, if a WLM policy has no rules assigning a service class to APPC transactions, any APPC transaction WLM sees will be associated with SYSOTHER and will therefore have a discretionary goal.

There is no real problem with having work fall into the SYSOTHER service class. It is just an indication that the work was probably not discussed and prioritized along with the other work that was classified.

SYSOTHER is assigned to unclassified work from the following subsystem types: APPC, Distributed DB2, JES, OPEN/MVS, and TSO/E. Started tasks are not associated with SYSOTHER; see the section entitled 'Assigning Service Classes to Started Tasks and System Spaces'.

If classification rules do not exist for CICS or IMS transactions (called CICS and IMS subsystem types in the WLM ISPF application), the transactions will not be managed towards the SYSOTHER discretionary goal. Instead, the serving address spaces will continue to be managed according to the goal specified for them as batch jobs or started tasks. See the discussion of servers later in this paper. (2)
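The fallback behaviour just described can be summarized in a small Python sketch. The subsystem labels and the flat name-to-class rule table are hypothetical simplifications. Real WLM classification rules use qualifiers that are not modelled here, and started tasks are deliberately left out.

```python
# Minimal sketch of the fallback described above. Subsystem labels and
# the flat rule table are hypothetical simplifications; real WLM
# classification rules use qualifiers and are not modelled here.

SYSOTHER_TYPES = {"APPC", "DDB2", "JES", "OPENMVS", "TSO"}
TRANSACTION_MANAGERS = {"CICS", "IMS"}


def classify(subsystem, work_name, rules):
    """Return a service class name, or None when the transaction is
    left unmanaged and its server region keeps its own goal."""
    service_class = rules.get(subsystem, {}).get(work_name)
    if service_class is not None:
        return service_class
    if subsystem in SYSOTHER_TYPES:
        return "SYSOTHER"  # discretionary goal
    if subsystem in TRANSACTION_MANAGERS:
        return None        # region managed as a batch job or started task
    # Started tasks and other types are handled by rules outside this sketch.
    raise NotImplementedError(f"fallback for {subsystem} not modelled")


rules = {"TSO": {"USER01": "TSOPROD"}}
print(classify("TSO", "USER01", rules))  # TSOPROD
print(classify("APPC", "TP1", rules))    # SYSOTHER
print(classify("CICS", "PAY1", rules))   # None
```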

4.0.11 Compile service definition on paper first

Defining goals to MVS Workload Manager is very easy using the ISPF application. However, it is highly recommended that an installation have a clear definition of what it wants to tell WLM before starting the ISPF application. There are three good reasons for this recommendation:

The importance to assign to the goals of different service classes will probably involve discussion with different organizations or functions within the installation. Those discussions and the resulting agreement on the ranking of work are essential to a good service definition. See reference 1 for an example worksheet that might help installations in these discussions. Without a visual comparison of one goal to the next, the discussions probably could not take place.
Naming conventions should be established before defining any information to WLM. With a little advance thought and some conventions, the installation will probably choose more meaningful names than it would on the spur of the moment in the ISPF application.
WLM's ISPF application tries to catch mistakes at the time they are made. For example, if service class BATCH_A is supposed to be associated with workload TEST and with resource group DEPT87, the ISPF application requires the workload and resource group to be defined before they can be referenced. The relationships can be seen very easily on paper. Then entering them into the ISPF application is a trivial data-entry task.
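A paper (or spreadsheet) definition also makes it easy to check references before any data entry. The Python sketch below flags service classes that point at an undefined workload or resource group. The data layout and function name are hypothetical, and the check only mirrors the ordering requirement described above.

```python
# Minimal sketch: check a paper service definition for dangling
# references before typing it into the ISPF application. The data
# layout is hypothetical.


def dangling_references(workloads, resource_groups, service_classes):
    """Return messages for service classes that reference a workload or
    resource group not yet defined. Resource group is optional."""
    problems = []
    for name, refs in sorted(service_classes.items()):
        if refs["workload"] not in workloads:
            problems.append(f"{name}: undefined workload {refs['workload']}")
        group = refs.get("resource_group")
        if group is not None and group not in resource_groups:
            problems.append(f"{name}: undefined resource group {group}")
    return problems


service_classes = {"BATCH_A": {"workload": "TEST", "resource_group": "DEPT87"}}
print(dangling_references({"TEST"}, {"DEPT87"}, service_classes))  # []
print(dangling_references({"TEST"}, set(), service_classes))
```

The second call reports that DEPT87 must be defined before BATCH_A can reference it.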

4.0.12 Do not mix transactions and address spaces in a service class

WLM allows goals to be specified for CICS or IMS transactions, as well as for work running in TSO or APPC address spaces, batch jobs, and started tasks. Since the CICS and IMS regions are run as batch jobs or started tasks, they will of course be associated with a service class. Do not assign the same service class to these regions as is assigned to the transactions that the regions serve.

Similarly, with DB2 V4.1, it is possible to assign goals to distributed DB2 transactions. Isolate that work into its own service class, rather than mixing it with any address space work.

The primary reason for this recommendation is that the sampling data, plots, and projections SRM assembles for each service class period can become very distorted if work of such diverse structure is combined in a service class.
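A simple review of the classification assignments can catch this kind of mixing. The Python sketch below flags any service class that is assigned to both transaction work and address space work. The subsystem labels and function name are hypothetical simplifications.

```python
# Minimal sketch: flag service classes assigned to both transactions and
# address spaces. Subsystem labels are hypothetical simplifications.
from collections import defaultdict

TRANSACTION_TYPES = {"CICS", "IMS", "DDB2"}
ADDRESS_SPACE_TYPES = {"JES", "STC", "TSO", "APPC"}


def mixed_service_classes(assignments):
    """assignments: iterable of (subsystem_type, service_class) pairs."""
    kinds = defaultdict(set)
    for subsystem, service_class in assignments:
        if subsystem in TRANSACTION_TYPES:
            kinds[service_class].add("transaction")
        elif subsystem in ADDRESS_SPACE_TYPES:
            kinds[service_class].add("address space")
    return sorted(sc for sc, k in kinds.items() if len(k) > 1)


assignments = [
    ("CICS", "ONLINE"),   # CICS transactions
    ("STC", "ONLINE"),    # the CICS regions themselves: should differ
    ("STC", "STCHI"),
    ("DDB2", "DDFWORK"),  # distributed DB2 work isolated in its own class
]
print(mixed_service_classes(assignments))  # ['ONLINE']
```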

Footnotes:

(2) Note that there is a very brief period during a mode switch or a policy activation when SRM and WLM control blocks are still being created. If SMF writes type 30 records while the control blocks are not present, it records the address space as associated with service class SYSOTHER. So even an installation that fully classifies all work might see an occasional job associated with SYSOTHER in type 30 records.
