Effective Use of MVS Workload Manager Controls
Ed Berkel
Peter Enrico
IBM Corporation
522 South Road
Poughkeepsie, NY 12601-5400
CONTENTS
1.0 Introduction
2.0 Overview of WLM external controls
2.1 Goal types
2.1.1 Response Time Goal
2.1.2 Sometimes response time goals are not appropriate
2.1.3 Short response time goals versus long response time goals
2.1.4 Discretionary Goals
2.1.5 Velocity Goals
2.1.6 When velocity goals are appropriate
2.1.7 System Goals
2.2 Importance
2.2.1 Importances have several purposes
2.2.2 Resource consumption controls
3.0 The Heart of Workload Manager
3.1 Address space sampling
3.2 Maintaining enough history
3.3 Performance Index
3.4 Plots
3.5 Choosing and helping a receiver
3.5.1 Handling servers
3.5.2 Managing towards resource group constraints
4.0 General Recommendations for Setting Goals
4.0.1 What to use as a basis for goals?
4.0.2 Current IPS/ICS Setup
4.0.3 Service Level Objectives
4.0.4 Historical Data
4.0.5 Understand the definition of a transaction when setting a response time goal
4.0.6 Do not try to force certain resource allocation conditions to occur
4.0.7 Do not set unrealistic goals
4.0.8 Keep it simple. Do not define 'too many' service classes
4.0.9 If something truly has no business goal, then assign it a discretionary goal
4.0.10 Avoid having any work classified to internal service class SYSOTHER
4.0.11 Compile the service definition on paper first
4.0.12 Do not mix transactions and address spaces in a service class
5.0 Setting Goals for Interactive TSO
5.0.1 Interactive work should have a response time goal
5.0.2 It is probably best not to give interactive work a discretionary goal
5.0.3 A HOTTSO class may be helpful
6.0 Setting Goals for Batch
6.0.1 A Response Time Goal can be used for Batch
6.0.2 A Velocity Goal can be used for Batch
6.0.3 A Discretionary goal can be used for Batch
6.0.4 Multiple periods are still possible for batch jobs
6.0.5 A HOTBATCH class may be helpful
6.0.6 A SWAPOUT class is not needed
7.0 Guidelines for OLTP Workloads
7.1 Setting Goals for Transaction and Resource Managers
7.1.1 Assign a velocity goal for OLTP regions
7.2 Setting Goals for On-Line Transaction Workloads
7.2.1 No goal can be set for OLTP transactions whose servers do not support the MVS WLM services
7.2.2 Start with simple classification rules
7.2.3 Use a response time goal for OLTP transactions
7.2.4 Use report classes for OLTP regions, if needed
8.0 Assigning Service Classes to Started Tasks and System Spaces
8.0.1 Option 1. Collect STC into a small number of similar groups
8.0.2 Option 2. Classify STC into individual service classes
8.0.3 Option 3. Do not even bother classifying any started tasks
8.0.4 Option 4. A combination of the previous approaches
9.0 Setting Goals for Distributed DB2 Work
9.0.1 All three goal types are appropriate for DDF transactions
9.0.2 Assign DDF transactions to multi-period service classes
10.0 Setting Goals for APPC Work
10.0.1 Treat the ASCH address space as a system started task
10.0.2 Use a single-period velocity goal for APPC transactions
11.0 Setting Goals for OpenEdition/MVS Work
11.0.1 Treat the OMVS Kernel and OMVS processes like all other started tasks
11.0.2 Use multiple periods and response time goals for OMVS forked children
11.0.3 Revisit goals for first-period TSO
12.0 How to use Workloads
13.0 How to use Report Classes
14.0 Considerations for Using Resource Groups
14.0.1 Resource group maximums can override goals
14.0.2 Resource group constraints can apply to OLTP regions
14.0.3 Resource group maximums cannot be used as a replacement for RTO
14.0.4 Resource group maximum is different from RESET QUIESCE
14.0.5 Resource group minimums can allocate equal access for discretionary work
14.0.6 Resource group maximums can artificially lower processor capacity
15.0 Summary
15.0.1 Can WLM do as good a job as a system programmer?
15.1 References
15.2 Acknowledgements
MVS/ESA SP 5.1 simplifies the definition, control, and reporting of the
performance requirements for MVS workloads. It provides a direct way to
specify performance goals for work. (1)
To reduce the complexity of managing system resources, MVS workload
management provides goal-oriented dynamic resource management. Using the
MVS Workload Manager, WLM, an installation defines performance goals for
CICS, IMS, JES, APPC/MVS, TSO/E, Distributed DB2, and OpenEdition MVS work
on the basis of business importance. One objective of workload management
is to use these goals for dynamically adjusting access to processor and
storage resources. This paper discusses general recommendations and guidelines
for setting goals for the different types of work installations have running
on their MVS/ESA systems.
An installation can gather MVS work together into a new grouping called
a service class. Users can think of a service class as similar to the
current performance group construct. Like performance groups,
service classes have periods. However, this is where the similarity stops.
Performance group periods are assigned various resource-based controls
for such resources as processor, storage, and MPL. These static controls,
as specified in the installation's IEAIPSxx and IEAOPTxx members of PARMLIB,
can be thought of as all the 'knobs' and 'dials' that allow an installation
to control how and to whom certain resources are allocated.
A primary goal of the MVS Workload Manager is to simplify the management
of MVS systems by providing externals to reflect an installation's expectation
for work being processed in the system. To allow this, the MVS Workload
Manager enables the installation to explicitly state to MVS the service
objective, or goal, towards which the work should be managed. These goals
are assigned to the periods of each service class.
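For readers who find a small model helpful, the following sketch (illustrative Python, with invented names and fields, not the actual WLM service definition externals) shows the relationship just described: a service class contains periods, and each period is assigned a goal. The goal types themselves are described next.

    # A minimal conceptual model of the relationship described above:
    # a service class contains periods, and each period is assigned a goal.
    # All names and fields are illustrative, not WLM externals.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Goal:
        goal_type: str                  # one of the goal types described below
        value: Optional[float] = None   # e.g., a response time or a velocity

    @dataclass
    class Period:
        goal: Goal
        duration: Optional[int] = None  # None for the last period of the class

    @dataclass
    class ServiceClass:
        name: str
        periods: List[Period]

    # A two-period service class: short-running work is judged against the
    # first period's goal; work that runs longer falls into the second period.
    example = ServiceClass("EXAMPLE",
                           periods=[Period(Goal("response time", 1.0), duration=400),
                                    Period(Goal("velocity", 30.0))])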
The MVS Workload Manager supports four different goal types. As stated
previously, goals are assigned to service class periods. Each goal type
has a unique meaning and implication if used. The four goal types are as
follows:
- Response time goal
  - Average response time goal
  - Percentile response time goal
- Discretionary
- Velocity
- System Goal
A response time goal is the simplest goal to understand. This goal type
can be based on either the average response time of ended transactions,
or a response time target for a given percentage of completions (e.g., 80%
of CICS banking transactions complete within 1 second).
The primary difference between an average response time goal and a percentile
response time goal is that an average response time goal is HEAVILY influenced
by 'outliers'. That is, a single transaction which has gone amiss can make
an average look terrible. For example, assume 99 transactions all end in 1 second,
but one transaction runs for 2 minutes. The average is more than 2 seconds,
even though 99% of the transactions completed in 1 second. When considering an
average response time goal, decide whether it is acceptable for WLM to manage
the work based on the worst behaving transactions.
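To make the arithmetic concrete, the following sketch (illustrative Python, not anything WLM executes) computes both measures for the example just given.

    # The example above: 99 transactions ending in 1 second plus one
    # 2-minute outlier.
    response_times = [1.0] * 99 + [120.0]

    average = sum(response_times) / len(response_times)
    print(f"average response time: {average:.2f} seconds")   # about 2.2 seconds

    # The percentile view is unaffected by the single outlier.
    within_goal = sum(1 for rt in response_times if rt <= 1.0)
    print(f"{100 * within_goal / len(response_times):.0f}% ended within 1 second")  # 99%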
There are two reasons for using an average response time goal, even if only
on an interim basis. First, service level objectives (or Service Level
Agreements) may already state an average response time objective. Much
of this is rooted in the fact that most performance monitors today report
average response time for ended transactions for a given grouping of work.
Thus, average response time objectives can already be easily tracked and
understood.
The other reason to specify an average response time goal is as a 'starting
point' that will eventually lead to an appropriate percentile response
time goal. Since outliers occur in almost all measurements, a percentile
response time goal is probably always better to use rather than an average
response time goal. The difficulty with this rule of thumb is that if an
installation does not have a service objective that it is just passing
along to the MVS Workload Manager, then the installation might not have
enough information about the distribution of completions to be able to
make an intelligent choice for the percentile value. In this case, the
installation can start with a goal using the average response time currently
being achieved by the workload. This average response time is currently
reported by the installation's RMF performance monitor. Once the installation
implements MVS Workload Manager goal mode, the RMF performance monitor
will show distributions of response times around the average response time
goal. The installation can then use this response time distribution to
determine whether the workload has a normal distribution or some anomalies,
and therefore decide whether a percentile value of 70%, 85%, or even 95% is
more reasonable.
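As a sketch of that last step, the fragment below takes a response time distribution of the kind a performance monitor might report and shows what percentage of transactions ended within a candidate goal time. The bucket boundaries and counts are invented for illustration.

    # Hypothetical response time distribution: (upper bound in seconds, count).
    distribution = [(0.5, 400), (1.0, 350), (2.0, 150),
                    (4.0, 60), (8.0, 25), (float("inf"), 15)]

    def percent_within(goal_seconds):
        """Percentage of ended transactions whose bucket falls within the goal."""
        total = sum(count for _, count in distribution)
        within = sum(count for bound, count in distribution if bound <= goal_seconds)
        return 100.0 * within / total

    for candidate in (1.0, 2.0, 4.0):
        print(f"{percent_within(candidate):.0f}% ended within {candidate} seconds")
    # 75% within 1 second, 90% within 2 seconds, 96% within 4 seconds

With numbers like these, a goal of '90% within 2 seconds' would be realistic, while '95% within 1 second' would not.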
Whenever there is an opportunity to use a response time goal, that should
be the choice. However, an installation must consider that a response time
goal is not appropriate for all types of work. There are two primary considerations
that must be taken into account:
-
The frequency of completions
-
Variability of queue time
The work being assigned a response time goal must have sufficient completions.
As a rule of thumb, expect at least 10 completions in 20 minutes for a
response time goal to be effective.
As transaction length increases, there is a cross-over point beyond which
one would not expect to see many completions. For example, most installations
have a large number of short running transactions, each with a response time
of about 1 second, so many complete in any 20 minute period. Most OLTP and
interactive TSO transactions fit this description. These same installations
have a smaller number of other transactions where each transaction takes
over an hour to complete. Examples of these transactions would be CICS,
IMS, or DB2 regions, started tasks, and long running batch jobs.
The first workload type with frequent short completions would be very
suitable for a response time goal. The MVS Workload Manager would be able
to decide whether to 'give to' or 'take from' transactions in progress based
on what was achieved by the transactions that completed recently.
The second workload type would have a less statistically valid number of
completions, making it difficult or even impossible for the MVS Workload
Manager to make the same type of projections.
The second consideration is that the MVS Workload Manager includes queue
time when managing towards a response time goal. That is, the response
time for a completed batch job includes both the time the transaction was
queued waiting for an initiator, and the time the transaction was actually
executing. Thus, a response time goal is not appropriate when the work
being assigned the response time goal has a variable and lengthy queue
time. For example, if a batch job is submitted in the morning, held all
day, and released at night, the MVS Workload Manager considers the transaction's
completed response time to include the time held all day as well as the
time the job spent executing at night. The installation should not assign
a response time goal for this type of job or job class.
Instead, an installation probably wants the job or class held for a
while, then executed at a certain speed once it becomes eligible to run.
This concept will be discussed in the section entitled 'Velocity
Goals'.
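The two considerations just described can be folded into a simple decision aid. The sketch below merely restates the rules of thumb from this section (roughly 10 or more completions per 20 minutes, and no lengthy or variable queue time); the function and its thresholds are illustrative, not WLM logic.

    def suggested_goal_type(completions_per_20_min, queue_time_long_or_variable):
        """Restates the rules of thumb above; not an actual WLM algorithm."""
        if queue_time_long_or_variable:
            # Completed response time would include held/queued time, so manage
            # the work by how fast it should run once it is eligible to run.
            return "velocity goal"
        if completions_per_20_min >= 10:
            # Enough recent completions for WLM to project response times.
            return "response time goal"
        return "velocity goal"

    print(suggested_goal_type(500, False))  # short OLTP/TSO work -> response time goal
    print(suggested_goal_type(2, False))    # long-running work   -> velocity goal
    print(suggested_goal_type(50, True))    # jobs held all day   -> velocity goal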
It should be noted that WLM attempts to achieve response time goals differently
depending on whether the response time goal is short (20 seconds or less)
or long (greater than 20 seconds).
When a service class period has a short response time goal, WLM assumes
these transactions will not be around very long. That means there will
not be much time to sample these transactions to decide how to handle them
on an individual basis. No individual storage access controls or policies
are even contemplated. Instead, newly arriving transactions are controlled
by period-wide central storage and expanded storage policies.
When a service class period has a goal other than a short response time
goal, the Workload Manager assumes each transaction will be around for
a while. Therefore it looks at all the address spaces in the period to
see whether protective or restrictive storage targets are needed for them,
and then will also decide for each address space how to handle its access
to expanded storage.
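The dividing line just described can be summarized in a few lines. The sketch below only paraphrases the behavior stated above; the function name and return strings are invented.

    def storage_management_approach(goal_type, goal_seconds=None):
        """Paraphrases the short- versus long-goal distinction described above."""
        if goal_type == "response time" and goal_seconds is not None and goal_seconds <= 20:
            # Transactions are assumed too short-lived to sample individually,
            # so period-wide central and expanded storage policies apply.
            return "period-wide storage policies"
        # Work is assumed to stay around long enough to be sampled, so storage
        # targets and expanded storage decisions are made per address space.
        return "per-address-space storage targets"

    print(storage_management_approach("response time", 0.5))  # period-wide storage policies
    print(storage_management_approach("response time", 60))   # per-address-space storage targets
    print(storage_management_approach("velocity"))            # per-address-space storage targets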
A discretionary goal means that an installation wants MVS to run the work
when there are resources left over after running all the other work with
non-discretionary goals.
It is very likely that every customer is already familiar with discretionary
work. It is the work that usually runs in the lowest mean-time-to-wait
group of the IPS, in a domain whose MPL level fluctuates based on available
capacity. What customers have been doing for years is telling SRM "put
this collection of work together and run it as you see fit". This is exactly
the collection of work an installation migrating to MVS Workload Manager
goal mode would want to assign a discretionary goal.
SRM will continue to manage this discretionary work according to the
mean time to wait algorithm. That is, work which is CPU intensive will
be assigned a lower dispatching priority than I/O intensive work. This
is still valuable for increasing throughput.
In addition, jobs with a discretionary goal are still candidates for
individual storage control via Working Set Management.
Certainly there is MVS work which is not discretionary, yet it cannot be
given a response time goal because completions are too infrequent.
To address this there is a need for a goal that basically states "When
this work is ready, be sure it runs without delays", or "When that work
is ready, keep it plodding along to ensure it will eventually finish".
The third goal type, velocity, supports both of these needs, as well
as gradations in between. Velocity is a measure of the acceptable processor
and storage delays while work is capable of running.
It should be noted that the delays considered in the velocity calculation
are only those delays that WLM has some control over. Specifically, I/O
delays at a control unit or device are not part of the velocity calculated
for work. Mount delays and operator delays are also not part of the WLM
velocity.
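The paper does not give the formula here, but execution velocity is commonly described as the ratio of samples where work is using the processor or storage to samples where it is using or delayed for them. The sketch below assumes that form; the sample categories are simplified for illustration.

    # Assumed form: velocity = 100 * using / (using + delay), counting only
    # delays WLM can act on (processor and storage). I/O device, mount, and
    # operator delays are deliberately excluded, as the text notes.

    def execution_velocity(cpu_using, cpu_delay, storage_delay):
        using = cpu_using
        delay = cpu_delay + storage_delay
        if using + delay == 0:
            return 0.0   # no samples found running or delayed
        return 100.0 * using / (using + delay)

    # Work found using the CPU in 120 samples, CPU-delayed in 60 samples,
    # and storage-delayed in 20 samples shows a velocity of 60%.
    print(execution_velocity(cpu_using=120, cpu_delay=60, storage_delay=20))  # 60.0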
A velocity goal is appropriate for a particular type of work when a response
time goal and a discretionary goal are not. When SRM does not see any completed
or in-flight transactions in a 20 minute interval, it has no "real time"
completion data to use to project whether current transactions will meet
the response time objectives. This decreases the effectiveness of a response
time goal. What kind of decision would an installation want SRM to make
about allocating processor and storage access? A velocity goal tells SRM
the extent to which delays are acceptable.
Similarly, if there are only a few completions in 20 minutes, the response
time data collected by SRM could become skewed by the 'outliers' described
in the discussion of average response time above. When a group of work
does not have at least 10 transactions completing within a 15-20 minute
interval, installations are able to tell SRM how to allocate resources
by using a velocity goal.
It should be noted that velocity does not correspond to the old dispatching
priority control. It is not guaranteed that a service class period with
a high velocity goal will necessarily have a higher dispatching priority
than another period with a lower velocity goal. For that reason, installations
should not expect a significant difference in work execution by making
minor adjustments to velocity goals, as might be expected by making small
changes to dispatching priority in the IPS.
The MVS Workload Manager can handle some work by default, without requiring
a customer to bother setting externally specified goals. These 'system'
goals simply provide static ways for MVS to treat certain recognized types
of work. There are three predefined service classes that are managed according
to these system goals. These service class names cannot be explicitly specified
in any classification rules, but are instead service classes to be assigned
in the absence of rules.
SYSTEM
When selected address spaces are created, they are always assigned the
highest dispatching priority (255) and are excluded from storage isolation
controls. These include MASTER, SMF, CONSOLE, CATALOG, GRS, RASP, XCFAS,
SMXC, IOSAS, DUMPSRV, ANTMAIN, JESXCF, ALLOCAS, IXGLOGR and WLM. It is
best not to assign a service class to these high dispatching priority address
spaces, but to allow them to be managed within the SYSTEM service class.
SYSSTC
This service class is for all started tasks not otherwise associated with
a service class. Effectively exploiting this service class is described
in the section of this paper entitled 'Setting Goals for Started Tasks
and System Spaces'.
Address spaces managed in SYSSTC service class are given a dispatching
priority of 253. Remember that the MVS dispatcher allows multiple address
spaces to share the same priority without the fear of the one which started
first locking out all others. An advantage of putting selected address
spaces in SYSSTC is that SRM will not have to spend time analyzing their
state samples and comparing them to a goal. This is especially valuable
since most installations probably do not have much of a goal for certain
critical address spaces other than "be sure to run these when they are
ready". SYSSTC is probably appropriate for JES, VTAM, etc.
A disadvantage of putting a started task into SYSSTC is that without
a goal, storage isolation will not be invoked for the started task unless
cross memory page faults in the space are impacting other work with goals.
SYSOTHER
This service class is intended as a 'catcher' for all address spaces other
than started tasks that an installation has not bothered to classify. It
is assigned a discretionary goal.
When multiple goals are defined, it is necessary to have a way to prioritize
which of those goals are really critical, and which are only wishful thinking.
MVS supports this through an importance value associated with a
goal. Each goal can be rated as very important to the business (1), down
to a goal that is desirable but can be sacrificed readily (5). The absolute
value specified is less meaningful than the relative value of one importance
compared to that of other goals.
-
They identify the critical goals to WLM. WLM attempts to satisfy all importance
'1' goals before going after the goals at importance '2', then '3', and so on.
-
They help prevent a user from getting into trouble. A user can set goals
that are too aggressive. The importance allows WLM to make trade-offs that
protect the really critical work.
-
They allow WLM to react to changing capacity. Reacting to an outage at
many installations today involves the cancellation of some work, re-prioritization
of other work, and reallocation of the remaining resources. WLM will use
the importance of the goals to decide immediately which of the remaining
work can donate resources needed by the work with higher importance goals.
If scarce resources are preventing work from achieving goals, WLM will
not just select one type of work to pick on. It will try to achieve the
goals of higher importance by degrading equally the work whose goals have
lower importance.
It should be noted that importance does not correspond to the control of
dispatch priority. It is not guaranteed that a service class period with
high importance will necessarily have a higher dispatching priority than
another period with a lower importance. Importance describes the significance
of meeting a goal; it says nothing about how easy or difficult that goal
may be to achieve. For example, it might be very important to an installation
that a job which runs for many hours continue to execute occasionally throughout
the day (Velocity may be only 5%, but importance 1). SRM may find that
a low dispatching priority can satisfy that goal.
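As a conceptual illustration of the ordering that importance imposes (not the actual WLM algorithm), the sketch below ranks periods that are missing their goals so that importance 1 work is helped first, while lower importance work is considered first as a donor of resources.

    # Conceptual only: the tuples, sorting, and terminology are illustrative.
    periods = [
        # (name, importance, goal_being_met)
        ("ONLINE_HI", 1, False),
        ("TSO_1",     2, False),
        ("BATCH_MED", 3, True),
        ("BATCH_LOW", 5, False),
    ]

    # Candidate receivers: periods missing their goals, most important first.
    receivers = sorted((p for p in periods if not p[2]), key=lambda p: p[1])
    print([name for name, _, _ in receivers])  # ['ONLINE_HI', 'TSO_1', 'BATCH_LOW']

    # Candidate donors: less important work is considered before more important work.
    donors = sorted(periods, key=lambda p: p[1], reverse=True)
    print([name for name, _, _ in donors])     # ['BATCH_LOW', 'BATCH_MED', 'TSO_1', 'ONLINE_HI']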
In addition to the four goal types above, one further type of control is available
with WLM. That is the ability to tell MVS to explicitly control the CPU
access for a given collection of work. This can be stated as either a maximum
or a minimum amount of CPU resource per second, which should be made available
to all the work combined into a Resource Group.
The units of capacity are the same units that have been familiar for
years as the SRM constant for various processors. Therefore it is very
easy to tell MVS: "Don't let the service units consumed by this workload
exceed the rate that could be captured using half of a model 9672-R11".
A customer may actually be running that work across three LPAR images on two
separate CECs, neither of which is a 9672!
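As an illustration of that statement, the sketch below expresses 'half of a given processor model' as a service-units-per-second maximum. The SU/sec figure used is a made-up placeholder, not the real SRM constant for a 9672-R11; the actual number comes from the published service units per second for the reference model.

    # Illustrative only: the SU/sec value is a placeholder, NOT the real
    # SRM constant for a 9672-R11.
    ASSUMED_SU_PER_SEC_9672_R11 = 1000.0

    def resource_group_maximum(fraction_of_model, su_per_sec):
        """CPU service units per second to specify as the resource group maximum."""
        return fraction_of_model * su_per_sec

    # "Don't let this workload exceed half of a 9672-R11":
    print(resource_group_maximum(0.5, ASSUMED_SU_PER_SEC_9672_R11))  # 500.0 SU/sec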
Footnotes:
(1) CICS/ESA(TM),
Database 2(TM), Distributed Relational Database Architecture(TM), DB2(TM),
IMS/ESA(TM), MVS/ESA(TM), MVS/SP(TM), OpenEdition(TM), RMF(TM), VTAM(TM),
are trademarks of the International Business Machines Corporation. IBM®
is a registered trademark of the International Business Machines Corporation.
The information contained in this paper has not been submitted to any
formal IBM test and is distributed on an "as is" basis without any warranty
either expressed or implied. The use of this information or the implementation
of any of these techniques is a customer responsibility and depends on
the customer's ability to evaluate and integrate them into the customer's
operational environment.