Effective Use of MVS Workload Manager Controls
Ed Berkel
Peter Enrico
IBM Corporation
522 South Road
Poughkeepsie, NY 12601-5400
CONTENTS
1.0 Introduction
2.0 Overview of WLM external controls
2.1 Goal types
2.1.1 Response Time Goal
2.1.2 Sometimes response time goals are not appropriate
2.1.3 Short response time goals versus long response time goals
2.1.4 Discretionary Goals
2.1.5 Velocity Goals
2.1.6 When velocity goals are appropriate
2.1.7 System Goals
2.2 Importance
2.2.1 Importances have several purposes
2.2.2 Resource consumption controls
3.0 The Heart of Workload Manager
3.1 Address space sampling
3.2 Maintaining enough history
3.3 Performance Index
3.4 Plots
3.5 Choosing and helping a receiver
3.5.1 Handling servers
3.5.2 Managing towards resource group constraints
4.0 General Recommendations for Setting Goals
4.0.1 What to use as a basis for goals?
4.0.2 Current IPS/ICS Setup
4.0.3 Service Level Objectives
4.0.4 Historical Data
4.0.5 Understand the definition of a transaction when setting a response time goal
4.0.6 Do not try to force certain resource allocation conditions to occur
4.0.7 Do not set unrealistic goals
4.0.8 Keep it simple. Do not define 'too many' service classes
4.0.9 If something truly has no business goal, then assign it a discretionary goal
4.0.10 Avoid having any work classified to internal service class SYSOTHER
4.0.11 Compile the service definition on paper first
4.0.12 Do not mix transactions and address spaces in a service class
5.0 Setting Goals for Interactive TSO
5.0.1 Interactive work should have a response time goal
5.0.2 It is probably best not to give interactive work a discretionary goal
5.0.3 A HOTTSO class may be helpful
6.0 Setting Goals for Batch
6.0.1 A Response Time Goal can be used for Batch
6.0.2 A Velocity Goal can be used for Batch
6.0.3 A Discretionary goal can be used for Batch
6.0.4 Multiple periods are still possible for batch jobs
6.0.5 A HOTBATCH class may be helpful
6.0.6 A SWAPOUT class is not needed
7.0 Guidelines for OLTP Workloads
7.1 Setting Goals for Transaction and Resource Managers
7.1.1 Assign a velocity goal for OLTP regions
7.2 Setting Goals for On-Line Transaction Workloads
7.2.1 No goal can be set for OLTP transactions whose servers do not support the MVS WLM services
7.2.2 Start with simple classification rules
7.2.3 Use a response time goal for OLTP transactions
7.2.4 Use report classes for OLTP regions, if needed
8.0 Assigning Service Classes to Started Tasks and System Spaces
8.0.1 Option 1. Collect STC into a small number of similar groups
8.0.2 Option 2. Classify STC into individual service classes
8.0.3 Option 3. Do not even bother classifying any started tasks
8.0.4 Option 4. A combination of the previous approaches
9.0 Setting Goals for Distributed DB2 Work
9.0.1 All three goal types are appropriate for DDF transactions
9.0.2 Assign DDF transactions to multi-period service classes
10.0 Setting Goals for APPC Work
10.0.1 Treat the ASCH address space as a system started task
10.0.2 Use a single-period velocity goal for APPC transactions
11.0 Setting Goals for OpenEdition/MVS Work
11.0.1 Treat the OMVS Kernel and OMVS processes like all other started tasks
11.0.2 Use multiple periods and response time goals for OMVS forked children
11.0.3 Revisit goals for first-period TSO
12.0 How to use Workloads
13.0 How to use Report Classes
14.0 Considerations for Using Resource Groups
14.0.1 Resource group maximums can override goals
14.0.2 Resource group constraints can apply to OLTP regions
14.0.3 Resource group maximums cannot be used as a replacement for RTO
14.0.4 Resource group maximum is different from RESET QUIESCE
14.0.5 Resource group minimums can allocate equal access for discretionary work
14.0.6 Resource group maximums can artificially lower processor capacity
15.0 Summary
15.0.1 Can WLM do as good a job as a system programmer?
15.1 References
15.2 Acknowledgements
MVS/ESA SP 5.1 simplifies the definition, control, and reporting of the
performance requirements for MVS workloads. It provides a direct way to
specify performance goals for work. (1)
To reduce the complexity of managing system resources, MVS workload
management provides goal-oriented dynamic resource management. Using the
MVS Workload Manager, WLM, an installation defines performance goals for
CICS, IMS, JES, APPC/MVS, TSO/E, Distributed DB2, and OpenEdition MVS work
on the basis of business importance. One objective of workload management
is to use these goals for dynamically adjusting access to processor and
storage resources. This paper discusses general recommendations and guidelines
for setting goals for the different types of work installations have running
on their MVS/ESA systems.
An installation can gather MVS work together into a new grouping called
a service class. Users can think of a service class as similar to the
current performance group construct. Like performance groups,
service classes have periods. However, this is where the similarity stops.
Performance group periods are assigned various resource-based controls
for such resources as processor, storage, and MPL. These static controls,
as specified in the installation's IEAIPSxx and IEAOPTxx members of PARMLIB,
can be thought of as all the 'knobs' and 'dials' that allow an installation
to control how and to whom certain resources are allocated.
A primary goal of the MVS Workload Manager is to simplify the management
of MVS systems by providing externals to reflect an installation's expectation
for work being processed in the system. To allow this, the MVS Workload
Manager enables the installation to explicitly state to MVS the service
objective, or goal, towards which the work should be managed. These goals
are assigned to the periods of each service class.
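For readers who find a small model helpful, the following sketch (illustrative Python, with invented names and fields, not the actual WLM service definition externals) shows the relationship just described: a service class contains periods, and each period is assigned a goal. The goal types themselves are described next.

    # A minimal conceptual model of the relationship described above:
    # a service class contains periods, and each period is assigned a goal.
    # All names and fields are illustrative, not WLM externals.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Goal:
        goal_type: str                  # one of the goal types described below
        value: Optional[float] = None   # e.g., a response time or a velocity

    @dataclass
    class Period:
        goal: Goal
        duration: Optional[int] = None  # None for the last period of the class

    @dataclass
    class ServiceClass:
        name: str
        periods: List[Period]

    # A two-period service class: short-running work is judged against the
    # first period's goal; work that runs longer falls into the second period.
    example = ServiceClass("EXAMPLE",
                           periods=[Period(Goal("response time", 1.0), duration=400),
                                    Period(Goal("velocity", 30.0))])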
The MVS Workload Manager supports four different goal types. As stated
previously, goals are assigned to service class periods. Each goal type
has a unique meaning and implication if used. The four goal types are as
follows:
- Response time goal
  - Average response time goal
  - Percentile response time goal
- Discretionary
- Velocity
- System Goal
A response time goal is the simplest goal to understand. This goal type
can be based on either the average response time of ended transactions,
or a response time target for a given percentage of completions (e.g., 80%
of CICS banking transactions complete within 1 second).
The primary difference between an average response time goal and a percentile
response time goal is that an average response time goal is HEAVILY influenced
by 'outliers'. That is, a single transaction which has gone amiss can make
an average look terrible. For example, assume 99 transactions all end in 1 second,
but one transaction runs for 2 minutes. The average is more than 2 seconds,
even though 99% of the transactions completed in 1 second. When considering an
average response time goal, decide whether it is acceptable for WLM to manage
the work based on the worst behaving transactions.
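To make the arithmetic concrete, the following sketch (illustrative Python, not anything WLM executes) computes both measures for the example just given.

    # The example above: 99 transactions ending in 1 second plus one
    # 2-minute outlier.
    response_times = [1.0] * 99 + [120.0]

    average = sum(response_times) / len(response_times)
    print(f"average response time: {average:.2f} seconds")   # about 2.2 seconds

    # The percentile view is unaffected by the single outlier.
    within_goal = sum(1 for rt in response_times if rt <= 1.0)
    print(f"{100 * within_goal / len(response_times):.0f}% ended within 1 second")  # 99%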
There are two reasons for using an average response time goal, even if only
on an interim basis. First, service level objectives (or Service Level
Agreements) may already state an average response time objective. Much
of this is rooted in the fact that most performance monitors today report
average response time for ended transactions for a given grouping of work.
Thus, average response time objectives can already be easily tracked and
understood.
The other reason to specify an average response time goal is as a 'starting
point' that will eventually lead to an appropriate percentile response
time goal. Since outliers occur in almost all measurements, a percentile
response time goal is probably always better to use rather than an average
response time goal. The difficulty with this rule of thumb is that if an
installation does not have a service objective that it is just passing
along to the MVS Workload Manager, then the installation might not have
enough information about the distribution of completions to be able to
make an intelligent choice for the percentile value. In this case, the
installation can start with a goal using the average response time currently
being achieved by the workload. This average response time is currently
reported by the installation's RMF performance monitor. Once the installation
implements MVS Workload Manager goal mode, the RMF performance monitor
will show distributions of response times around the average response time
goal. The installation can then use this response time distribution to
determine whether the workload has a normal distribution or some anomalies,
and therefore decide whether a percentile value of 70%, 85%, or even 95% is
more reasonable.
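As a sketch of that last step, the fragment below takes a response time distribution of the kind a performance monitor might report and shows what percentage of transactions ended within a candidate goal time. The bucket boundaries and counts are invented for illustration.

    # Hypothetical response time distribution: (upper bound in seconds, count).
    distribution = [(0.5, 400), (1.0, 350), (2.0, 150),
                    (4.0, 60), (8.0, 25), (float("inf"), 15)]

    def percent_within(goal_seconds):
        """Percentage of ended transactions whose bucket falls within the goal."""
        total = sum(count for _, count in distribution)
        within = sum(count for bound, count in distribution if bound <= goal_seconds)
        return 100.0 * within / total

    for candidate in (1.0, 2.0, 4.0):
        print(f"{percent_within(candidate):.0f}% ended within {candidate} seconds")
    # 75% within 1 second, 90% within 2 seconds, 96% within 4 seconds

With numbers like these, a goal of '90% within 2 seconds' would be realistic, while '95% within 1 second' would not.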
Whenever there is an opportunity to use a response time goal, that should
be the choice. However, an installation must consider that a response time
goal is not appropriate for all types of work. There are two primary considerations
that must be taken into account:
-
The frequency of completions
-
Variability of queue time
The work being assigned a response time goal must have sufficient completions.
As a rule of thumb, expect at least 10 completions in 20 minutes for a
response time goal to be effective.
As transaction length increases, there is a cross-over point beyond which
one would not expect to see many completions. For example, most installations
have a large number of short running transactions, each with a response time
of about 1 second, so many complete in any 20 minute period. Most OLTP and
interactive TSO transactions fit this description. These same installations
have a smaller number of other transactions where each transaction takes
over an hour to complete. Examples of these transactions would be CICS,
IMS, or DB2 regions, started tasks, and long running batch jobs.
The first workload type with frequent short completions would be very
suitable for a response time goal. The MVS Workload Manager would be able
to decide whether to 'give to' or 'take from' transactions in progress based
on what was achieved by the transactions that completed recently.
The second workload type would have a less statistically valid number of
completions, making it difficult or even impossible for the MVS Workload
Manager to make the same type of projections.
The second consideration is that the MVS Workload Manager includes queue
time when managing towards a response time goal. That is, the response
time for a completed batch job includes both the time the transaction was
queued waiting for an initiator, and the time the transaction was actually
executing. Thus, a response time goal is not appropriate when the work
being assigned the response time goal has a variable and lengthy queue
time. For example, if a batch job is submitted in the morning, held all
day, and released at night, the MVS Workload Manager considers the transaction's
completed response time to include the time held all day as well as the
time the job spent executing at night. The installation should not assign
a response time goal for this type of job or job class.
Instead, an installation probably wants the job or class held for a
while, then executed at a certain speed once it becomes eligible to run.
This concept will be discussed in the section entitled 'Velocity
Goals'.
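The two considerations just described can be folded into a simple decision aid. The sketch below merely restates the rules of thumb from this section (roughly 10 or more completions per 20 minutes, and no lengthy or variable queue time); the function and its thresholds are illustrative, not WLM logic.

    def suggested_goal_type(completions_per_20_min, queue_time_long_or_variable):
        """Restates the rules of thumb above; not an actual WLM algorithm."""
        if queue_time_long_or_variable:
            # Completed response time would include held/queued time, so manage
            # the work by how fast it should run once it is eligible to run.
            return "velocity goal"
        if completions_per_20_min >= 10:
            # Enough recent completions for WLM to project response times.
            return "response time goal"
        return "velocity goal"

    print(suggested_goal_type(500, False))  # short OLTP/TSO work -> response time goal
    print(suggested_goal_type(2, False))    # long-running work   -> velocity goal
    print(suggested_goal_type(50, True))    # jobs held all day   -> velocity goal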
It should be noted that WLM attempts to achieve response time goals differently
depending on whether the response time goal is short (20 seconds or less)
or long (greater than 20 seconds).
When a service class period has a short response time goal, WLM assumes
these transactions will not be around very long. That means there will
not be much time to sample these transactions to decide how to handle them
on an individual basis. No individual storage access controls or policies
are even contemplated. Instead, newly arriving transactions are controlled
by period-wide central storage and expanded storage policies.
When a service class period has a goal other than a short response time
goal, the Workload Manager assumes each transaction will be around for
a while. Therefore it looks at all the address spaces in the period to
see whether protective or restrictive storage targets are needed for them,
and then will also decide for each address space how to handle its access
to expanded storage.
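The dividing line just described can be summarized in a few lines. The sketch below only paraphrases the behavior stated above; the function name and return strings are invented.

    def storage_management_approach(goal_type, goal_seconds=None):
        """Paraphrases the short- versus long-goal distinction described above."""
        if goal_type == "response time" and goal_seconds is not None and goal_seconds <= 20:
            # Transactions are assumed too short-lived to sample individually,
            # so period-wide central and expanded storage policies apply.
            return "period-wide storage policies"
        # Work is assumed to stay around long enough to be sampled, so storage
        # targets and expanded storage decisions are made per address space.
        return "per-address-space storage targets"

    print(storage_management_approach("response time", 0.5))  # period-wide storage policies
    print(storage_management_approach("response time", 60))   # per-address-space storage targets
    print(storage_management_approach("velocity"))            # per-address-space storage targets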
A discretionary goal means that an installation wants MVS to run the work
when there are resources left over after running all the other work with
non-discretionary goals.
It is very likely that every customer is already familiar with discretionary
work. It is the work that usually runs in the lowest mean-time-to-wait
group of the IPS, in a domain whose MPL level fluctuates based on available
capacity. What customers have been doing for years is telling SRM "put
this collection of work together and run it as you see fit". This is exactly
the collection of work an installation migrating to MVS Workload Manager
goal mode would want to assign a discretionary goal.
SRM will continue to manage this discretionary work according to the
mean time to wait algorithm. That is, work which is CPU intensive will
be assigned a lower dispatching priority than I/O intensive work. This
is still valuable for increasing throughput.
In addition, jobs with a discretionary goal are still candidates for
individual storage control via Working Set Management.
Certainly there is MVS work which is not discretionary, yet it cannot be
given a response time goal because completions are too infrequent.
To address this there is a need for a goal that basically states "When
this work is ready, be sure it runs without delays", or "When that work
is ready, keep it plodding along to ensure it will eventually finish".
The third goal type, velocity, supports both of these needs, as well
as gradations in between. Velocity is a measure of the acceptable processor
and storage delays while work is capable of running.
It should be noted that the delays considered in the velocity calculation
are only those delays that WLM has some control over. Specifically, I/O
delays at a control unit or device are not part of the velocity calculated
for work. Mount delays and operator delays are also not part of the WLM
velocity.
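The paper does not give the formula here, but execution velocity is commonly described as the ratio of samples where work is using the processor or storage to samples where it is using or delayed for them. The sketch below assumes that form; the sample categories are simplified for illustration.

    # Assumed form: velocity = 100 * using / (using + delay), counting only
    # delays WLM can act on (processor and storage). I/O device, mount, and
    # operator delays are deliberately excluded, as the text notes.

    def execution_velocity(cpu_using, cpu_delay, storage_delay):
        using = cpu_using
        delay = cpu_delay + storage_delay
        if using + delay == 0:
            return 0.0   # no samples found running or delayed
        return 100.0 * using / (using + delay)

    # Work found using the CPU in 120 samples, CPU-delayed in 60 samples,
    # and storage-delayed in 20 samples shows a velocity of 60%.
    print(execution_velocity(cpu_using=120, cpu_delay=60, storage_delay=20))  # 60.0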
A velocity goal is appropriate for a particular type of work when a response
time goal and a discretionary goal are not. When SRM does not see any completed
or in-flight transactions in a 20 minute interval, it has no "real time"
completion data to use to project whether current transactions will meet
the response time objectives. This decreases the effectiveness of a response
time goal. What kind of decision would an installation want SRM to make
about allocating processor and storage access? A velocity goal tells SRM
the extent to which delays are acceptable.
Similarly, if there are only a few completions in 20 minutes, the response
time data collected by SRM could become skewed by the 'outliers' described
in the discussion of average response time above. When a group of work
does not have at least 10 transactions completing within a 15-20 minute
interval, installations are able to tell SRM how to allocate resources
by using a velocity goal.
It should be noted that velocity does not correspond to the old dispatching
priority control. It is not guaranteed that a service class period with
a high velocity goal will necessarily have a higher dispatching priority
than another period with a lower velocity goal. For that reason, installations
should not expect a significant difference in work execution by making
minor adjustments to velocity goals, as might be expected by making small
changes to dispatching priority in the IPS.
The MVS Workload Manager can handle some work by default, without requiring
a customer to bother setting externally specified goals. These 'system'
goals simply provide static ways for MVS to treat certain recognized types
of work. There are three predefined service classes that are managed according
to these system goals. These service class names cannot be explicitly specified
in any classification rules, but are instead service classes to be assigned
in the absence of rules.
SYSTEM
When selected address spaces are created, they are always assigned the
highest dispatching priority (255) and are excluded from storage isolation
controls. These include MASTER, SMF, CONSOLE, CATALOG, GRS, RASP, XCFAS,
SMXC, IOSAS, DUMPSRV, ANTMAIN, JESXCF, ALLOCAS, IXGLOGR and WLM. It is
best not to assign a service class to these high dispatching priority address
spaces, but to allow them to be managed within the SYSTEM service class.
SYSSTC
This service class is for all started tasks not otherwise associated with
a service class. Effectively exploiting this service class is described
in the section of this paper entitled 'Setting Goals for Started Tasks
and System Spaces'.
Address spaces managed in SYSSTC service class are given a dispatching
priority of 253. Remember that the MVS dispatcher allows multiple address
spaces to share the same priority without the fear of the one which started
first locking out all others. An advantage of putting selected address
spaces in SYSSTC is that SRM will not have to spend time analyzing their
state samples and comparing them to a goal. This is especially valuable
since most installations probably do not have much of a goal for certain
critical address spaces other than "be sure to run these when they are
ready". SYSSTC is probably appropriate for JES, VTAM, etc.
A disadvantage of putting a started task into SYSSTC is that without
a goal, storage isolation will not be invoked for the started task unless
cross memory page faults in the space are impacting other work with goals.
SYSOTHER
This service class is intended as a 'catcher' for all address spaces other
than started tasks that an installation has not bothered to classify. It
is assigned a discretionary goal.
When multiple goals are defined, it is necessary to have a way to prioritize
which of those goals are really critical, and which are only wishful thinking.
MVS supports this through an importance value associated with a
goal. Each goal can be rated as very important to the business (1), down
to a goal that is desirable but can be sacrificed readily (5). The absolute
value specified is less meaningful than the relative value of one importance
compared to that of other goals.
-
They identify the critical goals to WLM. WLM attempts to satisfy all importance
'1' goals before going after the goals at importance '2', then '3', and so on.
-
They help prevent a user from getting into trouble. A user can set goals
that are too aggressive. The importance allows WLM to make trade-offs that
protect the really critical work.
-
They allow WLM to react to changing capacity. Reacting to an outage at
many installations today involves the cancellation of some work, re-prioritization
of other work, and reallocation of the remaining resources. WLM will use
the importance of the goals to decide immediately which of the remaining
work can donate resources needed by the work with higher importance goals.
If scarce resources are preventing work from achieving goals, WLM will
not just select one type of work to pick on. It will try to achieve the
goals of higher importance by degrading equally the work whose goals have
lower importance.
It should be noted that importance does not correspond to the control of
dispatch priority. It is not guaranteed that a service class period with
high importance will necessarily have a higher dispatching priority than
another period with a lower importance. Importance describes the significance
of meeting a goal; it says nothing about how easy or difficult that goal
may be to achieve. For example, it might be very important to an installation
that a job which runs for many hours continue to execute occasionally throughout
the day (Velocity may be only 5%, but importance 1). SRM may find that
a low dispatching priority can satisfy that goal.
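As a conceptual illustration of the ordering that importance imposes (not the actual WLM algorithm), the sketch below ranks periods that are missing their goals so that importance 1 work is helped first, while lower importance work is considered first as a donor of resources.

    # Conceptual only: the tuples, sorting, and terminology are illustrative.
    periods = [
        # (name, importance, goal_being_met)
        ("ONLINE_HI", 1, False),
        ("TSO_1",     2, False),
        ("BATCH_MED", 3, True),
        ("BATCH_LOW", 5, False),
    ]

    # Candidate receivers: periods missing their goals, most important first.
    receivers = sorted((p for p in periods if not p[2]), key=lambda p: p[1])
    print([name for name, _, _ in receivers])  # ['ONLINE_HI', 'TSO_1', 'BATCH_LOW']

    # Candidate donors: less important work is considered before more important work.
    donors = sorted(periods, key=lambda p: p[1], reverse=True)
    print([name for name, _, _ in donors])     # ['BATCH_LOW', 'BATCH_MED', 'TSO_1', 'ONLINE_HI']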
In addition to the four goal types above, one further type of control is available
with WLM. That is the ability to tell MVS to explicitly control the CPU
access for a given collection of work. This can be stated as either a maximum
or a minimum amount of CPU resource per second, which should be made available
to all the work combined into a Resource Group.
The units of capacity are the same units that have been familiar for
years as the SRM constant for various processors. Therefore it is very
easy to tell MVS: "Don't let the service units consumed by this workload
exceed the rate that could be captured using half of a model 9672-R11".
A customer may actually be running that work across three LPAR images on two
separate CECs, neither of which is a 9672!
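As an illustration of that statement, the sketch below expresses 'half of a given processor model' as a service-units-per-second maximum. The SU/sec figure used is a made-up placeholder, not the real SRM constant for a 9672-R11; the actual number comes from the published service units per second for the reference model.

    # Illustrative only: the SU/sec value is a placeholder, NOT the real
    # SRM constant for a 9672-R11.
    ASSUMED_SU_PER_SEC_9672_R11 = 1000.0

    def resource_group_maximum(fraction_of_model, su_per_sec):
        """CPU service units per second to specify as the resource group maximum."""
        return fraction_of_model * su_per_sec

    # "Don't let this workload exceed half of a 9672-R11":
    print(resource_group_maximum(0.5, ASSUMED_SU_PER_SEC_9672_R11))  # 500.0 SU/sec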
Footnotes:
(1) CICS/ESA(TM),
Database 2(TM), Distributed Relational Database Architecture(TM), DB2(TM),
IMS/ESA(TM), MVS/ESA(TM), MVS/SP(TM), OpenEdition(TM), RMF(TM), VTAM(TM),
are trademarks of the International Business Machines Corporation. IBM®
is a registered trademark of the International Business Machines Corporation.
The information contained in this paper has not been submitted to any
formal IBM test and is distributed on an "as is" basis without any warranty
either expressed or implied. The use of this information or the implementation
of any of these techniques is a customer responsibility and depends on
the customer's ability to evaluate and integrate them into the customer's
operational environment.