Effective Use of MVS Workload Manager Controls
Deciding what goals to set for interactive TSO work will probably be easier
than for any of the other work types. This is because most customers already
have some sort of service level objectives set for this type of work, as well
as plenty of historical data showing what this work is already achieving in
compatibility mode.
A response time goal is the obvious choice: 'interactive' implies there are
real live users at the other end of the terminals waiting for a quick response
when they hit Enter. These transactions are short and quick, and enough of them
complete to give SRM a reasonable statistical sample on which to base its
decisions.
If a velocity goal is assigned to interactive TSO, it causes SRM to
control the swap protect time on an individual address space basis rather
than on a period-wide basis. That is less efficient. In addition, in cases
when the TSO transactions do take a while to complete, SRM will look at
the address spaces with a velocity goal to decide what expanded storage
access to give to their demand paging, VIO paging, and hiperspace paging.
This might involve monitoring address spaces for working set management
control even though the work might still end quickly. Once again, this
is less efficient than running interactive work with a response time goal.
A discretionary goal implies the installation does not mind if SRM chooses
not to run the work for a while. In such cases SRM may swap the discretionary
address space out for long periods of time if needed. When a user is sitting
at a terminal, this may be exactly what the installation intends if that
user is exceeding the amount of service the installation feels is reasonable
for a TSO user. This might be the case for the final 1% of all the TSO
transactions. So it depends on what percentage of the TSO work makes it to
the last service class period, and, once there, how the installation wants
those TSO users treated.
Installations may choose to create a special TSO service class with a more
aggressive goal and with an importance level higher than that of other
TSO users. Any installation with a special system programmers' TSO performance
group today could continue this approach with a special TSO service class.
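As a purely illustrative sketch (the class names, durations, importances, and
response times below are hypothetical, not recommendations), such a pair of
TSO service classes might look like this:

   Service Class TSONORM - general TSO users

   Period  Duration  Importance  Goal
      1        800       2       90% complete within 0.5 seconds
      2       4000       3       90% complete within 2 seconds
      3          -       -       Discretionary

   Service Class TSOSPEC - system programmers' TSO

   Period  Duration  Importance  Goal
      1          -       1       90% complete within 0.5 seconds

The durations are expressed in service units, so the numbers above would need
to reflect how much service a typical TSO transaction consumes at the
installation.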
Deciding what goals to set for batch work will be very dependent on what
type of batch work is being processed. Batch work has good uses for each
of the three goal types: response time, velocity, and discretionary.
A response time goal is most suitable for short, homogeneous batch jobs.
Today an installation may have one or more job classes dedicated to ensuring
fast turnaround time for this type of batch work. To meet the objectives for
the batch work running in these job classes, enough initiators must be started
for those classes that most jobs do not have long queue times waiting to be
selected by an initiator. Using a response time goal, either average or
percentile, to help ensure this fast turnaround time is very appropriate in
this case, whether the batch is test or production.
When setting a response time goal for batch work it is important to
note that a batch job is one transaction. The transaction starts when the
job is submitted (i.e., when the JES reader processes the job) and completes
when the initiator finishes executing the job. That means the response time
does include the time the job is queued by JES waiting for an initiator, but
it does not include output processing. The turnaround goal specified should
therefore cover the sum of the queue time and the execution time. Consequently,
a response time goal should be used for batch only when sufficient initiators
exist for the associated job class that most jobs do not experience lengthy
queue time, and when there is a steady flow of completions. A batch response
time goal would be less effective if (1) the goal is long, or (2) there
is a low rate of job completions in the service class. An average of at
least 10 batch job completions within a 20-minute period would constitute
a steady flow of completions within a service class.
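For illustration only (the class name, importance, and times are hypothetical),
a service class for such short batch work might be defined along these lines,
remembering that the goal must cover JES queue time plus execution time:

   Service Class BATCHHI - short, fast-turnaround batch

   Period  Duration  Importance  Goal
      1          -       3       85% complete within 5 minutes

If these jobs typically execute in about two minutes, a goal of this sort
leaves room for normal initiator queue time; if queue times routinely exceed
the remainder, either start more initiators or use one of the other goal types
discussed below.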
A velocity goal is appropriate for long-running production or test batch
jobs, and for any IMS or CICS regions which an installation runs as batch
jobs. The assumption for these recommendations is that there will be so
few completions that a response time goal is inappropriate. Of course,
the velocity goal specified for an IMS or CICS batch job only applies for
the time when the region is not telling WLM about any transactions it is
currently serving. It is therefore a good goal for CICS or IMS regions
that are not yet upgraded to the release level supporting MVS Workload
Manager.
A velocity goal could also be very appropriate for certain jobs or job
classes which are held for a long time. When those jobs are released, the
installation may require them to be processed quickly. Since the queue
time might be unpredictable, a response time goal is inappropriate. But
the velocity goal will tell SRM how to run the work once it has been released.
If an installation cannot use a velocity goal for some batch work (or a
response time goal, given the caveats discussed earlier), that batch
should be given a discretionary goal. Critical path production jobs should
probably have a low velocity goal rather than discretionary, unless the
site has plenty of capacity.
Some installations currently use more than one period in the performance
groups for batch jobs, and some installations use just one period. There
are advantages and disadvantages to each approach, with implications
for the initiator structure. An installation currently using multiple-period
PGNs for batch can certainly continue that in goal mode.
Most likely, in the IPS the dispatching priority decreased with each succeeding
period. In the WLM policy, the first period might contain a response time goal.
Succeeding periods could have longer response time goals, possibly with
decreasing importance, with either a velocity or a discretionary goal for
the last period.
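A hypothetical multi-period batch service class following that pattern (all
names and numbers are illustrative only) could look like:

   Service Class BATMULTI - multi-period batch

   Period  Duration  Importance  Goal
      1       5000       3       80% complete within 10 minutes
      2      50000       4       80% complete within 1 hour
      3          -       -       Discretionary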
Installations may choose to create a special service class with an aggressive
goal for the case when some work "must run now". No classification rules
need to refer to that service class, but operators could still use the
RESET operator command to assign running work to that HOTBATCH service
class.
Since the RESET causes SRM to start a new transaction for the address
space, any JES queue time that may have existed is now ignored. This means
a response time goal could be used for the HOTBATCH class. But if an installation
has sufficient completions in the HOTBATCH service class, this probably
warrants a discussion with operations! Therefore, a velocity goal is probably
best for this service class.
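For example, to move a running job into that service class, an operator could
enter the following command (PAYROLL9 is a hypothetical job name):

   RESET PAYROLL9,SRVCLASS=HOTBATCH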
Many installations today create a special "swap out" PGN that causes existing
work to stop executing. There is no need to create a special service class
like that, because the RESET operator command has been extended with a
QUIESCE option. If a swappable address space is reset with that option,
it will be swapped out. If a non-swappable address space is reset that
way, it will remain in storage but will be the last work to be dispatched.
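For example, again using a hypothetical job name:

   RESET PAYROLL9,QUIESCE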
A primary feature of the MVS Workload Manager is the ability to specify
goals for on-line transactions, like CICS and IMS/TM user transactions.
Prior to the MVS Workload Manager, customers influenced the response time
for these transactions by controlling the resources provided to the transaction
and resource managers. So if a performance manager wanted to achieve a
better response time for a certain subset of the on-line transactions,
the manager had to figure out which regions processed the transactions
of interest. The next task was to analyze the resource usage of those
regions and work at improving it, in hopes of achieving a better response
time for the transactions of interest. This indirection often made it
difficult to tune on-line transaction processing workloads and often resulted
in inefficient use of system resources. Examples of such transaction and
resource manager address spaces are CICS regions, IMS control regions
and message processing regions, and DB2 address spaces.
With the advent of the MVS Workload Manager, one can now specify response
time goals for the on-line transactions. The MVS Workload Manager will
manage the transaction and resource managers to work towards meeting the
goals of the transactions they serve.
This feature is not without its caveats. The MVS Workload Manager is
only able to manage these transaction and resource managers if they exploit
specific MVS workload management services provided in MVS/ESA SP 5.1. As
of the date of this paper, the following products are exploiting these
services.
- CICS/ESA V4.1 and higher
- IMS/ESA TM V5.1 and higher
- IMS/ESA DB V5.1 and higher
- DB2 V4.1 and higher (for distributed DB2 transactions)
Since the OLTP regions are long-running batch jobs or started tasks, they
should be assigned a velocity goal. When setting a velocity goal for OLTP
regions, it is best to set a goal that will enable the regions to run sufficiently
so the transactions they serve will meet the service level objectives.
As stated above, the way the MVS Workload Manager manages transaction
and resource managers is dependent on whether their releases exploit the
new workload manager services provided in SP 5.1. Even regions from
exploiting products, when they start up, are classified as batch jobs or
started tasks and assigned a service class. Once classified, these regions,
like all other address spaces, are managed by the MVS Workload Manager
to meet their specified velocity goal. During this time they are
regarded by the Workload Manager as 'non-servers'.
However, once a region from an exploiting product starts processing
transactions, WLM will manage the region according to the goals of the
transactions it serves. During this time period the region is regarded
by the Workload Manager as a 'server'.
From this summary one can see that the goal assigned to a region
will only sometimes have an impact on the transaction response time achieved
by its workload. Regions that are not exploiting the MVS workload management
services must be given a goal sufficient to ensure that the transactions
they are processing achieve their response time objectives. Regions that
are exploiting the services must be given a goal sufficient to ensure
timely initialization during start-up, and proper treatment when the region
is not processing any transactions for an extended period of time.
It is unlikely an installation will always know which regions are exploiting
the WLM services. The installation probably does not want to change the
goals for work based on whether or not these services are being exploited.
Thus, it is best to assign a velocity goal that will enable the regions
to run sufficiently so that the transactions they serve will meet the service
level objectives.
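As an illustrative sketch only (the class name and velocity value are
hypothetical), such a region service class might look like this:

   Service Class PRODREGN - production CICS/IMS regions

   Period  Duration  Importance  Goal
      1          -       2       Execution velocity of 50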
Before even attempting to set a goal for OLTP transactions an installation
must first determine if the transaction and resource managers processing
these transactions exploit the workload management services. These services
allow SRM to be aware of transaction starts and completions, as well as
the response time being achieved. All of those are necessary to compare
to a goal, and to influence a decision to change resource allocation. Earlier
levels of the subsystems can of course be run, but they are managed with a
velocity goal for the address spaces rather than with a goal for the
interactive work running in the regions.
Start with just a few service classes for the OLTP work. For example, define
a service class for production CICS transactions and another for test CICS
transactions. Since CICS/ESA V4.1 does not alter its own internal dispatching
queue based on the goals, and SRM still controls the CICS regions at an
address space level, it does not make sense to spend a lot of time classifying
different CICS transactions into unique service classes with differing goals.
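A minimal sketch of such a starting set might look like the following; the
service class names match the classification example later in this section,
but the importances, percentages, and times are illustrative only:

   Service Class CICSPROD - production CICS transactions

   Period  Duration  Importance  Goal
      1          -       2       90% complete within 1 second

   Service Class CICSTEST - test CICS transactions

   Period  Duration  Importance  Goal
      1          -       4       85% complete within 3 seconds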
There is one implication of this simple approach that is not immediately
obvious. RMF breaks down the response time of CICS or IMS transactions
so the installation can see where those subsystems believe delays are occurring.
As an installation gathers very dissimilar work together into a single
service class, that response time breakdown data will not be meaningful.
Extending the simple rule stated above (separating test from production)
is straightforward. There may be a specific set of CICS transaction names,
or a specific collection of IMS transaction classes that are especially
critical to the installation. It may be appropriate to create a service
class just for them to ensure they receive the proper WLM goal.
Here is an example of a few classification rules for an installation's
CICS transactions.
   Subsystem Type . : CICS
   Description . . . Classification Rules for all CICS trans.

          -------Qualifier-------------           -------Class--------
   Action Type      Name      Start               Service     Report
                                    DEFAULTS:     CICSTEST    ________
   ____ 1 SI        CIP*____  ___                 CICSPROD    ________
   ____ 2   TN      AB*_____  ___                 BANKING     ________
   ____ 2   TN      XYZ_____  ___                 BANKING     ________
In this example, the first qualifier checked for every CICS transaction is
the VTAM application ID of the receiving region (the Subsystem Instance, SI).
Assuming the installation has a naming convention of CIPxxxx for production
regions and CITxxxx for test regions, the test transactions will fail this
rule and be assigned the default service class, CICSTEST. When a production
transaction arrives, the service class assigned is determined by checking
whether the transaction name begins with the characters AB or is exactly XYZ;
those transactions are assigned to BANKING, and any other production
transaction receives CICSPROD.
A response time goal is the only kind that WLM supports for IMS or CICS
transactions. The goal could be either an average or a percentile response
time goal.
An average response time may be the only historical information available
when first choosing a goal. But since there are many examples of very long
running CICS transactions, (dynamic program link, conversational transactions,
etc.) it would probably behoove most installations to implement percentile
goals soon after activating goal mode.
Caution --- Some installations might use RPGNs today to see average
response times for CICS. Be very careful. If any CICS transactions are
routed, the current RMF data could very easily contain some double counting.
As a result, the average response time is reported as less than what the
TOR actually sees. Installations that have NOT accounted for this today in
the IEAICSxx member should not rely on current RMF data for setting an average
CICS response time goal. Reference number 1 explains how to create a simple
WLM policy that allows the compatibility mode RMF data to provide average
response times using the same definition of a transaction that goal mode
will use.
Installations may still want to track resources used by individual CICS
or IMS regions. Type 30 SMF records will continue to be available for that.
An installation that created a special PGN for each region in order to get
RMF's type 72 SMF records can obtain equivalent data in goal mode. Avoid
creating a service class for each of these regions; instead, assign a report
class per region. All the resource usage data that has been accumulated for
each PGN will be provided by RMF for each service class and report class.
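For example, under the STC subsystem type each production region (hypothetical
names shown) could share one service class with a velocity goal, such as the
PRODREGN class sketched earlier, while receiving its own report class:

   Subsystem Type . : STC

          -------Qualifier-------------           -------Class--------
   Action Type      Name      Start               Service     Report
   ____ 1 TN        CICSAOR1  ___                 PRODREGN    RCICSA1
   ____ 1 TN        CICSAOR2  ___                 PRODREGN    RCICSA2
   ____ 1 TN        IMSMPR1_  ___                 PRODREGN    RIMSMP1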