Effective Use of MVS Workload Manager Controls
Deciding what goals to set for interactive TSO work will probably be easier
than for any of the other work types. This is because most customers already
have some sort of service level objectives set for this type of work, as well
as plenty of historical data showing what this work is already achieving in
compatibility mode.
A response time goal is the obvious choice: 'interactive' implies there are
real live users at the other end of the terminals waiting for a quick response
when they hit Enter. These transactions are short and quick, and enough of them
complete to give SRM a reasonable statistical sample on which to base its
decisions.
If a velocity goal is assigned to interactive TSO, it causes SRM to
control the swap protect time on an individual address space basis rather
than on a period-wide basis. That is less efficient. In addition, in cases
when the TSO transactions do take a while to complete, SRM will look at
the address spaces with a velocity goal to decide what expanded storage
access to give to their demand paging, VIO paging, and hiperspace paging.
This might involve monitoring address spaces for working set management
control even though the work might still end quickly. Once again, this
is less efficient than running interactive work with a response time goal.
A discretionary goal implies the installation does not mind if SRM chooses
not to run the work for a while. In such cases SRM may swap the discretionary
address space out for long periods of time if needed. When a user is sitting
at a terminal, this may be exactly what the installation intends if that
user is exceeding the amount of service the installation feels is reasonable
for a TSO user. This might be the case for the final 1% of all the TSO
transactions. So it depends on what percentage of the TSO work makes it to
the last service class period, and, once there, how the installation wants
those TSO users treated.
Installations may choose to create a special TSO service class with a more
aggressive goal and with an importance level higher than that of other
TSO users. Any installation with a special system programmers' TSO performance
group today could continue this approach with a special TSO service class.
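As a purely illustrative sketch (the class names, durations, importances, and
response times below are hypothetical, not recommendations), such a pair of
TSO service classes might look like this:

   Service Class TSONORM - general TSO users

   Period  Duration  Importance  Goal
      1        800       2       90% complete within 0.5 seconds
      2       4000       3       90% complete within 2 seconds
      3          -       -       Discretionary

   Service Class TSOSPEC - system programmers' TSO

   Period  Duration  Importance  Goal
      1          -       1       90% complete within 0.5 seconds

The durations are expressed in service units, so the numbers above would need
to reflect how much service a typical TSO transaction consumes at the
installation.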
Deciding what goals to set for batch work will be very dependent on what
type of batch work is being processed. Batch work has good uses for each
of the three goal types: response time, velocity, and discretionary.
A response time goal is most suitable for short, homogeneous batch jobs.
Today an installation may have one or more job classes dedicated to ensuring
fast turnaround time for this type of batch work. To meet the objectives for
the batch work running in these job classes, enough initiators must be started
for those classes that most jobs do not have long queue times waiting to be
selected by an initiator. Using a response time goal, either average or
percentile, to help ensure this fast turnaround time is very appropriate in
this case, whether the batch is test or production.
When setting a response time goal for batch work it is important to
note that a batch job is one transaction. The transaction starts when the
job is submitted (i.e., when the JES reader processes the job) and completes
when the initiator finishes executing the job. That means the response time
does include the time the job is queued by JES waiting for an initiator, but
it does not include output processing. The turnaround goal specified should
therefore cover the sum of the queue time and the execution time. Consequently,
a response time goal should be used for batch only when sufficient initiators
exist for the associated job class that most jobs do not experience lengthy
queue time, and when there is a steady flow of completions. A batch response
time goal would be less effective if (1) the goal is long, or (2) there
is a low rate of job completions in the service class. An average of at
least 10 batch job completions within a 20-minute period would constitute
a steady flow of completions within a service class.
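For illustration only (the class name, importance, and times are hypothetical),
a service class for such short batch work might be defined along these lines,
remembering that the goal must cover JES queue time plus execution time:

   Service Class BATCHHI - short, fast-turnaround batch

   Period  Duration  Importance  Goal
      1          -       3       85% complete within 5 minutes

If these jobs typically execute in about two minutes, a goal of this sort
leaves room for normal initiator queue time; if queue times routinely exceed
the remainder, either start more initiators or use one of the other goal types
discussed below.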
A velocity goal is appropriate for long-running production or test batch
jobs, and for any IMS or CICS regions which an installation runs as batch
jobs. The assumption for these recommendations is that there will be so
few completions that a response time goal is inappropriate. Of course,
the velocity goal specified for an IMS or CICS batch job only applies for
the time when the region is not telling WLM about any transactions it is
currently serving. It is therefore a good goal for CICS or IMS regions
that are not yet upgraded to the release level supporting MVS Workload
Manager.
A velocity goal could also be very appropriate for certain jobs or job
classes which are held for a long time. When those jobs are released, the
installation may require them to be processed quickly. Since the queue
time might be unpredictable, a response time goal is inappropriate. But
the velocity goal will tell SRM how to run the work once it has been released.
If an installation cannot use a velocity goal for some batch work (or a
response time goal, given the caveats discussed earlier), that batch
should be given a discretionary goal. Critical path production jobs should
probably have a low velocity goal rather than discretionary, unless the
site has plenty of capacity.
Some installations currently use more than one period in the performance
groups for batch jobs, and some installations use just one period. There
are advantages and disadvantages to each approach, with implications
for the initiator structure. An installation currently using multiple-period
PGNs for batch can certainly continue that in goal mode.
Most likely, in the IPS the dispatching priority decreased with each succeeding
period. In the WLM policy, the first period might contain a response time goal.
Succeeding periods could have longer response time goals, possibly with
decreasing importance, with either a velocity or a discretionary goal for
the last period.
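A hypothetical multi-period batch service class following that pattern (all
names and numbers are illustrative only) could look like:

   Service Class BATMULTI - multi-period batch

   Period  Duration  Importance  Goal
      1       5000       3       80% complete within 10 minutes
      2      50000       4       80% complete within 1 hour
      3          -       -       Discretionary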
Installations may choose to create a special service class with an aggressive
goal for the case when some work "must run now". No classification rules
need to refer to that service class, but operators could still use the
RESET operator command to assign running work to that HOTBATCH service
class.
Since the RESET causes SRM to start a new transaction for the address
space, any JES queue time that may have existed is now ignored. This means
a response time goal could be used for the HOTBATCH class. But if an installation
has sufficient completions in the HOTBATCH service class, this probably
warrants a discussion with operations! Therefore, a velocity goal is probably
best for this service class.
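For example, to move a running job into that service class, an operator could
enter the following command (PAYROLL9 is a hypothetical job name):

   RESET PAYROLL9,SRVCLASS=HOTBATCH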
Many installations today create a special "swap out" PGN that causes existing
work to stop executing. There is no need to create a special service class
like that, because the RESET operator command has been extended with a
QUIESCE option. If a swappable address space is reset with that option,
it will be swapped out. If a non-swappable address space is reset that
way, it will remain in storage but will be the last work to be dispatched.
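For example, again using a hypothetical job name:

   RESET PAYROLL9,QUIESCE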
A primary feature of the MVS Workload Manager is the ability to specify
goals for on-line transactions, like CICS and IMS/TM user transactions.
Prior to the MVS Workload Manager, customers influenced the response time
for these transactions by controlling the resources provided to the transaction
and resource managers. So if a performance manager wanted to achieve a
better response time for a certain subset of the on-line transactions,
the manager had to figure out which regions processed the transactions
of interest. The next task was to analyze the resource usage of those
regions and work at improving it, in hopes of achieving a better response
time for the transactions of interest. This indirection often made it
difficult to tune on-line transaction processing workloads and often resulted
in inefficient use of system resources. Examples of such transaction and
resource manager address spaces are CICS regions, IMS control regions
and message processing regions, and DB2 address spaces.
With the advent of the MVS Workload Manager, one can now specify response
time goals for the on-line transactions. The MVS Workload Manager will
manage the transaction and resource managers to work towards meeting the
goals of the transactions they serve.
This feature is not without its caveats. The MVS Workload Manager is
only able to manage these transaction and resource managers if they exploit
specific MVS workload management services provided in MVS/ESA SP 5.1. As
of the date of this paper, the following products are exploiting these
services.
- CICS/ESA V4.1 and higher
- IMS/ESA TM V5.1 and higher
- IMS/ESA DB V5.1 and higher
- DB2 V4.1 and higher (for distributed DB2 transactions)
Since the OLTP regions are long-running batch jobs or started tasks, they
should be assigned a velocity goal. When setting a velocity goal for OLTP
regions, it is best to set a goal that will enable the regions to run sufficiently
so the transactions they serve will meet the service level objectives.
As stated above, the way the MVS Workload Manager manages transaction
and resource managers is dependent on whether their releases exploit the
new workload manager services provided in SP 5.1. Even regions from
exploiting products, when they start up, are classified as batch jobs or
started tasks and assigned a service class. Once classified, these regions,
like all other address spaces, are managed by the MVS Workload Manager
to meet their specified velocity goal. During this time they are
regarded by the Workload Manager as 'non-servers'.
However, once a region from an exploiting product starts processing
transactions, WLM will manage the region according to the goals of the
transactions it serves. During this time period the region is regarded
by the Workload Manager as a 'server'.
From this summary one can see that the goal assigned to a region
will only sometimes have an impact on the transaction response time achieved
by its workload. Regions that are not exploiting the MVS workload management
services must be given a goal sufficient to ensure that the transactions
they are processing achieve their response time objectives. Regions that
are exploiting the services must be given a goal sufficient to ensure
timely initialization during start-up, and proper treatment when the region
is not processing any transactions for an extended period of time.
It is unlikely an installation will always know which regions are exploiting
the WLM services. The installation probably does not want to change the
goals for work based on whether or not these services are being exploited.
Thus, it is best to assign a velocity goal that will enable the regions
to run sufficiently so that the transactions they serve will meet the service
level objectives.
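As an illustrative sketch only (the class name and velocity value are
hypothetical), such a region service class might look like this:

   Service Class PRODREGN - production CICS/IMS regions

   Period  Duration  Importance  Goal
      1          -       2       Execution velocity of 50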
Before even attempting to set a goal for OLTP transactions an installation
must first determine if the transaction and resource managers processing
these transactions exploit the workload management services. These services
allow SRM to be aware of transaction starts and completions, as well as
the response time being achieved. All of those are necessary to compare
to a goal, and to influence a decision to change resource allocation. Earlier
levels of the subsystems can of course be run, but they are managed with a
velocity goal for the address spaces rather than with a goal for the
interactive work running in the regions.
Start with just a few service classes for the OLTP work. For example, define
a service class for production CICS transactions and another for test CICS
transactions. Since CICS/ESA V4.1 does not alter its own internal dispatching
queue based on the goals, and SRM still controls the CICS regions at an
address space level, it does not make sense to spend a lot of time classifying
different CICS transactions into unique service classes with differing goals.
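A minimal sketch of such a starting set might look like the following; the
service class names match the classification example later in this section,
but the importances, percentages, and times are illustrative only:

   Service Class CICSPROD - production CICS transactions

   Period  Duration  Importance  Goal
      1          -       2       90% complete within 1 second

   Service Class CICSTEST - test CICS transactions

   Period  Duration  Importance  Goal
      1          -       4       85% complete within 3 seconds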
There is one implication of this simple approach that is not immediately
obvious. RMF breaks down the response time of CICS or IMS transactions
so the installation can see where those subsystems believe delays are occurring.
As an installation gathers very dissimilar work together into a single
service class, that response time breakdown data will not be meaningful.
Extending the simple rule stated above (separating test from production)
is straightforward. There may be a specific set of CICS transaction names,
or a specific collection of IMS transaction classes that are especially
critical to the installation. It may be appropriate to create a service
class just for them to ensure they receive the proper WLM goal.
Here is an example of a few classification rules for an installation's
CICS transactions.
   Subsystem Type . : CICS
   Description . . . Classification Rules for all CICS trans.

          -------Qualifier-------------           -------Class--------
   Action Type      Name      Start               Service     Report
                                    DEFAULTS:     CICSTEST    ________
   ____ 1 SI        CIP*____  ___                 CICSPROD    ________
   ____ 2   TN      AB*_____  ___                 BANKING     ________
   ____ 2   TN      XYZ_____  ___                 BANKING     ________
In this example, the first qualifier checked for every CICS transaction is
the VTAM application ID of the receiving region (the Subsystem Instance, SI).
Assuming the installation has a naming convention of CIPxxxx for production
regions and CITxxxx for test regions, the test transactions will fail this
rule and be assigned the default service class, CICSTEST. When a production
transaction arrives, the service class assigned is determined by checking
whether the transaction name begins with the characters AB or is exactly XYZ;
those transactions are assigned to BANKING, and any other production
transaction receives CICSPROD.
A response time goal is the only kind that WLM supports for IMS or CICS
transactions. The goal could be either an average or a percentile response
time goal.
An average response time may be the only historical information available
when first choosing a goal. But since there are many examples of very long
running CICS transactions, (dynamic program link, conversational transactions,
etc.) it would probably behoove most installations to implement percentile
goals soon after activating goal mode.
Caution --- Some installations might use RPGNs today to see average
response times for CICS. Be very careful. If any CICS transactions are
routed, the current RMF data could very easily contain some double counting.
As a result, the average response time is reported as less than what the
TOR actually sees. Installations that have NOT accounted for this today in
the IEAICSxx member should not rely on current RMF data for setting an average
CICS response time goal. Reference number 1 explains how to create a simple
WLM policy that allows the compatibility mode RMF data to provide average
response times using the same definition of a transaction that goal mode
will use.
Installations may still want to track resources used by individual CICS
or IMS regions. Type 30 SMF records will continue to be available for that.
An installation that created a special PGN for each region in order to get
RMF's type 72 SMF records can obtain equivalent data in goal mode. Avoid
creating a service class for each of these regions; instead, assign a report
class per region. All the resource usage data that has been accumulated for
each PGN will be provided by RMF for each service class and report class.
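For example, under the STC subsystem type each production region (hypothetical
names shown) could share one service class with a velocity goal, such as the
PRODREGN class sketched earlier, while receiving its own report class:

   Subsystem Type . : STC

          -------Qualifier-------------           -------Class--------
   Action Type      Name      Start               Service     Report
   ____ 1 TN        CICSAOR1  ___                 PRODREGN    RCICSA1
   ____ 1 TN        CICSAOR2  ___                 PRODREGN    RCICSA2
   ____ 1 TN        IMSMPR1_  ___                 PRODREGN    RIMSMP1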