Mainframes 360
The one stop destination for System Z professionals

Wednesday, March 26, 2014

COBOL Tables


A large manufacturing company wants to produce a sales report. You are asked to write a COBOL report-generation program to print the sales for every month. Without tables, you'd have to define twelve repetitious variables: JANUARY-SALES, FEBRUARY-SALES, ..., DECEMBER-SALES. This is very cumbersome.
       01  WS-SALES-DATA.
           05  JANUARY-SALES       PIC 9(07)V99.
           05  FEBRUARY-SALES      PIC 9(07)V99.
           05  MARCH-SALES         PIC 9(07)V99.
           05  APRIL-SALES         PIC 9(07)V99.
           05  MAY-SALES           PIC 9(07)V99.
           05  JUNE-SALES          PIC 9(07)V99.
           05  JULY-SALES          PIC 9(07)V99.
           05  AUGUST-SALES        PIC 9(07)V99.
           05  SEPTEMBER-SALES     PIC 9(07)V99.
           05  OCTOBER-SALES       PIC 9(07)V99.
           05  NOVEMBER-SALES      PIC 9(07)V99.
           05  DECEMBER-SALES      PIC 9(07)V99.
When there are multiple items of the same type, you can use arrays.

A COBOL table or array is simply a data structure consisting of a collection of elements (values), all of which have the same data description, such as a table of monthly sales. Think of a table as a matrix, a grid of elements.

Let's establish an array of monthly sales amounts for the company. The array consists of 12 sales amounts. The OCCURS clause in COBOL defines the size of the array. Note that the OCCURS clause may not be used on levels 01, 66, 77 and 88. Defining the array in WORKING-STORAGE with an OCCURS clause requires the following coding.
       01  WS-SALES-TABLE.
           05  WS-MONTHLY-SALES    PIC 9(07)V99 OCCURS 12 TIMES.

WS-SALES-TABLE is a group item that contains the table. WS-MONTHLY-SALES names the elements of the table. OCCURS 12 TIMES defines twelve occurrences of WS-MONTHLY-SALES. You refer to any element by its position in the array. For example, the sales in June, i.e. the sixth element, is referred to as WS-MONTHLY-SALES(06).
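
To see the table in action, here is a minimal, self-contained sketch. Only WS-SALES-TABLE comes from the text above; the program name SALESTBL, the subscript WS-SUB, the edited field WS-SALES-OUT and the sales figures are made up for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SALESTBL.
      * Hypothetical demo: the sales figures are invented.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SALES-TABLE.
           05  WS-MONTHLY-SALES    PIC 9(07)V99 OCCURS 12 TIMES.
       01  WS-SUB                  PIC 9(02).
       01  WS-SALES-OUT            PIC Z,ZZZ,ZZ9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Give month n the illustrative amount n * 1000.50.
           PERFORM VARYING WS-SUB FROM 1 BY 1 UNTIL WS-SUB > 12
               COMPUTE WS-MONTHLY-SALES(WS-SUB) = WS-SUB * 1000.50
           END-PERFORM
      * The sixth element holds the June figure.
           MOVE WS-MONTHLY-SALES(06) TO WS-SALES-OUT
           DISPLAY 'JUNE SALES: ' WS-SALES-OUT
           STOP RUN.
```

With these made-up figures, the June element holds 6 x 1000.50 = 6,003.00.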


The elements of a table can be elementary or group data items. For example, you could create a table of employee SSNs and names. In this example, WS-EMPLOYEES-TABLE is a group item containing the table. WS-EMPLOYEE names the elements of the table, so the individual employees would be referred to as WS-EMPLOYEE(01), WS-EMPLOYEE(02), WS-EMPLOYEE(03) and so forth. Similarly, the child elements inside WS-EMPLOYEE(03) would be called WS-EMPLOYEE-SSN(03) and WS-EMPLOYEE-NAME(03).
       01  WS-EMPLOYEES-TABLE.
           05  WS-EMPLOYEE OCCURS 05 TIMES.
               10  WS-EMPLOYEE-SSN     PIC X(09).
               10  WS-EMPLOYEE-NAME    PIC X(30).


Referring to an item in the table
A table name is a collective name for all the elements in it. WS-SALES-TABLE and WS-EMPLOYEES-TABLE are examples of table names. To identify the individual items of a table, you can use one of two techniques: subscripts or indexes.

Subscript is the absolute position of the element in the table. As an example, let's assume that a bank's customer accounts information is stored in a table. Every customer account has a cash account no., balance and customer name. CUSTOMER-ACCOUNTS-TABLE is defined as follows.
       01  CUSTOMER-ACCOUNTS-TABLE.
           05  CUSTOMER-ACCOUNT OCCURS 10 TIMES.
               10  CASH-ACCOUNT-NO PIC X(10).
               10  BALANCE         PIC 9(07)V99.
               10  CUSTOMER-NAME   PIC X(30).

       01  WS-ACCOUNT-SUB          PIC 9(04) VALUE 1.
You refer to the 3rd element of the CUSTOMER-ACCOUNTS-TABLE by the element name CUSTOMER-ACCOUNT, followed by the position or subscript 03 in parentheses, as CUSTOMER-ACCOUNT(03). The 5th account would be CUSTOMER-ACCOUNT(05). In general, the ith element of this table would be CUSTOMER-ACCOUNT(i). This mechanism applies to any children of the CUSTOMER-ACCOUNT item as well. The code snippet below shows this.
           COMPUTE INTEREST =
               (BALANCE(03) * NUMBER-OF-YEARS * RATE) / 100
           DISPLAY CUSTOMER-NAME(03)
           DISPLAY CASH-ACCOUNT-NO(05)
           DISPLAY CUSTOMER-ACCOUNT(03)
Note that when you refer to CUSTOMER-ACCOUNT(03), it behaves as one whole alphanumeric item.

You can also define a subscript variable like WS-ACCOUNT-SUB and initialize it at run-time, rather than hard-coding the subscript's value. For example, if WS-ACCOUNT-SUB is initialized to 03, CUSTOMER-NAME(WS-ACCOUNT-SUB) refers to the third customer's name. If WS-ACCOUNT-SUB=05, BALANCE(WS-ACCOUNT-SUB) is the 5th account's balance.
           MOVE 03 TO WS-ACCOUNT-SUB
           DISPLAY CUSTOMER-NAME(WS-ACCOUNT-SUB)
           MOVE 05 TO WS-ACCOUNT-SUB
           ADD 100 TO BALANCE(WS-ACCOUNT-SUB)
A diagrammatic representation of the CUSTOMER-ACCOUNTS-TABLE is shown. Array elements are laid out in adjacent locations in computer memory. Because the elements are contiguous, the position of any element can be computed directly, which makes table access fast. At run time, subscripts are converted to a displacement from the start of the table. For example, each CUSTOMER-ACCOUNT element occupies 10 + 9 + 30 = 49 bytes. The 10 elements of the array are stored at displacements 0, 49, 98, 147, ..., 441, so the element with subscript i is at offset 49 x (i - 1).
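
The displacement arithmetic can be checked with a small sketch. The program name TBLOFFS and the fields WS-ELEMENT-LEN and WS-DISPLACEMENT are hypothetical; the calculation is done by hand here only to illustrate what the compiled code does for you behind the scenes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TBLOFFS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CUSTOMER-ACCOUNTS-TABLE.
           05  CUSTOMER-ACCOUNT OCCURS 10 TIMES.
               10  CASH-ACCOUNT-NO PIC X(10).
               10  BALANCE         PIC 9(07)V99.
               10  CUSTOMER-NAME   PIC X(30).
       01  WS-ACCOUNT-SUB          PIC 9(04) VALUE 1.
       01  WS-ELEMENT-LEN          PIC 9(04).
       01  WS-DISPLACEMENT         PIC 9(04).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Size of one occurrence: 10 + 9 + 30 = 49 bytes.
           COMPUTE WS-ELEMENT-LEN =
               FUNCTION LENGTH(CUSTOMER-ACCOUNT(1))
           PERFORM VARYING WS-ACCOUNT-SUB FROM 1 BY 1
                   UNTIL WS-ACCOUNT-SUB > 10
      * Displacement = (subscript - 1) * element length.
               COMPUTE WS-DISPLACEMENT =
                   (WS-ACCOUNT-SUB - 1) * WS-ELEMENT-LEN
               DISPLAY 'SUBSCRIPT ' WS-ACCOUNT-SUB
                       ' DISPLACEMENT ' WS-DISPLACEMENT
           END-PERFORM
           STOP RUN.
```

The first element sits at displacement 0 and the tenth at 9 x 49 = 441.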


An index is the displacement of an element from the start of the table. Indexes usually perform better than subscripts, because an index already holds the offset, which therefore does not have to be calculated at run time. Indexes are defined by the INDEXED BY phrase on the OCCURS clause.
       01  JOB-ID-TABLE.
           05  JOB-ID PIC X(05) OCCURS 10 TIMES
                      INDEXED BY IDX-A.

The compiler automatically calculates the value contained in the index as the occurrence number minus 1 multiplied by the size of the table element. Therefore, for the sixth occurrence of JOB-ID, the binary value contained in IDX-A is (6 - 1) x 5 = 25. Consequently, IDX-A variable can be used to address only JOB-ID-TABLE; you can't address any other table using IDX-A except if it has the same number and type of elements.

The SET statement can be used to manipulate an index. While the index is actually a byte offset, COBOL does not expect you to work at that level. Setting an index to 2 causes the compiler to calculate the correct byte offset and point to the 2nd element in the table. Likewise, setting an index up by one moves the index to the next element, regardless of the size of the element. The COBOL compiler translates it into the proper offset for you.
           SET IDX-A TO WS-NUMBER
           SET IDX-A TO 2
           SET IDX-A UP BY 1
           SET IDX-A DOWN BY 2
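
The four SET statements can be traced with a minimal runnable sketch. The program name SETDEMO, the field WS-OCCURRENCE, the paragraph 1000-SHOW and the JOB01 through JOB10 values are assumptions added for illustration; WS-NUMBER is given the arbitrary value 4.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SETDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  JOB-ID-TABLE.
           05  JOB-ID PIC X(05) OCCURS 10 TIMES
                      INDEXED BY IDX-A.
       01  WS-NUMBER               PIC 9(02) VALUE 4.
       01  WS-OCCURRENCE           PIC 9(02).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Load illustrative values JOB01 through JOB10.
           PERFORM VARYING IDX-A FROM 1 BY 1 UNTIL IDX-A > 10
               SET WS-OCCURRENCE TO IDX-A
               STRING 'JOB' WS-OCCURRENCE DELIMITED BY SIZE
                   INTO JOB-ID(IDX-A)
               END-STRING
           END-PERFORM
           SET IDX-A TO WS-NUMBER
           PERFORM 1000-SHOW
           SET IDX-A TO 2
           PERFORM 1000-SHOW
           SET IDX-A UP BY 1
           PERFORM 1000-SHOW
           SET IDX-A DOWN BY 2
           PERFORM 1000-SHOW
           STOP RUN.
       1000-SHOW.
      * SET also converts an index back to an occurrence number.
           SET WS-OCCURRENCE TO IDX-A
           DISPLAY 'OCCURRENCE ' WS-OCCURRENCE ' IS ' JOB-ID(IDX-A).
```

The index visits occurrences 4, 2, 3 and 1 in that order; at no point does the program deal with byte offsets.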

Table processing
Table processing often involves traversing the table and accessing each element in turn. Suppose we want to find the average monthly sales in the manufacturing example. The algorithm is simple: initialize the index I to 1, access A[I], increment the index to I+1, and repeat until no more items are left.
       7000-COMPUTE-AVG.
      *    Assumes WS-MONTHLY-SALES is declared with INDEXED BY IDX-A.
           MOVE ZEROES TO AVG-SALES
                          TOTAL

      *    PERFORM VARYING sets IDX-A to 1 before the first pass.
           PERFORM VARYING IDX-A FROM 1 BY 1
                   UNTIL IDX-A > 12
               COMPUTE TOTAL = TOTAL + WS-MONTHLY-SALES(IDX-A)
           END-PERFORM

           COMPUTE AVG-SALES = TOTAL / 12
           DISPLAY 'AVERAGE SALES IS ' AVG-SALES.
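
For readers who want to run the averaging logic end to end, here is a self-contained sketch. The program name AVGSALES, the fields WS-SUB, WS-TOTAL, WS-AVG-SALES and WS-AVG-OUT, and the sample data are assumptions; the table is declared with INDEXED BY IDX-A so that the index can be used in the loop.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. AVGSALES.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SALES-TABLE.
           05  WS-MONTHLY-SALES    PIC 9(07)V99 OCCURS 12 TIMES
                                   INDEXED BY IDX-A.
       01  WS-SUB                  PIC 9(02).
       01  WS-TOTAL                PIC 9(09)V99.
       01  WS-AVG-SALES            PIC 9(07)V99.
       01  WS-AVG-OUT              PIC Z,ZZZ,ZZ9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Illustrative data: month n sells n * 100.
           PERFORM VARYING WS-SUB FROM 1 BY 1 UNTIL WS-SUB > 12
               COMPUTE WS-MONTHLY-SALES(WS-SUB) = WS-SUB * 100
           END-PERFORM
           PERFORM 7000-COMPUTE-AVG
           STOP RUN.
       7000-COMPUTE-AVG.
           MOVE ZEROES TO WS-AVG-SALES WS-TOTAL
      * Walk the table with the index and accumulate the total.
           PERFORM VARYING IDX-A FROM 1 BY 1 UNTIL IDX-A > 12
               ADD WS-MONTHLY-SALES(IDX-A) TO WS-TOTAL
           END-PERFORM
           COMPUTE WS-AVG-SALES ROUNDED = WS-TOTAL / 12
           MOVE WS-AVG-SALES TO WS-AVG-OUT
           DISPLAY 'AVERAGE SALES IS ' WS-AVG-OUT.
```

With the sample data the total is 7,800.00, so the average is 650.00.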

Pre-filled Tables
Tables can be pre-filled with information. Such pre-calculated tables come in very handy. Lookup tables save processing time by searching for an input x and finding the output y in the table, rather than doing an expensive computation. Let us take an example. An online CICS program requires date conversion from MM-DD-CCYY to DD-MON-CCYY format before display on the screen. We must transform the months 01, 02, 03, ..., 12 to JAN, FEB, MAR, ..., DEC. To achieve this, I create one large WORKING-STORAGE variable that maps month numerals to names. I then construct an array out of this variable by slicing it into twelve parts.
       01  WS-MONTHS-DATA.
           05  FILLER              PIC X(05) VALUE '01JAN'.
           05  FILLER              PIC X(05) VALUE '02FEB'.
           05  FILLER              PIC X(05) VALUE '03MAR'.
           05  FILLER              PIC X(05) VALUE '04APR'.
           05  FILLER              PIC X(05) VALUE '05MAY'.
           05  FILLER              PIC X(05) VALUE '06JUN'.
           05  FILLER              PIC X(05) VALUE '07JUL'.
           05  FILLER              PIC X(05) VALUE '08AUG'.
           05  FILLER              PIC X(05) VALUE '09SEP'.
           05  FILLER              PIC X(05) VALUE '10OCT'.
           05  FILLER              PIC X(05) VALUE '11NOV'.
           05  FILLER              PIC X(05) VALUE '12DEC'.

       01  WS-MONTH-MAP REDEFINES WS-MONTHS-DATA.
           05  WS-MONTH-ITEM OCCURS 12 TIMES.
               10  WS-MONTH-NUM    PIC 9(02).
               10  WS-MONTH-NAME   PIC X(03).

       01  WS-MONTH-SUB            PIC 9(02).

       01  WS-FROM-DATE-MM-DD-CCYY.
           05  WS-FROM-DATE-MM     PIC X(02).
           05  FILLER              PIC X     VALUE '-'.
           05  WS-FROM-DATE-DD     PIC X(02).
           05  FILLER              PIC X     VALUE '-'.
           05  WS-FROM-DATE-CC     PIC 9(02).
           05  WS-FROM-DATE-YY     PIC 9(02).

       01  WS-TO-DATE-DD-MON-CCYY.
           05  WS-TO-DATE-DD       PIC X(02).
           05  FILLER              PIC X     VALUE '-'.
           05  WS-TO-DATE-MON      PIC X(03).
           05  FILLER              PIC X     VALUE '-'.
           05  WS-TO-DATE-CC       PIC 9(02).
           05  WS-TO-DATE-YY       PIC 9(02).
There are 12 occurrences of WS-MONTH-ITEM. Because WS-MONTH-NUM and WS-MONTH-NAME live inside the table, they must always be subscripted. A date conversion routine can therefore move the input month number to a separate subscript, WS-MONTH-SUB, and use it to look up the corresponding WS-MONTH-NAME in the table.
       8500-CONVERT-DATE-TO-DISP-FMT.
           MOVE WS-FROM-DATE-DD TO WS-TO-DATE-DD
           MOVE WS-FROM-DATE-CC TO WS-TO-DATE-CC
           MOVE WS-FROM-DATE-YY TO WS-TO-DATE-YY
      *    The month number doubles as the subscript into the table.
           MOVE WS-FROM-DATE-MM TO WS-MONTH-SUB
           MOVE WS-MONTH-NAME(WS-MONTH-SUB) TO WS-TO-DATE-MON.
       8500-EXIT.
           EXIT.

A central store for static data
A well-designed program has the static data centralized at one place. I frequently store static data, constants like rates, prices, categories, options in COBOL arrays in WORKING STORAGE, rather than hard-coding at many places. Let's establish an example to compute the tax of a US citizen. Tax in the United States is computed as follows.
Tax Bracket              Tax Rate
$0 – $8,925              10%
$8,926 – $36,250         15%
$36,251 – $87,850        25%
$87,851 – $183,250       28%
$183,251 – $398,350      33%
$398,351 – $400,000      35%
$400,001+                39.6%
Suppose a single taxpayer has an income of $350,000. The tax on the first bracket is $8,925 x 10% = $892.50. The tax on the second bracket is ($36,250 - $8,925) x 15% = $4,098.75. The tax on the third bracket is ($87,850 - $36,250) x 25% = $12,900. Likewise, the tax amounts on the fourth and fifth brackets are $26,712 and $55,027.50. With that background, let's establish a TAX-RATES-TABLE that holds static data such as the tax rate and the floor and ceiling values for each tax bracket.
       01  TAX-RATE-DATA.
           05  FILLER PIC X(23) VALUE '00000000000089250001000'.
           05  FILLER PIC X(23) VALUE '00089260000362500001500'.
           05  FILLER PIC X(23) VALUE '00362510000878500002500'.
           05  FILLER PIC X(23) VALUE '00878510001832500002800'.
           05  FILLER PIC X(23) VALUE '01832510003983500003300'.
           05  FILLER PIC X(23) VALUE '03983510004000000003500'.
           05  FILLER PIC X(23) VALUE '04000010099999999903960'.

       01  TAX-RATES-TABLE REDEFINES TAX-RATE-DATA.
           05  TAX-RATE OCCURS 07 TIMES INDEXED BY IDX-A.
               10  TAX-SLAB-FLOOR  PIC S9(07)V9(2).
               10  TAX-SLAB-CEIL   PIC S9(07)V9(2).
               10  TAX-SLAB-RATE   PIC S9(03)V9(2).

A generic subroutine can be written to compute the tax as follows. It's easy to see that if the tax rates are revised, only the TAX-RATES-TABLE needs modification; the 7000-COMPUTE-TAX subroutine remains as is.
       7000-COMPUTE-TAX.
           MOVE ZEROES TO TOTAL-TAX
                          MARGINAL-TAX
           MOVE 'N' TO STOP-PROCESS-SW

      *    IDX-A is the index declared on TAX-RATE.
           PERFORM VARYING IDX-A FROM 1 BY 1
                   UNTIL IDX-A > 07 OR STOP-PROCESSING
               IF INCOME > TAX-SLAB-CEIL(IDX-A)
                   COMPUTE MARGINAL-TAX =
                       ((TAX-SLAB-CEIL(IDX-A)
                       - TAX-SLAB-FLOOR(IDX-A))
                       * TAX-SLAB-RATE(IDX-A)) / 100
               ELSE
                   COMPUTE MARGINAL-TAX =
                       ((INCOME
                       - TAX-SLAB-FLOOR(IDX-A))
                       * TAX-SLAB-RATE(IDX-A)) / 100
                   SET STOP-PROCESSING TO TRUE
               END-IF
               COMPUTE TOTAL-TAX = TOTAL-TAX + MARGINAL-TAX
           END-PERFORM.
       7000-EXIT.
           EXIT.
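
As a cross-check of the hand calculation for a $350,000 income, here is a self-contained sketch. It is a variation rather than the routine above: every name in it (TAXCHECK, WS-BRACKET-DATA, WS-CEILING, WS-RATE, WS-PREV-CEILING and so on) is hypothetical, and it assumes the lower bound of each bracket is the previous bracket's ceiling, which is how the worked arithmetic subtracts the amounts.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TAXCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Each entry: ceiling (7 digits) then rate (99V99).
       01  WS-BRACKET-DATA.
           05  FILLER PIC X(11) VALUE '00089251000'.
           05  FILLER PIC X(11) VALUE '00362501500'.
           05  FILLER PIC X(11) VALUE '00878502500'.
           05  FILLER PIC X(11) VALUE '01832502800'.
           05  FILLER PIC X(11) VALUE '03983503300'.
           05  FILLER PIC X(11) VALUE '04000003500'.
           05  FILLER PIC X(11) VALUE '99999993960'.
       01  WS-BRACKET-TABLE REDEFINES WS-BRACKET-DATA.
           05  WS-BRACKET OCCURS 7 TIMES INDEXED BY WS-IDX.
               10  WS-CEILING      PIC 9(07).
               10  WS-RATE         PIC 9(02)V99.
       01  WS-INCOME               PIC 9(07)V99 VALUE 350000.
       01  WS-PREV-CEILING         PIC 9(07)    VALUE 0.
       01  WS-SLICE                PIC 9(07)V99.
       01  WS-TOTAL-TAX            PIC 9(07)V99 VALUE 0.
       01  WS-TAX-OUT              PIC Z,ZZZ,ZZ9.99.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 7
                      OR WS-INCOME NOT > WS-PREV-CEILING
      * Taxable slice: up to this ceiling or up to the income.
               IF WS-INCOME > WS-CEILING(WS-IDX)
                   COMPUTE WS-SLICE =
                       WS-CEILING(WS-IDX) - WS-PREV-CEILING
               ELSE
                   COMPUTE WS-SLICE = WS-INCOME - WS-PREV-CEILING
               END-IF
               COMPUTE WS-TOTAL-TAX = WS-TOTAL-TAX
                   + WS-SLICE * WS-RATE(WS-IDX) / 100
               MOVE WS-CEILING(WS-IDX) TO WS-PREV-CEILING
           END-PERFORM
           MOVE WS-TOTAL-TAX TO WS-TAX-OUT
           DISPLAY 'TAX ON 350,000.00 IS ' WS-TAX-OUT
           STOP RUN.
```

The five bracket amounts 892.50 + 4,098.75 + 12,900.00 + 26,712.00 + 55,027.50 add up to 99,630.75, which is the total this sketch arrives at.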

Sunday, March 16, 2014

EXEC Statement

How to code the EXEC statement

A job stream may have a series of steps: pre-processing, the core processing, and then cleanup or housekeeping steps. Each step is defined in JCL by an EXEC statement, so you'd have as many EXEC statements as there are steps. In each step, the EXEC statement identifies the program to be executed.

A program typically reads data from one or more input files, includes business logic to process the data and writes the output records to one or more output files. DD statements let you specify input files, output files and intermediate work files for a program. Stringing it up together, a job-stream would have a skeleton like this.

The PGM parameter

The PGM parameter on the EXEC statement lets you specify the program to be executed. Every program is a member of a partitioned dataset. The member must be a load module(compiled code). You may execute user programs written by a developer. You may run the numerous IBM supplied utility programs - IEBGENER, IEBCOPY, IEBDG, IEWL, SORT.

The PARM parameter

The PARM parameter can be used to pass information to a program. This information is usually used to influence, control the way the program works. Many IBM supplied utilities like compilers, linkers, SORT use the PARM information for various processing options.

Here's an EXEC statement that passes three parameters to the program IGYCRCTL.

Specifying the job's execution time limit

The TIME parameter lets you specify a default processing time limit on your job. Processing time refers to the CPU time and not the wall clock time. You can specify the TIME parameter on the JOB and EXEC statements, with the exception of TIME=0 which can be used only on an EXEC statement.

You normally specify the value in minutes and seconds. TIME=(2,30) means 2 minutes, 30 seconds. TIME=(,30) omits the minutes value; the leading comma marks its absence, so it means 30 seconds.

If you specify a TIME limit of 1440 minutes or code the TIME=NOLIMIT keyword, no time limit is applied at all. Your job can execute indefinitely. On the other hand, TIME=MAXIMUM imposes an upper bound of 357,912 minutes. If you specify TIME=0 on an EXEC statement, the job step can use the processing time remaining from the previous step. If it exceeds the time available, the step will fail.

Coding the TIME parameter on the JOB statement specifies a time limit for the whole job. As each step executes, its execution time is added to the job's total processing time. If the job's time limit is exceeded, the job is cancelled. On the other hand, when you specify the TIME parameter on the EXEC statement, it applies only to that job step.

There is a potential conflict when the TIME parameter on the EXEC statement specifies a greater time limit than on the JOB statement. In that case, the job's time limit takes precedence over the step's time limit.

Dividing central storage into Regions

Like a street address, a storage address (a hexadecimal number) identifies a storage location. A one-byte address can refer to 2^8 = 256 locations. A three-byte (24-bit) address can refer to 2^24 = 16,777,216 (about 16 million) locations. The amount of storage a program can refer to is called its address space.

Before the introduction of System/360 in 1964, computers ran one job at a time. Business and scientific applications used different sized computers having different instruction sets and operating systems. System/360 gave all IBM mainframes the same hardware architecture, so applications could run regardless of the computer model. The System/360 ancestor of today's operating system, MVT (Multiprogramming with a Variable number of Tasks), could run 15 jobs concurrently in real storage. Programs lived in variable-sized areas called regions. The system used 24 bits for addressing. Thus, programs could address 16 MB of memory.

Fig. Real storage in MVT

Under MVT, each job was scheduled based on the resources it needed. If a job needed a tape, a JCL statement would describe the tape unit. Until the tape was mounted, the OS did not schedule the job. This prevented jobs from sitting idle in real storage. A running job occupied a contiguous region of real storage. Users specified the amount of storage required through the REGION size in JCL statements. On completion, the storage was freed. The first region contained the OS modules, or the nucleus, and always resided in real storage.

By the 1970's, the amount of central storage had become a critical bottleneck. Application programs grew in size. The OS evolved, gained many functions and grew to 8 MB in size. The operating system modules also required a portion of the application program's address space.

The quest for central storage

Fig. Virtual storage

In 1972, IBM introduced the 370 family of computers with a new architectural component: virtual storage. Virtual storage is an illusion in which a program thinks it has an unlimited amount of memory to itself. The application program is independent of the addresses of central storage. Although limited to 16 MB in size, programs were stored on disk (auxiliary storage), and portions of the program were brought into central storage as and when needed. With virtual storage, a program need occupy only a relatively small part of central storage. Programs were now also sharing data. The shared data was moved out of a program's address space and placed, alongside OS code, in a common area. By the end of the 1970s, another bottleneck appeared. The 16 MB virtual storage limit was insufficient for programs.

The quest for address space

Fig. MVS/XA storage

In 1983, IBM introduced the S/370-XA (Extended Architecture). With it, addresses could be 31 bits long; the highest addressable location was 2^31 - 1 = 2,147,483,647, giving an address space of 2 GB. The 16 MB boundary between the two architectures came to be known as the line. Older programs could be marked to run in the 16 MB address space. New programs could use the entire 2 GB address space. Programs that exceeded the 16 MB address space were termed above the line. This could occur in two ways. First, a very large program could require more than 16 MB. Second, in CICS applications, since all programs run in the same address space, even a small program could be forced to run above the 16 MB line. Users had to compile and link-edit their programs with the AMODE(31) option to run them above the line.
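
The addressing arithmetic is easy to verify with a few COMPUTE statements. This is only an illustrative sketch; the program name ADDRSIZE and the fields WS-LOCATIONS, WS-HIGHEST and WS-OUT are made up.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ADDRSIZE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOCATIONS            PIC 9(10).
       01  WS-HIGHEST              PIC 9(10).
       01  WS-OUT                  PIC Z,ZZZ,ZZZ,ZZ9.
       PROCEDURE DIVISION.
       0000-MAIN.
      * 24-bit addressing: 2 to the power 24 locations (16 MB).
           COMPUTE WS-LOCATIONS = 2 ** 24
           MOVE WS-LOCATIONS TO WS-OUT
           DISPLAY '24-BIT LOCATIONS       : ' WS-OUT
      * 31-bit addressing: 2 to the power 31 locations (2 GB).
           COMPUTE WS-LOCATIONS = 2 ** 31
           MOVE WS-LOCATIONS TO WS-OUT
           DISPLAY '31-BIT LOCATIONS       : ' WS-OUT
      * Addresses start at zero, so the highest one is 2**31 - 1.
           COMPUTE WS-HIGHEST = WS-LOCATIONS - 1
           MOVE WS-HIGHEST TO WS-OUT
           DISPLAY 'HIGHEST 31-BIT ADDRESS : ' WS-OUT
           STOP RUN.
```

The three values are 16,777,216, 2,147,483,648 and 2,147,483,647 respectively.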

Fig. z/OS addressability

In recent years, IBM's z/Architecture has defined 64-bit storage addresses. A 64-bit address can refer to 2^64 = 18,446,744,073,709,551,616 locations. A program on z/OS can therefore run in 24-, 31- or 64-bit addressing mode.

Specifying the job's storage limits

The REGION parameter is used to specify the maximum amount of real or virtual storage. It does not acquire memory, but merely specifies an upper bound. You can code the REGION parameter on the JOB or EXEC statement. If you specify it on a JOB statement, the region size applies to all the job steps within the job and overrides any step limits. You can specify the region size in terms of Kilobytes or Megabytes.

Internally, the z/OS maintains a 24-bit(below the line) maximum and a 31-bit(above the line) maximum. The REGION parameter is dual in nature and influences the 24 and 31 bit storage limits, depending on the range of values.

REGION value and the resulting limits:
  • REGION=0K or REGION=0M: The value of zero is a special case. Coding a zero REGION size sets the limits to all of the 24-bit and 31-bit storage available.
  • REGION=1K to REGION=16M: When you code a value from 1K through 16M, the limit is applied only to 24-bit storage. There's also an impossible range of REGION values from 8192K to 16384K. If your job uses a REGION value in the impossible range, you would get an S822 abend. The 31-bit storage limit is set to the IBM default of 32 MB.
  • REGION=16M to REGION=32M: The 24-bit storage limit is set to the site-defined maximum. The 31-bit storage limit defaults to 32 MB.
  • REGION=32M to REGION=2047M: When you code a value from 32M through 2047M, the limit is applied only to 31-bit storage. The 24-bit storage limit is set to the site-defined maximum.

I used Mark Zelden's REXXSTOR utility to determine these storage limits.

Tuesday, March 11, 2014

How to code the JOB Statement

How a job is entered into the system

In the early days of System/360 and System/370, JCL statements, programs and data were stored on punched cards and tape. Entering a job into the system meant placing the deck of cards in a hopper and pressing the start button to feed them through a card reader. The card reader produced a stream (series) of electronic signals representing the holes in the punched cards. Today, a programmer enters JCL code and data at a terminal, and the resulting job stream is saved in a file on DASD. However, the job hasn't entered the system yet, as JES2 doesn't know about it.

When the programmer issues a SUBMIT command, JES (the Job Entry Subsystem) reads the input JCL stream and places it into a staging area: the input queue or JES spool. It's like a large waiting room at an airport or a bus station from where jobs leave en masse. JES then invokes an interpreter. The interpreter's job is to analyze the JCL statements and convert them into an internal representation by building a series of control blocks in the Scheduler Work Area (SWA). If there are syntax errors, the faulty JCL is flushed to the output queue. The SWA control blocks, amongst other things, describe all the datasets a job needs.

How a job is scheduled for execution

A programmer codes the job CLASS in JCL code. Jobs with similar characteristics should have the same CLASS. Typically, every installation sets up categories or job classes, for instance CLASS=A for small jobs, CLASS=G for long-running jobs, CLASS=T for tape jobs.

A special type of program called an initiator picks up jobs for execution from the JES spool. The initiator is like a nanny that selects a job from the JES spool, executes the job in its address space and returns to the JES spool for another job. Every initiator has a class list. In a hypothetical system, say there are three initiators: INIT1 with classes A; INIT2 with classes B,C,D; and INIT3 with classes B,C. An initiator selects jobs only from the classes on its class list. Every initiator can run only one job at a time. In the example, only one CLASS=A job can run at a time. Two CLASS=B jobs can run at a time, since both INIT2 and INIT3 accept class B. The installation can control the number of initiators (JES managed).

The initiator goes through three phases: allocation, execution and unallocation. First, it invokes allocation routines that analyze the SWA control blocks to see what resources (volumes, datasets) the job step needs and allocate them. Next, the initiator creates a user region where the user program can execute, loads the program into the region and transfers control to it. When the program completes, the initiator invokes the unallocation routines to release any resources.

As the program executes, it can produce output data that needs to be held in the JES spool and printed later. This is called a SYSOUT dataset. Three other SYSOUT datasets are produced: JES output messages, a JCL listing and a system message log that lists any messages issued by z/OS as the job executed. These SYSOUT datasets are brought to the output queue in the JES spool.

Like jobs, SYSOUT datasets are each assigned an output class. Output classes usually indicate the printer or printers to which the output is sent. Sometimes an output class, for example MSGCLASS=H, may instead specify that the output is not to be printed. Instead, it is held in the output queue until it is purged.

Fig. How a job is executed on z/OS

How to code the JOB statement

The JOB statement has three basic functions. First, it identifies the job to z/OS by supplying a job name. Second, it supplies accounting information and programmer details, so that z/OS knows who is responsible for the job and, if necessary, who must be charged for the computer resources used. Third, it supplies various limits that influence the job's execution.

Of all the JCL statements, the JOB statement is the one whose format varies the most from one installation to another. You then tailor the JOB statement accordingly. The basic syntax of the JOB statement is :

The JOB statement must always be the first statement in the JCL. Comments before a JOB statement too are invalid!

The job name

The name or label coded on the JOB statement from columns three to ten is the job name. For example, IBMUSERA is a job name. z/OS uses the job name to identify your job. If two or more active jobs have the same name, the job identifier (the job-id or job number) is used to distinguish them. The system assigns a unique job-id to every job.

When submitting a job through TSO/ISPF, it is good to have a job name made of the TSO userid suffixed by one or two alphanumeric characters. The examples here assume a TSO userid IBMUSER. Then, I'd code a job name like IBMUSERA or IBMUSER1. There are a couple of rules to keep in mind. Job names are strings of length 1-8 and can have alphanumeric (A-Z, 0-9) characters and national symbols (@, #, $). The first character must be alphabetic or a national symbol. Some valid job names are :

The accounting information parameter

In the early days of the mainframe, programmers were billed(charged) for computer-time that they used on a mainframe. Accounting information supplies the account number to which to bill the CPU time utilized. This parameter varies from installation to installation. In the example, A123 is the account number to which the job's processing time will be billed. In the example, SHELDON COOPER is the additional accounting information. The leading comma indicates the absence of the account number. This example has both the account number and the additional account details. You can submit this JCL on the z/OS and see the output.

The labelling or routing information

z/OS prints several messages to the job log during execution. When the job output is printed, each of the "separator pages" will be labelled with the user-specified value. This allows the personnel to physically separate your output from the output of other jobs at the printer. At one shop, they had 100 "bins" for the print output. The label 'BIN-7 QUASAR' tells z/OS to print 'BIN-7 QUASAR' on each of the separator pages. The personnel operating the printer would put my output in bin# 7.

The CLASS parameter

The CLASS parameter is used to categorize the job. At a mainframe shop, the system programmers set up these categories. For instance, you might have CLASS=A for small jobs, CLASS=G for medium jobs, and CLASS=R for long-running jobs. This helps z/OS pick up jobs for execution. The JCL below shows how the CLASS parameter can be coded.

The MSGCLASS parameter

As your program runs, it can produce SYSOUT datasets. Other SYSOUT datasets containing z/OS system messages, JCL listing and JES messages are produced. The MSGCLASS parameter lets you specify an output class for your SYSOUT datasets. An installation may set up various output classes. For instance at my shop, MSGCLASS=X means that the output will be printed on a high-speed printer, MSGCLASS=H means that the output will be held in the JES spool.

The MSGLEVEL parameter

The MSGLEVEL=(stmt,msg) parameter lets you specify what kind of JCL statements you want included in the output and which messages appear in the message log. stmt can be 0: print only the JOB statement; 1: print all the JCL statements, including expanded procedure statements; or 2: print only the JCL statements in the input stream, not the statements in a PROC. msg can be 0: print only the step completion messages and suppress the allocation and unallocation messages unless the job fails; or 1: print all messages. If you omit the MSGLEVEL parameter, it defaults to MSGLEVEL=(1,1), which causes all JCL statements and messages to be included in the output.

The NOTIFY parameter

If you code the NOTIFY parameter on the JOB statement, you'll automatically be notified when the job completes. The NOTIFY parameter lets you specify a TSO user-id or the &SYSUID system symbol to be notified. If you code your user-id, you'll be notified whether or not you're the one submitting the job. If you code the &SYSUID system symbol, it is automatically replaced by the user-id of the submitter.

Friday, March 7, 2014

DB2 Basics : Indexes


An index is a data-structure that exists physically on the disk. The index is usually defined on one or more columns of a table. It has the ordered column and pointers to rows in the table. Indexes are the fastest way to access DB2 data. Indexes reduce search-time.

Fig. Searching a value k in the Index

An index is essentially a B+ Tree. It has a root page, internal nodes and leaf pages. Each node has records containing two fields - the key and pointer(ROWID) to other nodes. The leaf nodes have no children; instead they house keys and pointers to data pages. B+ Trees are used in most filesystems and relational database products like IBM DB2, MS-SQL Server, Oracle, Sybase etc. to create indexes on data for efficient retrieval.

Let's say we are searching for a value k in the B+ Tree. Starting at the root page, we'll walk our way through the index and reach a leaf containing the value k(assume k=28). At each node, we determine which internal pointer we should follow. An internal B+ Tree node has at most b(in the above example b=3) children.

Imagine every node represents a range, e.g. the root node represents 1<=key<=100, the blue node stands for 1<=key<=20, the yellow node for 21<=key<=40 and so forth. We compare k with the root node records. If 1<=k<=20, search the blue node. If 21<=k<=40, search the yellow node. If 41<=k<=60, search the green node. For k=28, we follow the pointer to the yellow node and apply the above steps recursively. Searching a tree with 3 leaf nodes takes 1 comparison, with 9 leaf nodes 2 comparisons, with 27 leaf nodes 3 comparisons, and with n leaf nodes log_3 n comparisons. In general, an index with b branches per node requires log_b n comparisons. Searching for a value in an index therefore takes logarithmic time.
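
The logarithmic growth can be illustrated with a short sketch that counts how many levels are needed to reach a given number of leaf nodes. It models only the arithmetic in the paragraph above, not how DB2 is implemented; the program name IDXDEPTH and all field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IDXDEPTH.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * b = branches per node, n = number of leaf nodes.
       01  WS-FANOUT               PIC 9(04) VALUE 3.
       01  WS-LEAF-NODES           PIC 9(09) VALUE 27.
       01  WS-REACHABLE            PIC 9(09) VALUE 1.
       01  WS-COMPARISONS          PIC 9(04) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Each level multiplies the reachable leaf nodes by b, so
      * the loop runs log-base-b of n times, rounded up.
           PERFORM UNTIL WS-REACHABLE >= WS-LEAF-NODES
               MULTIPLY WS-FANOUT BY WS-REACHABLE
               ADD 1 TO WS-COMPARISONS
           END-PERFORM
           DISPLAY 'LEAF NODES : ' WS-LEAF-NODES
           DISPLAY 'COMPARISONS: ' WS-COMPARISONS
           STOP RUN.
```

With b = 3 and n = 27, the loop runs three times, matching the 3 comparisons quoted above. Raising WS-LEAF-NODES to 1,000,000 with WS-FANOUT at 100 needs only three levels as well, which is why wide index pages keep lookups cheap.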

Inserts and Page Splits

On index creation, LOADs and REORGs, DB2 reserves empty space for future INSERTs on every leaf page. PCTFREE, a percentage of the page size, reserves space for new entries on all leaf pages. For example, PCTFREE 10 on a 4K page reserves roughly 400 bytes on every leaf. As time goes by, INSERTs add entries, whereas DELETEs pseudo-delete index entries and leave holes. A few index holes are good, as newly inserted rows can go into them. When an INSERT happens, DB2 finds the candidate page affected. Based on the newly INSERTed row's key, an index hole may be reused; otherwise it's added to the reserved free space. If there isn't any room for an additional entry, DB2 splits the page into two equal groups. We call this a page split.

Clustering sequence and CLUSTERRATIO

A clustered index informs DB2 of the physical order(according to index key values) in which to arrange the rows of a table. On REORG, DB2 would sort the rows in an increasing order. Think about it! If you define a clustered index on DEPT_NO column of EMPLOYEES table, DB2 would have all the rows for the same department close together. Fewer reads would be required to access all rows for a given department.

Indexing is an art that takes time to master. Here are a few guidelines on creating an index :
  • Consider indexing on columns used in WHERE, GROUP BY, ORDER BY, UNION and DISTINCT.
  • When you index a table, explicitly specify the CLUSTER option on CREATE INDEX. Failure to do so causes DB2 to cluster the data on the first index defined.
  • The first column of a multi-column index must be wisely chosen. I am using an analogy here, where frequency of access can be thought of as temperature. Frequently accessed rows are hot rows; rarely accessed rows are cold rows. Say a CUSTOMERS_HISTORY table is not date ordered (UPD_TS is not indexed), and old rows are not deleted. To begin with, the hot rows may be spread out over two to three pages. Over time, the average page contains fewer and fewer hot rows and more cold rows. As application programs request the hot rows, this set of rows is now spread out over ten to twelve pages, instead of two to three. Voila! The number of GETPAGEs goes up and so does the application CPU time. Therefore, choose the most referenced column in SQL as the first index column.

CLUSTERRATIO is the percentage of data rows that are in clustering sequence. If you run RUNSTATS and see a CLUSTERRATIO of less than 96%, it suggests there are performance problems and you should consider a REORG.

INSYNC EDIT SYSIBM.SYSINDEXES                            ROW 1 OF 6 COLS 1 - 72
COMMAND ===>                                                  SCROLL ===> CSR
       NAME     TBNAME     CLUSTERRATIO FIRSTKEYCARD FULLKEYCARD NLEAF
       CHAR(8)  CHAR(10)   SMALLINT     INTEGER      INTEGER     INTEGER
       -------- ---------- ------------ ------------ ----------- ---------
****** ************************** TOP OF DATA *********************************
000001 IDXEMP01 EMPLOYEES           100          171         171         2
000002 IDXEMP02 EMPLOYEES           100           92         167        11
000003 IDXEMP03 EMPLOYEES           100           25         171         8
000004 IDXEMP04 EMPLOYEES           100          171         171         2
000005 IDXEMP05 EMPLOYEES           100            4         167        14
000006 IDXEMP06 EMPLOYEES           100           25          31         5
****** ************************* BOTTOM OF DATA *******************************

Fig. CLUSTERRATIO in SYSIBM.SYSINDEXES

Wednesday, March 5, 2014

Cobol – An Introduction

COBOL is a language for business applications

COBOL (COmmon Business Oriented Language) is a high-level programming language suited to developing business applications. COBOL programs are employed in business processes like Payroll, Accounting, Inventory Management, Billing, Reservation Systems etc. Applications built in COBOL are in widespread use across the globe.

The way COBOL works

Computers really understand only one language : machine code, a binary stream of 0s and 1s. You must convert your COBOL code into machine code with the aid of a compiler.

Anatomy of a COBOL program

When z/OS runs your COBOL program, it looks for a specially coded DIVISION that looks exactly like:
       PROCEDURE DIVISION.
           your code goes here
           STOP RUN.

z/OS runs everything in the PROCEDURE DIVISION of your COBOL program and stops execution at STOP RUN. In COBOL, you put instructions or code inside the PROCEDURE DIVISION, followed by STOP RUN.


Code Structure in COBOL

COBOL programs have an IDENTIFICATION DIVISION, ENVIRONMENT DIVISION, DATA DIVISION and a PROCEDURE DIVISION. Divisions can have sections, and sections have paragraphs. A paragraph is a block of code made up of sentences, and a sentence is one or more COBOL statements ended by a period.

You put declarative statements in the IDENTIFICATION DIVISION, ENVIRONMENT DIVISION and DATA DIVISION. You write instructions or executable statements in the PROCEDURE DIVISION. PROCEDURE DIVISION is where the ball starts rolling.

The first part of the COBOL program is the IDENTIFICATION DIVISION. This division has information that helps identify the program: its name, author, the date it was written etc. The ENVIRONMENT DIVISION has information on the platform or environment on which the program would run. The DATA DIVISION declares variables and data structures. The PROCEDURE DIVISION contains executable code.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MYPROG01.

       ENVIRONMENT DIVISION.

       DATA DIVISION.

       PROCEDURE DIVISION.
       0000-MAIN.
           DISPLAY 'I RULE!'
           STOP RUN.

Look at the lean program I've written above. The IDENTIFICATION DIVISION is the first division in every COBOL program. PROGRAM-ID is a paragraph in the IDENTIFICATION DIVISION. MYPROG01 is the program name coded in the PROGRAM-ID paragraph, so z/OS will identify this program as MYPROG01. Next, the ENVIRONMENT DIVISION describes the system or platform for the program. The DATA DIVISION declares data-items. Don't worry about these right now, this is just to get you started!

I've defined a paragraph 0000-MAIN in the PROCEDURE DIVISION. The DISPLAY statement in COBOL displays text strings on the terminal (in a batch job, the text goes to the job's SYSOUT instead). Here, 'I RULE!' is displayed. STOP RUN indicates the end of the COBOL program, and execution stops.

What can you say in the PROCEDURE DIVISION?

Once you're inside the PROCEDURE DIVISION, the fun begins! Like in most programming languages, you can do normal things like assignments, arithmetic. You can do something under a condition - IF/ELSE tests. You can do something again and again - Loop or iterate.
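To make the "do something under a condition" idea concrete, here is a minimal sketch of an IF/ELSE test. The program name IFDEMO01 and the data items WS-ACCOUNT-BALANCE and WS-WITHDRAWAL are made up for this illustration; they are not part of any real application.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IFDEMO01.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data items, for illustration only
       01  WS-ACCOUNT-BALANCE     PIC S9(07)V99 VALUE 250.00.
       01  WS-WITHDRAWAL          PIC S9(07)V99 VALUE 400.00.

       PROCEDURE DIVISION.
       0000-MAIN.
      * Allow the withdrawal only when the balance covers it
           IF WS-WITHDRAWAL > WS-ACCOUNT-BALANCE
               DISPLAY 'INSUFFICIENT FUNDS'
           ELSE
               SUBTRACT WS-WITHDRAWAL FROM WS-ACCOUNT-BALANCE
               DISPLAY 'WITHDRAWAL ACCEPTED'
           END-IF
           STOP RUN.
```

With the starting values assumed here, the withdrawal is larger than the balance, so the first branch runs and INSUFFICIENT FUNDS is displayed.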


COBOL is verbose and has English-like expressions to depict logic (its inventors envisioned that perhaps COBOL could be read by programmers and managers alike). MOVE is used for assignments : MOVE 'JOHN RAMBO' TO FULL-NAME, MOVE 10000 TO PRINCIPAL-SUM, MOVE TOTAL-COST TO RPT-FIELD-1. COMPUTE helps with writing and evaluating arithmetic expressions. Say, you had to find the volume of a sphere. Just code COMPUTE VOLUME = (4/3) * (3.14159) * (RADIUS ** 3).
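Here is a small sketch that puts MOVE and COMPUTE together in a complete program. The program name SPHERE01, the WS- prefixed field names and the PICTURE sizes are assumptions chosen for this example. The formula is also rearranged so that the division by 3 happens last, which is one way to avoid depending on how many decimal places the compiler keeps for the intermediate result of 4/3.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SPHERE01.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Field names and sizes are assumptions made for this sketch
       01  WS-FULL-NAME           PIC X(20).
       01  WS-RADIUS              PIC 9(03)V99.
       01  WS-VOLUME              PIC 9(09)V9(04).
       01  WS-VOLUME-OUT          PIC Z(8)9.9(04).

       PROCEDURE DIVISION.
       0000-MAIN.
      * MOVE assigns a value to a data item
           MOVE 'JOHN RAMBO' TO WS-FULL-NAME
           MOVE 2 TO WS-RADIUS
      * Multiply first and divide by 3 last, so that precision is
      * not lost in an intermediate result
           COMPUTE WS-VOLUME ROUNDED =
               (4 * 3.14159 * WS-RADIUS ** 3) / 3
      * Move to an edited field so the decimal point is printed
           MOVE WS-VOLUME TO WS-VOLUME-OUT
           DISPLAY 'NAME   : ' WS-FULL-NAME
           DISPLAY 'VOLUME : ' WS-VOLUME-OUT
           STOP RUN.
```

For a radius of 2, the displayed volume should come out at roughly 33.51.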

COBOL is a structured programming language. It employs a top-down design model. The whole program logic is divided into smaller sections and paragraphs. For example, 3000-PROCESS-FILE logic can be divided into 3100-OPEN-FILE, 3200-READ-FILE and 3300-CLOSE-FILE paragraphs. This makes it highly modular. You usually map out each function or piece of logic to its own separate paragraph. The logic for reading from a file can be encapsulated in a 3200-READ-FILE paragraph. At any point in the program, if a file is to be read, you can just invoke or PERFORM 3200-READ-FILE.
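The paragraph layout described above could be sketched like this. The program name PERFDEMO is made up, and the DISPLAY statements are only placeholders standing in for real OPEN, READ and CLOSE logic, which would also need a file definition.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERFDEMO.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 3000-PROCESS-FILE
           STOP RUN.

      * The top-level paragraph delegates to three smaller ones
       3000-PROCESS-FILE.
           PERFORM 3100-OPEN-FILE
           PERFORM 3200-READ-FILE
           PERFORM 3300-CLOSE-FILE.

      * The DISPLAY statements are placeholders for real file I/O
       3100-OPEN-FILE.
           DISPLAY 'OPENING FILE'.

       3200-READ-FILE.
           DISPLAY 'READING FILE'.

       3300-CLOSE-FILE.
           DISPLAY 'CLOSING FILE'.
```

Each PERFORM runs the named paragraph and then returns to the statement that follows it, so the three messages appear in open, read, close order before STOP RUN ends the program.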


Looping and looping and...

COBOL has a standard looping construct, PERFORM UNTIL .. do-something .. END-PERFORM. We'll talk about loops at length, later. The syntax is so simple - you're probably asleep already. Keep doing everything inside the PERFORM .. END-PERFORM block over and over. Whatever it is that you want to repeat has to be inside the block.

The key to a loop is a conditional-test. In COBOL, a conditional-test is an expression that evaluates to a boolean value, in other words TRUE or FALSE. If you say something like, "Keep juggling, until there are no more juggling pieces", this is a clear boolean test. As long as the conditional-test is FALSE, keep juggling. When no more juggling pieces are left, the test becomes TRUE and you stop.
           PERFORM UNTIL NO-MORE-JUGGLING-PIECES
               PERFORM KEEP-JUGGLING
           END-PERFORM
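For the loop to compile and run, NO-MORE-JUGGLING-PIECES has to be defined somewhere. One way, sketched here, is a level-88 condition name on a counter. The program name JUGGLE01, the counter WS-PIECES-LEFT and its starting value of 3 are assumptions made for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. JUGGLE01.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The counter and its starting value are assumptions
       01  WS-PIECES-LEFT         PIC 9(02) VALUE 3.
      * The condition name is TRUE when the counter reaches zero
           88  NO-MORE-JUGGLING-PIECES      VALUE 0.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM UNTIL NO-MORE-JUGGLING-PIECES
               PERFORM KEEP-JUGGLING
           END-PERFORM
           DISPLAY 'NOTHING LEFT TO JUGGLE'
           STOP RUN.

      * Each pass uses up one piece, so the loop eventually ends
       KEEP-JUGGLING.
           DISPLAY 'JUGGLING, PIECES LEFT: ' WS-PIECES-LEFT
           SUBTRACT 1 FROM WS-PIECES-LEFT.
```

The test is checked before each pass, so with three pieces the KEEP-JUGGLING paragraph runs three times. Something inside the loop must eventually make the condition TRUE, otherwise the loop never stops.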
