Bulletin Board

What’s new at MAINFRAMES 360
(Updates)...
- Re-writing of select JCL and VSAM Tutorials is under-way, orientation would be more hands-on, and practical
- New Tutorials on COBOL-VSAM programs shall be put up pretty soon
- IMS DB/DC Tutorials to start up in April, 2010

Mainframes360 Search

Loading

Quick Links

Jump to :  

Wednesday, July 29, 2009

Sorting by Multiple Fields, Merging Datasets, Extracting a subset of Records


Q. I’ve heard of this tool called DFSORT. What is DFSORT? Is it any good?

DFSORT is a utility program supplied by IBM. The main job of DFSORT is to sort(arrange) the records in a file in some order. Apart from sorting, you can do a lot more things with DFSORT. It turns out, that DFSORT is a very useful utility.

DFSORT can be used to filter out a subset of records from a dataset, and then only sort these records. Thus, DFSORT gives you to flexibility to include or omit records. You can even re-format the records - access field level data, add, edit, or delete columns. You can insert whitespace, zeros, date and time information etc.

DFSORT can also do group(aggregate) operations. For example, you might want to find the department-wise break of the sum of salary of Employees in an organisation.

EmpName        DeptID        Salary
---------------------------------------
Andy            10        1000 |
Bob             10        2000 |--Total of Dept 10 = 6000   
Cathy           10        3000 |
Dominic         20        4000   |--Total of Dept 20 = 9000
Edward          20        5000   |
Frank           30        6000--Total of Dept 30 = 6000

So, in other words, you can do analysis of a given set of raw input data, produce findings or conclusions and present them in the form of a report to the top-level managers. Hence, DFSORT is not just a sort utility, it is a complete analysis and reporting tool from IBM.

You can convert VB Datasets to FB and vice versa.

Q. Can you show Simple JCL/Batch to Sort the records in Input Dataset using DFSORT Utility?

Consider the following Input Dataset CAA0176.DEMO.SORTIN.

----+----1----+----2----+----3----+---
QUASAR  23423423434534344 1000 10
RAHUL   13342345345345345 2000 20
VEERA   34535345325354324 3000 20
JAYAN   98347385385933987 4000 30
MADHAV  67575789769876785 5000 30
MEGHA   50830948530859340 6000 30 
SNEHAL  30495849572938495 7000 40
JAINES  98347385385933987 8000 40


Suppose we want to sort the records in this input dataset by the NAME Field. The NAME Field starts at position 1 and it has length = 6. The data we want to sort is alphabetic(CH), and we want the records to be sorted in increasing/ascending order.

So, SORT FIELDS Control Card should be SORT FIELDS=(1,6,CH,A).

The batch JOB/JCL for sorting the records in ascending order of NAME will be

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
//JOB01  JOB   A123,QUASAR,NOTIFY=&SYSUID                              
//STEP01 EXEC  PGM=SORT,REGION=1024K                                   
//SORTIN DD    DSN=CAA0176.DEMO.SORTIN,DISP=SHR                        
//SORTOUT DD   DSN=CAA0176.DEMO.SORTOUT,DISP=(OLD,KEEP,KEEP)           
//SYSIN  DD    *                                                       
  SORT FIELDS=(1,6,CH,A)                                               
//                                                                     


The unarranged input records are in the dataset CAA0176.DEMO.SORTIN
The output records are in the dataset CAA0176.DEMO.SORTOUT(which I've already created through ISPF)

This is how the Output Dataset CAA0176.DEMO.SORTOUT shall look -

----+----1----+----2----+----3----+
JAINES  98347385385933987 8000 40 
JAYAN   98347385385933987 4000 30 
MADHAV  67575789769876785 5000 30 
MEGHA   50830948530859340 6000 30 
QUASAR  23423423434534344 1000 10 
RAHUL   13342345345345345 2000 20 
SNEHAL  30495849572938495 7000 40 
VEERA   34535345325354324 3000 20 

Q. Hey that was a piece of cake! You’ve put up the title as SORTING by Multiple Fields. What’s that? How does it work?
A simple way to understand this is as follows – Let’s say you sorting a file as in the above example. What happens if two people have the same identical first-names? In other words, the field/key by which sorting is done could be same for two or more rows. In this case, how we arrange the rows.

Well, the simple solution is to specify yet another field/key – Secondary Key. When the primary key field is the same for two rows R1 and R2, then the secondary key field comes into play. The order of the rows R1 and R2 in the output will then be decided by the secondary key field/value.

Consider the following input dataset

AgentID  NAME                Country   State     City      Sales

----+----1----+----2----+----3----+----4----+----5----+----6----+
AG10001  RAMESH CHANDRA GUHA INDIA     MAHARASHT MUMBAI    1000 
AG10002  QUASAR SHABBIR CHUN INDIA     MAHARASHT PUNE      2000 
AG10003  VEERAMANI DARYAGANJ INDIA     MAHARASHT NAGPUR    3000 
AG10005  DELPHAN FERANDAZ    INDIA     TAMIL NAD CHENNAI   4000 
AG10005  PADMAVATHI SUNDARAM INDIA     TAMIL NAD SALEM     5000 
AG10006  NEELESHWARI IYER    INDIA     TAMIL NAD ERNAKULAM 6000 
AG10007  AJAY NANUBHAI PARSA INDIA     WEST BENG KOLKATA   7000 
AG10008  MEHUL POPATLAL SUTA INDIA     WEST BENG KHARAGPUR 8000 
AG10009  KRITHIKA RAMANUJAM  INDIA     WEST BENG SUNDARBAN 9000 
AG10010  ANKUR CHOUDHARY     INDIA     PUNJAB    AMRITSAR  10000
AG10011  HIREN NITIN SHETH   INDIA     PUNJAB    JALANDHAR 11000
AG10012  DARSHAN SHAH        INDIA     HARYANA   CHANDIGAR 12000


This is the Input Dataset. The above data depicts the sales of Insurance Agents in different cities. For a given state around 2-4 agents cover all major cities. The top-level manager has asked for a report that shows the sales of different agents arranged in the alphabetical order of the State.

We would write the SORT FIELDS Control Statement as,
SORT FIELDS=(40,10,CH,A).
However, there are several records(Agents) in the same state. Then, we can sort them by their CITY.
SORT FIELDS=(40,10,CH,A,50,10,CH,A)

The Batch JOB/JCL for sorting the records by State, City is

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
//JOB02  JOB   A123,QUASAR,NOTIFY=&SYSUID                              
//STEP01 EXEC  PGM=SORT,REGION=1024K                                   
//SORTIN DD    DSN=CAA0176.DEMO.SORTIN,DISP=SHR                        
//SORTOUT DD   DSN=CAA0176.DEMO.SORTOUT,DISP=(OLD,KEEP,KEEP)           
//SYSIN  DD    *                                                       
  SORT FIELDS=(40,10,CH,A,50,10,CH,A)                                  
//                                                                     


Upon submitting the above the Batch JOB/JCL, we get the following Sorted Output in the file CAA0176.DEMO.SORTOUT =>

----+----1----+----2----+----3----+----4----+----5----+----6----+
AG10012  DARSHAN SHAH        INDIA     HARYANA   CHANDIGAR 12000
AG10001  RAMESH CHANDRA GUHA INDIA     MAHARASHT MUMBAI    1000 
AG10003  VEERAMANI DARYAGANJ INDIA     MAHARASHT NAGPUR    3000 
AG10002  QUASAR SHABBIR CHUN INDIA     MAHARASHT PUNE      2000 
AG10010  ANKUR CHOUDHARY     INDIA     PUNJAB    AMRITSAR  10000
AG10011  HIREN NITIN SHETH   INDIA     PUNJAB    JALANDHAR 11000
AG10005  DELPHAN FERANDAZ    INDIA     TAMIL NAD CHENNAI   4000 
AG10006  NEELESHWARI IYER    INDIA     TAMIL NAD ERNAKULAM 6000 
AG10005  PADMAVATHI SUNDARAM INDIA     TAMIL NAD SALEM     5000 
AG10008  MEHUL POPATLAL SUTA INDIA     WEST BENG KHARAGPUR 8000 
AG10007  AJAY NANUBHAI PARSA INDIA     WEST BENG KOLKATA   7000 
AG10009  KRITHIKA RAMANUJAM  INDIA     WEST BENG SUNDARBAN 9000

Q. Well that was quite a comprehensive example.. What do you mean  by Merging two datasets? Is is the same as just concatenating them together?

Suppose we have 3 input files which are sorted. We would like to merge them, in such a way, that resultant output dataset is also sorted.

You can merge two input datasets, and produce a single output dataset, such that the output is sorted using DFSORT tool. For merging, instead of using SORT FIELDS Control card, we will now use MERGE FIELDS Control Card.
Remember that, for merging the records in datasets, the records in the Input Datasets should already be sorted(they should be in the right sequence). (If it is not so, then DFSORT throws ABEND Code 16)

INPUT DATASET 1=>CAA0176.DEMO.SORTIN01

----+----1----+----2----+----3
00005 QUASAR CHUNAWALLA      
00008 RAMESH CHANDRA GUHA    


INPUT DATASET 2=>CAA0176.DEMO.SORTIN02

----+----1----+----2----+----3
00004 AHILYABAI SALVI        
00009 RAMABAI PESHWA         


INPUT DATASET 3=>CAA0176.DEMO.SORTIN03

----+----1----+----2----+----3
00003 AMEYA BOBADE           
00007 A.R. RAHMAN            


We can merge the above records, and taking the key field as EMPID, which starts from column 1 and has length 5. So, the MERGE FIELDS control statement would be :
MERGE FIELDS=(1,5,CH,A)

The Batch JOB/JCL for merging the above files looks like this :

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
//JOB04  JOB   A123,QUASAR,NOTIFY=&SYSUID                              
//STEP01 EXEC  PGM=SORT,REGION=1024K                                   
//SORTIN01 DD    DSN=CAA0176.DEMO.SORTIN01,DISP=SHR                    
//SORTIN02 DD    DSN=CAA0176.DEMO.SORTIN02,DISP=SHR                    
//SORTIN03 DD    DSN=CAA0176.DEMO.SORTIN03,DISP=SHR                    
//SORTOUT DD   DSN=CAA0176.DEMO.SORTOUT,DISP=(OLD,KEEP,KEEP)           
//SYSPRINT DD  DSN=CAA0176.DEMO.JOBLOG,DISP=(OLD,KEEP,KEEP)            
//SYSIN  DD    *                                                       
  MERGE FIELDS=(1,5,CH,A)                                              
//                                                                     


Upon submitting the above Batch JOB/JCL, we get the following output sorted file :

----+----1----+----2----+----3
00003 AMEYA BOBADE           
00004 AHILYABAI SALVI        
00005 QUASAR CHUNAWALLA      
00007 A.R. RAHMAN            
00008 RAMESH CHANDRA GUHA    
00009 RAMABAI PESHWA         


Q. How do you extract a subset of records from an Input Dataset using DFSORT? Like take out all records having sales > 5000 into another separate file.

For Records Selection, we have to use the INCLUDE/OMIT Control Statements of the DFSORT Utility. Inside the INCLUDE/OMIT Control cards, we can filter out particular rows, by specifying various conditions.

Thus, you can first extract a subset of records using the INCLUDE COND or OMIT COND Control Statement and then sort only these particular records using SORT FIELDS, instead of the entire file.

Suppose we have the following input Dataset, from which we would like to filter out records.

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
AG10001  RAMESH CHANDRA GUHA INDIA     MAHARASHT MUMBAI    01000       
AG10002  QUASAR SHABBIR CHUN INDIA     MAHARASHT PUNE      02000       
AG10003  VEERAMANI DARYAGANJ INDIA     MAHARASHT NAGPUR    03000       
AG10004  DELPHAN FERANDAZ    INDIA     TAMIL NAD CHENNAI   04000       
AG10005  PADMAVATHI SUNDARAM INDIA     TAMIL NAD SALEM     05000       
AG10006  NEELESHWARI IYER    INDIA     TAMIL NAD ERNAKULAM 06000       
AG10007  AJAY NANUBHAI PARSA INDIA     WEST BENG KOLKATA   07000       
AG10008  MEHUL POPATLAL SUTA INDIA     WEST BENG KHARAGPUR 08000       
AG10009  KRITHIKA RAMANUJAM  INDIA     WEST BENG SUNDARBAN 09000       
AG10010  ANKUR CHOUDHARY     INDIA     PUNJAB    AMRITSAR  10000       
AG10011  HIREN NITIN SHETH   INDIA     PUNJAB    JALANDHAR 11000       
AG10012  DARSHAN SHAH        INDIA     HARYANA   CHANDIGAR 12000       


Suppose we want to filter only those Agents, whose Sales are greater than 5000. Thus, we can code the INCLUDE COND as follows :

INCLUDE COND=(start-col(s),length(l),format(f),operator(o),constant(c))
-or-
INCLUDE COND=(field1-col,length,format(f),operator(o),field2-col,length)

Now, we would like to filter all the agents based on their SALES Figure. The SALES Column begins from position s=60 and is of length l=5. The Sales field, we assume is of type textual/character. We shall talk about numeric comparisions later. Since, we want Agent-SALES > 5000, Operator o = Greater than(GT), and constant c=5000. Character(string) constants are given as c’<constant>’. So, the constant would be c’05000’. If you are wondering why we couldn’t write simply c’5000’ has to do with the type of comparision.

Okay, so the INCLUDE COND Control statement for the above filter would be - INCLUDE COND=(60,5,CH,GT,C'05000')

The Batch JOB/JCL for the above task of filtering out records would be like this -

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
//CAA0176H JOB A123,QUASAR,NOTIFY=&SYSUID                              
//STEP01 EXEC  PGM=SORT,REGION=1024K                                   
//SORTIN DD    DSN=CAA0176.DEMO.SORTIN,DISP=SHR                        
//SYSPRINT DD  SYSOUT=A           
//SORTOUT DD   DSN=CAA0176.DEMO.SORTOUT,DISP=(OLD,KEEP,KEEP)           
//SYSIN  DD    *                                                       
  SORT FIELDS=COPY -                                                   
  INCLUDE COND=(60,5,CH,GT,C'05000')                                   
//                                

Upon submitting the above Batch JOB/JCL, we get the following output -

----+----1----+----2----+----3----+----4----+----5----+----6----+
AG10006  NEELESHWARI IYER    INDIA     TAMIL NAD ERNAKULAM 06000
AG10007  AJAY NANUBHAI PARSA INDIA     WEST BENG KOLKATA   07000
AG10008  MEHUL POPATLAL SUTA INDIA     WEST BENG KHARAGPUR 08000
AG10009  KRITHIKA RAMANUJAM  INDIA     WEST BENG SUNDARBAN 09000
AG10010  ANKUR CHOUDHARY     INDIA     PUNJAB    AMRITSAR  10000
AG10011  HIREN NITIN SHETH   INDIA     PUNJAB    JALANDHAR 11000
AG10012  DARSHAN SHAH        INDIA     HARYANA   CHANDIGAR 12000


Note : Here CH specifies that the field SALES is a Character(textual) item. BI will be used if you want to compare it in Binary format. Similarly, ZD and PD are used Zoned Decimal and Packed Decimal.

0 comments:

Post a Comment

Related Posts with Thumbnails

Quick Links

Jump to :  

Note :

© Copyright – Quasar Chunawalla, 2010.
Note : The copyrights of all the material, text and pictures posted in this website belong to the author. Any instance of lifting the material from this website, shall be considered as an act of plagiarism. For any clarifications, please mail at quasar.chunawalla@gmail.com
 
back to top