Monday, January 21, 2013

Mainframe Information Representation and Storage

How Mainframe computers store data
On computers, data is stored in the form of bits: 0s and 1s (binary). A group of 8 bits is called a byte, and a character such as A, B, C, ..., Z is represented by one byte. When you press a key on the keyboard, the keyboard sends eight bits down the cable to the computer.

Every key is represented by a unique combination of 0s and 1s. Because we use 8 bits to store a character, a total of 2^8 = 256 patterns are possible. The designers of IBM mainframes assigned a unique 8-bit pattern to each character. This scheme of representing characters and data on mainframe computers is called Extended Binary Coded Decimal Interchange Code (EBCDIC). Every character occupies one byte of computer memory.
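To see these 8-bit patterns for yourself, here is a small Python sketch. It assumes code page 037 (the US/Canada variant of EBCDIC, which Python ships as the "cp037" codec); EBCDIC has many code pages, and the one used at your shop may differ.

```python
# Assumption: EBCDIC code page 037, exposed by Python as the "cp037" codec.
text = "HELLO"
ebcdic = text.encode("cp037")

# Show each character with its one-byte EBCDIC value in hex and binary.
for ch, byte in zip(text, ebcdic):
    print(f"{ch} -> 0x{byte:02X} -> {byte:08b}")

# One character occupies exactly one byte.
print(len(text), "characters,", len(ebcdic), "bytes")
```

Note that the bytes differ from ASCII: the letter H is 0xC8 in this code page, not 0x48.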

Fields, records, files and datasets
Financial institutions like banks are computerized. A bank may store its customers’ data on a central computer. This would include details such as the customer’s full name, residential address, social security number, contact details and account number. Data such as the customer name consists of a sequence of characters. A group of characters that represents a data item is called a field. For example, the customer name is one field, the customer address is another, and so forth.

Associated fields about one entity, such as the customer name, address and account number, together make up a record. A record represents data for a single instance: a customer record tells you everything about one customer. A record may be divided into several fields.

A record has a length. Assume that the customer name field can be 10 characters long, the customer address can have up to 30 characters, the contact details field can span 10 characters, the social security number 9 digits and the account number up to 21 digits. Recall that each character on the mainframe computer occupies one byte of memory. The length of each customer record is then the sum of the field sizes: 10 + 30 + 10 + 9 + 21 = 80 bytes. Hence, the record of a customer occupies 80 bytes in computer storage.
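The arithmetic above can be sketched in Python as a fixed-width record layout. The field widths come from the example in the text; the field names, helper functions and sample customer values are made up purely for illustration.

```python
# Field widths in bytes, taken from the customer-record example in the text.
# The field names are hypothetical labels chosen for this sketch.
FIELDS = [
    ("name", 10),
    ("address", 30),
    ("contact", 10),
    ("ssn", 9),
    ("account", 21),
]
RECORD_LENGTH = sum(width for _, width in FIELDS)


def build_record(values):
    """Pad (or truncate) each field to its fixed width and join the fields."""
    return "".join(values[name].ljust(width)[:width] for name, width in FIELDS)


def split_record(record):
    """Slice a fixed-length record back into its named fields."""
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = record[pos:pos + width].rstrip()
        pos += width
    return fields


# Made-up sample data, not a real customer.
sample = {
    "name": "RAJ KUMAR",
    "address": "12 MG ROAD PUNE",
    "contact": "9800000000",
    "ssn": "123456789",
    "account": "12345".zfill(21),
}
record = build_record(sample)
print(RECORD_LENGTH, len(record))
```

Because every field sits at a fixed offset, a program can pick out any field of a record by position alone, without separators.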

If a collection of records is stored together, say 1000 customer records of the bank, it is called a file. A file, then, is just a sequence of records. IBM mainframes use the term dataset instead of file.

Generally, datasets (files) are stored on computer storage devices. On the mainframe, there are two storage devices commonly used – disk and tape.

The concept of fields, records and files (datasets)

Mass Storage devices
Mass storage devices are used to store data permanently. They are non-volatile: they do not lose their contents even when the electrical power is cut off. They also have high capacity and can store huge volumes of data.

The data stored on a storage device can be accessed by two methods – Sequential Access and Direct (Random) access.

Sequential Access
Sequential access means that the mainframe system must search the storage device from the beginning until the desired data is found. This is like playing a Bollywood music cassette on your Sony Walkman. The audio cassette of the Hindi film Jab Tak Hai Jaan had five songs: Challa ki, Saans me teri sung by Shreya Ghosal, Heer heer by Harshdeep Kaur, Jiya Re by Neeti Mohan and the Jab Tak Hai Jaan title track by Neeti Mohan. If you wanted to listen to the title track, you would have to start at the first song and go past the second, third and fourth songs until you reached the last one. The most common storage device that allows sequential access is magnetic tape.

Direct (Random) Access
Direct access implies that the mainframe system can locate the desired data on the storage device directly. This is like looking up a topic in a reference book. If I want to read about optical isomers in the Organic Chemistry book by Morrison and Boyd, I check the entry for optical isomers in the index. The index lists page 257 against that entry, so I can jump straight to page 257 and read about optical isomers at length. I don’t have to read the book from cover to cover. The most common device that allows direct access is the magnetic disk.
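The difference between the two access methods can be illustrated with a toy example in Python. This is only a sketch on an in-memory byte stream, not real tape or disk I/O; the 80-byte record length follows the customer example, and the record contents and function names are invented for the demonstration.

```python
import io

RECORD_LENGTH = 80  # bytes per record, as in the customer example

# Build an in-memory "dataset" of 1000 fixed-length records (made-up contents).
dataset = io.BytesIO()
for n in range(1000):
    dataset.write(f"CUSTOMER{n:04d}".ljust(RECORD_LENGTH).encode("ascii"))


def read_sequential(f, wanted):
    """Start at the beginning and read record after record until `wanted`."""
    f.seek(0)
    for reads in range(1, wanted + 2):
        record = f.read(RECORD_LENGTH)
    return record, reads


def read_direct(f, wanted):
    """Compute the record's byte offset and jump straight to it."""
    f.seek(wanted * RECORD_LENGTH)
    return f.read(RECORD_LENGTH), 1


seq_record, seq_reads = read_sequential(dataset, 999)
dir_record, dir_reads = read_direct(dataset, 999)
print(seq_reads, dir_reads)
```

Both calls return the same record, but the sequential version has to pass over every earlier record first, much like fast-forwarding through a cassette.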

Magnetic recording and playback
Magnetic recording relies on two effects from electromagnetism. First, when a magnetic material is moved past a magnetic field, usually one created by an electromagnet, the material becomes magnetised. Second, by Faraday’s law of induction, when a coil (wire) is moved past a magnetic field, an electric voltage (signal) is induced in the coil.

Usually, the surface of a storage medium is coated with a magnetic material like ferric oxide. At a microscopic level, the oxide coating has many tiny poles (magnets) which are randomly oriented. The head is an electromagnetic coil that receives computer data (bit 0 or 1) in the form of electric pulses. The coil creates a magnetic field around it.

While recording data, the surface of the medium is moved past the head (coil). Variations in the electric pulses cause variations in the magnetic field created by the head. The tiny poles (magnets) on the surface of the storage medium are magnetised (aligned) and orient themselves N-S or S-N depending on whether the bit is 0 or 1. Playback is the reverse: the N-S or S-N orientation of the poles on the storage medium is sensed by the head as bits 0 and 1.

Magnetic Tape
A standard magnetic tape consists of a ½ inch wide plastic ribbon coated with ferric oxide. On the mainframe, each character (A-Z, a-z, 0-9) is represented by a unique 8-bit pattern (1 byte). The magnetic tape has nine tracks: eight to store the eight bits of a character, plus one track for the parity bit, which is used for error detection. Look at the figure below. It illustrates how the string HELLO is stored on a magnetic tape.


How information is stored on a magnetic tape
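Here is a rough Python sketch of what the nine tracks would hold for the string HELLO. It makes two assumptions that the text does not state: the characters are encoded with EBCDIC code page 037 ("cp037" in Python), and the parity track uses odd parity, meaning the parity bit is chosen so that each nine-bit frame contains an odd number of 1s.

```python
def odd_parity_bit(byte):
    """Return the bit that makes the total count of 1s (data + parity) odd."""
    ones = bin(byte).count("1")
    return 0 if ones % 2 == 1 else 1


# Assumption: EBCDIC code page 037 and odd parity.
for ch, byte in zip("HELLO", "HELLO".encode("cp037")):
    print(f"{ch}  data bits: {byte:08b}  parity bit: {odd_parity_bit(byte)}")
```

If a single bit flips while the tape is read back, the count of 1s in that frame becomes even, and the error can be detected.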

Data on a magnetic tape can only be stored and retrieved sequentially, so quick access to a particular piece of data is not possible. However, magnetic tapes are very cheap compared to other storage media, and large amounts of data can be stored on them. Tape is the preferred medium for taking backups and archiving old data.

Blocks and Inter-block gap (IBG)
When you visit the mall to buy groceries for the coming month, you usually buy grain, cooking oil, flour and other non-perishable goods in bulk. In a single trip, you purchase items in sufficient quantity so that stocks last at least a month and you don’t have to make a second trip. It is economical. Blocking applies the same idea to reading and writing records.

A block is a contiguous group of records on a storage device. Media such as disks and tapes are said to be block-oriented devices, as opposed to record-oriented. On these devices, a group of records is stored together as a block. The figure below illustrates how records are blocked. Consecutive blocks have a gap between them called the Inter-block gap (IBG).


On the mainframe system, a block is the basic unit of data transfer. During a single READ or WRITE operation, 1 block is transferred from the storage device.

Blocks are described by their block size. It is up to the programmer to determine the block size (length). Say I choose a block size of 800 bytes while storing customer records, and each customer record is 80 bytes long. Then the number of records in each block is:

Blocking Factor = (Total Block-size / Record Length) = (800/80) = 10 records/block

The blocking factor is 10 records per block. As a result, you end up transferring 10 customer records in a single READ or WRITE. This indeed helps. Say you were generating monthly account statements for all the customers. Having processed one customer record, the likelihood of processing the neighbouring records (customer 2, customer 3, and so forth) is very high. It is relatively cheap to get a block of 10 records in a single READ, rather than executing ten READs and getting only one record at a time.
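The blocking arithmetic is easy to check in Python. This is just a sketch of the calculation; the function names are made up, and the 1000-record dataset is the bank example used earlier.

```python
import math


def blocking_factor(block_size, record_length):
    """Number of whole records that fit in one block."""
    return block_size // record_length


def reads_needed(total_records, block_size, record_length):
    """READ operations needed to fetch every record, one block per READ."""
    return math.ceil(total_records / blocking_factor(block_size, record_length))


print(blocking_factor(800, 80))     # records per block
print(reads_needed(1000, 800, 80))  # blocked: 10 records per READ
print(reads_needed(1000, 80, 80))   # unblocked: 1 record per READ
```

With 800-byte blocks, the 1000 customer records need 100 READs instead of 1000.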

There’s a trade-off between performance and data-transfer time. A small block size degrades performance, because more READ and WRITE operations are needed. A very large block size boosts performance, but takes a toll on the amount of time required to transfer each block. Choosing the optimum block size for your datasets is therefore important. Many mainframe shops install a software product that determines the best block size for your file.

Advantages of blocking
1. Fewer I/O operations are needed because a single READ moves an entire block containing several records.

Disadvantages of blocking
1. Tiny software programs called access method routines block and de-block the data. This is an overhead.

Magnetic disk
A magnetic disk resembles a vinyl phonograph record, except that its tracks are laid out as separate circles. A single disk, known as a platter, has several concentric tracks. Data is stored on both sides of the platter.

In the IBM 3390 DASD drives used in today’s mainframe systems, eight such disks are stacked to form a disk pack or volume. When the drive is in operation, these platters revolve at a very high speed around an axis called the spindle (not shown in the diagram).


Magnetic Disk unit organization

An arm (actuator) carries the READ/WRITE heads. There is one READ/WRITE head for each surface of a platter. But how is a particular track located? To seek the desired track, the arm moves the READ/WRITE heads from the outermost track towards the center of the disk. Note that all the READ/WRITE heads move as one unit. So, if the arm moves the heads to Track 150, all the READ/WRITE heads are positioned at Track 150 on their respective surfaces. The same track on each of the surfaces can be imagined to form a virtual cylinder.

Eight disks have 16 surfaces. One of the surfaces is used to record control information. As there are 15 surfaces on which data is recorded, 1 cylinder is equal to 15 tracks of storage space.

The READ/WRITE head assembly travels as a single unit and can transfer an entire cylinder without any arm movement. As a consequence, storage space on the DASD drive is not filled up surface by surface. Instead, it is filled up cylinder by cylinder.
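One way to picture "cylinder by cylinder" is to number the tracks in the order they get filled and work out where each one lives. The sketch below assumes the 15 data tracks per cylinder described above; the zero-based numbering and the function name are illustrative choices, not the addressing scheme of an actual drive.

```python
TRACKS_PER_CYLINDER = 15  # 15 data surfaces, as described in the text


def locate(track_number):
    """Map a zero-based track number, in fill order, to (cylinder, head)."""
    cylinder, head = divmod(track_number, TRACKS_PER_CYLINDER)
    return cylinder, head


# The first 15 tracks all sit in cylinder 0, one per surface, so the arm
# does not move. Only the 16th track needs a seek to cylinder 1.
for n in (0, 1, 14, 15, 16):
    print(n, locate(n))
```

Filling surface by surface would instead force an arm movement for every new track.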

Forming dataset names
On a mainframe computer, there can be thousands of files (datasets). There must be a unique name for every dataset. Mainframes support very long dataset names. On mainframes, a dataset name can be up to 44 characters long, in this format:

XXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXX

A dataset name is made up of several segments, or qualifiers, separated by periods. Each qualifier can be up to eight characters in length. A qualifier must start with a capital letter (z/OS also accepts the national characters #, @ and $ in that position).

Generally, it is good practice to give meaningful names to your datasets. For example, if you are storing employee data in a file, you can name it EMPLOYEE.DATA.

The operating system can keep track of groups of related datasets by the qualifiers in their names. The first qualifier of a dataset name is called the High-level Qualifier (HLQ).
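These naming rules can be turned into a quick check in Python. Treat it as a sketch under stated assumptions: it enforces the rules from the text (at most 44 characters, qualifiers of 1 to 8 characters, a leading capital letter) and additionally allows the national characters #, @, $ and the hyphen, which is the usual z/OS convention but goes beyond what the text spells out. The function name is made up, and your installation may impose further rules.

```python
import re

# One qualifier: first character a capital letter or national character,
# then up to seven more letters, digits, national characters or hyphens.
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")


def is_valid_dataset_name(name):
    """Check a dataset name against the (assumed) rules described above."""
    if not name or len(name) > 44:
        return False
    return all(QUALIFIER.fullmatch(q) for q in name.split("."))


for candidate in ("EMPLOYEE.DATA", "1ABC.DATA", "TOOLONGNAME.DATA"):
    print(candidate, is_valid_dataset_name(candidate))
```

The second name fails because its first qualifier starts with a digit, and the third because its first qualifier is longer than eight characters.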

Generally, when you are given access to a mainframe, you receive a TSO user-id, just as you need a user-id to log in to a Windows PC. Most professionals or software engineers who work on a mainframe have a TSO user-id and password to access the mainframe computer.

When you use a TSO-id, a special requirement applies to most datasets (files) that belong to you. Suppose your TSO-id is AGY0157. Then all your datasets should have the High-level Qualifier AGY0157. Thus, the names of the files that belong to you should start with AGY0157. For example, the name of the employee file would be AGY0157.EMPLOYEE.DATA. In fact, you can identify the files that belong to you, your application or your system by looking at the High-level Qualifier of the file.

Security software products like RACF and CA Top Secret are generally installed at a mainframe shop. These products can be used to control access to files. For example, you may grant others read-only access to the file AGY0157.EMPLOYEE.DATA. RACF would then prevent others from making changes to your dataset.
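Picking out your own datasets by their first qualifier is simple to sketch in Python. The function names and the list of dataset names below are invented for the example; only AGY0157.EMPLOYEE.DATA comes from the text.

```python
def high_level_qualifier(dataset_name):
    """Return the first qualifier of a dataset name."""
    return dataset_name.split(".")[0]


def belongs_to(dataset_name, tso_id):
    """True when the dataset's first qualifier matches the TSO user-id."""
    return high_level_qualifier(dataset_name) == tso_id


# Hypothetical catalog listing; only the first name appears in the text.
catalog = [
    "AGY0157.EMPLOYEE.DATA",
    "AGY0157.PAYROLL.DATA",
    "AGY0200.EMPLOYEE.DATA",
]
mine = [name for name in catalog if belongs_to(name, "AGY0157")]
print(mine)
```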

Sequential Datasets
Sequential datasets can be likened to a music cassette, on which songs are stored one after the other. When you play the cassette, you listen to one song, then the next, and so on until you reach the end of the cassette. You cannot jump directly to the fifth song or the last song. Thus, a music cassette tape is accessed sequentially.

Along the same lines, records in sequential files are stored one after the other in a series. In a sequential file, you need to read through the records one by one until you reach the desired record.

A sequential dataset is thus the simplest form of dataset. On the mainframe z/OS operating system, a sequential dataset is known as a PS dataset, where PS stands for Physical Sequential.


A PS (Physical Sequential) Dataset

Location of Datasets
On a mainframe system, every disk volume or pack is identified by a unique code of up to six characters. This is called the volume serial. For example, the mainframe system I am connected to has several 3390 DASD volumes: S4RES1, S4RES2, OS39M1, S4DB21 and so forth.

About the author

Quasar Chunawalla is an author and blogger from Pune, India. He works as a technology guy. He has a passion for teaching and helps other aspiring developers build their skills. He is a movie addict, likes to travel and is a connoisseur of good food.

 