Checkpoint Restart in DB2 Part - II

The first part : Checkpoint Restart in DB2 Part - I


In first part we understood what is check point restart and why we use it. We also covered the problem associated with Check point restart and solutions to those problems.

Now, in this post we will see the step by step implementation of check point restart logic.

CHECKPOINT/RESTART Implementation:

STEP1: 

Create the CHECKPOINT-COMMIT record in the working storage section, to store 
the data, which is needed for the next unit of recovery. 

STEP2:

In the procedure division MAIN para: 
First check the restart status flag i.e. RESTART-IND of CHECKPOINT_RESTART 
table. 

If RESTART-IND = ‘N’ then
   if any output file exists  open output file in OUTPUT mode 
   start the normal process 
end 
If RESTART-IND = ‘Y’ then
 Move the SAVE-AREA information to CHECKPOINT-COMMIT record 
 if any output file exists 
  do the FILE REPOSITION: 
    Open the output file in INPUT mode. 
    Repeatedly  
      Read the output record and INSERT it into GLOBAL temp table 
      FILE_POSITION_GTT 
    Until the last unit of recovery write count. 
    Close the output file. 
   Open the output file in OUTPUT mode. 
open a cursor for a table FILE_POSITION_GTT
repeatedly fetch a cursor and write the record information into the output file  
until end of cursor
close a cursor 
end 
      If input for the program is from cursor then skip the rows until COMMIT-KEY. 
      If input for the program is from file then skip the records until COMMIT-KEY. 
End. 

Note: For more than one output files, delete GTT after repositioning each output file.

STEP3

Make a count for each  Insert’s/Update’s/Deletes in  RECORDS-PROCESSEDUOR variable. 

STEP4:

Go thro’ the logic and find out the appropriate place where COMMIT WORK can be hosted. There check the frequency of COMMITS: 

IF RECORDS-PROCESSED-UOR > COMMIT-FREQ

    KEY (input) value of the program                 TO COMMIT-KEY       
    MOVE checkpoint-commit record length  TO SAVE-AREA-LEN 
    MOVE checkpoint-commit record       TO SAVE-AREA-TEXT 
    Update the CHECKPOINT_RESTART table with this information 

END-COMMIT 

STEP5: 

Before STOP RUN statement; reset the RESTART flag of the 
CHECKPOINT_RESTART table. 
i.e. MOVE ‘N’ TO RESTART-IND
     Update the CHECKPOINT_RESTART table with the above information.


Sample COBOL code for CHECKPOINT/RESTART Logic: 

CHECKPOINT-COMMIT RECORD DEFINITION

 ***************************************************************
* GLOBAL TEMPORARY TABLE CURSOR DECLARATION & OPEN *
****************************************************************

EXEC SQL 

   DECLARE FPG-FPOS CURSOR FOR 
   SELECT RECORD_NUMBER ,RECORD_DETAIL 
   FROM FILE_POSITION_GTT 
   ORDER BY RECORD_NUMBER
END-EXEC. 

**************************************************************
***** CHECK-POINT RESTART DATA DEFINITIONS  ***** 
**************************************************************

01 COMMIT-REC. 
02 FILLER  PIC X(16) VALUE 'REC. PROCESSED: '.
02 COMMIT-KEY  PIC 9(06) VALUE 0. 
02 FILLER  PIC X(14) VALUE 'TOTAL COUNTS: '.
02 COMMIT-COUNTS.                                              
                   03 WS-REC-READ           PIC 9(06) VALUE 0. 
                   03 WS-REC-REJT             PIC 9(06) VALUE 0. 
                   03 WS-REC-WRIT            PIC 9(06) VALUE 0. 
                   03 WS-RECP-READ         PIC 9(06) VALUE 0. 
                   03 WS-RECP-UPDT         PIC 9(06) VALUE 0. 
01 CHKPRSL-VARS. 
02 RECORDS-PROCESSED-UOR PIC S9(09) COMP VALUE +0. 

************************************************************** 
*****  CHECK POINT RESTART LOGIC SECTION   ***** 
************************************************************** 

RESTART-CHECK. 
            MOVE 'XXXXXX  ' TO PROGRAM-NAME.                              
PERFORM RESTART-SELECT. 
IF RESTART-IND = 'Y'
 MOVE SAVE-AREA-TEXT TO COMMIT-REC 
   If input is from cursor the skip until the commit-key 
   If input is from file then skip the records until the commit-key                          
END-IF.

************************************************** 
***** CHECK RESTART STATUS  ***** 
************************************************** 

RESTART-SELECT. 

MOVE 0 TO RECORD-PROCESSED-UOR. 

EXEC SQL 

   SELECT RESTART_IND ,COMMIT_FREQ ,RUN_TYPE ,SAVE_AREA 
   INTO :RESTART-IND ,:COMMIT-FREQ ,:RUN-TYPE ,:SAVE-AREA 
   FROM CHECKPOINT_RESTART 
   WHERE PROGRAM_NAME = :PROGRAM-NAME              
        
END-EXEC. 

EVALUATE SQLCODE 

    WHEN 0              
                                          
         IF RESTART-IND = 'Y'
            DISPLAY '* * * * * * * * * * * * * * * * * * * * * * * * * **********'
            DISPLAY ' ***PROGRAM - ' PROGRAM-NAME ' RESTARTED***'
            DISPLAY '* * * * * * * * * * * * * * * * * * * * * * * * * **********' 
            DISPLAY ' '
         END-IF 

    WHEN 100               
                                       
         PERFORM RESTART-INSERT

     WHEN OTHER   
                                                 
        MOVE 'RESTART-SELECT  ' TO WS-PARA-NAME   
        MOVE 'CHECKPOINT_RESTART SELECT ERR' TO WS-PARA-MSG    
        PERFORM EXCEPTION-ROUTINE 
END-EVALUATE. 

************************************************************** 
***** INSERT THE NEW RESTART STATUS RECORD ***** 
************************************************************** 

RESTART-INSERT. 
MOVE SPACES  TO CALL-TYPE. 
MOVE SPACES  TO CHECKPOINT-ID. 
            MOVE 'N'  TO RESTART-IND. 
            MOVE 'B'  TO RUN-TYPE. 
            MOVE +500  TO COMMIT-FREQ.                        
            MOVE ZEROES              TO COMMIT-SECONDS.                     
            MOVE +4006               TO SAVE-AREA-LEN.                      
MOVE SPACES  TO SAVE-AREA-TEXT. 
EXEC SQL 
    INSERT INTO CHECKPOINT_RESTART 
    (
      PROGRAM_NAME ,CALL_TYPE ,CHECKPOINT_ID ,RESTART_IND
     ,RUN_TYPE,COMMIT_FREQ ,COMMIT_SECONDS ,COMMIT_TIME 
     ,SAVE_AREA 
     ) 
VALUES 
   (
    :PROGRAM-NAME ,:CALL-TYPE ,:CHECKPOINT-ID ,:RESTART-IND ,:RUN-TYPE 
   ,:COMMIT-FREQ  ,:COMMIT-SECONDS, CURRENT TIMESTAMP ,:SAVE-AREA 
   ) 

END-EXEC. 

EVALUATE SQLCODE 
   WHEN 0                                                        
          CONTINUE 
   WHEN OTHER                                                    
           MOVE 'RESTART-INSERT  ' TO WS-PARA-NAME       
           MOVE 'CHECKPOINT_RESTART INSERT' TO WS-PARA-MSG        
           PERFORM EXCEPTION-ROUTINE 
END-EVALUATE.

********************************************************** 
***** UPDATE THE CHECKPOINT RECORD ***** 
********************************************************** 

RESTART-COMMIT. 
            MOVE 'Y' TO RESTART-IND. 
EXEC SQL 

   UPDATE CHECKPOINT_RESTART 
   SET RESTART_IND = :RESTART-IND ,SAVE_AREA = :SAVE-AREA ,
   COMMIT_TIME = CURRENT TIMESTAMP 
   WHERE PROGRAM_NAME = :PROGRAM-NAME 
                     
END-EXEC. 

EVALUATE SQLCODE 
  WHEN 0                
                                        
   EXEC SQL COMMIT WORK END-EXEC                            
       EVALUATE SQLCODE 
         WHEN 0                                                   
           CONTINUE 
         WHEN OTHER                                               
           MOVE 'RESTART-COMMIT' TO  WS-PARA-NAME              
           MOVE 'COMMIT ERROR' TO WS-PARA-MSG        
           PERFORM EXCEPTION-ROUTINE 
       END-EVALUATE 
       MOVE 0 TO RECORD-PROCESSED-UOR 

  WHEN OTHER                                                    
    MOVE 'RESTART-COMMIT' TO WS-PARA-NAME   
    MOVE 'CHECKPOINT_RESTART UPDATE ERR' TO WS-PARA-MSG    
    PERFORM EXCEPTION-ROUTINE 
END-EVALUATE.

******************************************************************* 
***** RESET THE RESTART FLAG AT THE END OF PROGRAM ***** 
******************************************************************* 

RESTART-RESET. 
MOVE 0 TO RECORD-PROCESSED-UOR. 
MOVE 'N' TO RESTART-IND. 

EXEC SQL 
   UPDATE CHECKPOINT_RESTART 
   SET RESTART_IND = :RESTART-IND ,COMMIT_TIME = CURRENT TIMESTAMP 
   WHERE PROGRAM_NAME = :PROGRAM-NAME                      
END-EXEC. 

EVALUATE SQLCODE 
   WHEN 0                                                        
        EXEC SQL COMMIT WORK END-EXEC                            
    WHEN OTHER                                                    
        MOVE 'RESTART-RESET' TO WS-PARA-NAME   
        MOVE 'CHECKPOINT_RESTART DELETE ERR' TO WS-PARA-MSG    
        PERFORM EXCEPTION-ROUTINE 
END-EVALUATE. 
*************************************************************
***** OUTPUT FILE REPOSITION LOGIC SECTION  ***** 
********* * ***** ********************************************

************************************************************************ 
***** GLOBAL TEMPORARY TABLE CURSOR DECLARATION & OPEN  ***** 
************************************************************************* 
FPG-OPEN. 

EXEC SQL 
  OPEN FPG-FPOS 
END-EXEC

EVALUATE SQLCODE 
    WHEN 0                                                        
         CONTINUE 
     WHEN OTHER                                                    
          MOVE 'FPG-OPEN'                    TO WS-PARA-NAME       
          MOVE 'GLOBAL TEMP TABLE OPEN  ERR' TO WS-PARA-MSG        
          PERFORM EXCEPTION-ROUTINE 
 END-EVALUATE. 

************************************************************
****   GLOBAL TEMPORARY TABLE CURSOR FETCH  ***** 
************************************************************

FPG-FETCH. 
EXEC SQL 

  FETCH FPG-FPOS INTO :FPG-RECORD-NUMBER ,:FPG-RECORD-DETAIL 

END-EXEC. 

EVALUATE SQLCODE 
   WHEN 0                                                        
        CONTINUE 
    WHEN +100                                                     
        MOVE 0 TO FPG-RECORD-NUMBER                            
     WHEN OTHER                                                    
        MOVE 'FPG-FETCH '                  TO WS-PARA-NAME       
        MOVE 'GLOBAL TEMP TABLE FETCH ERR' TO WS-PARA-MSG        
        PERFORM EXCEPTION-ROUTINE 
END-EVALUATE. 

************************************************************
***** GLOBAL TEMPORARY TABLE CURSOR CLOSE  ***** 
************************************************************

FPG-CLOSE. 

EXEC SQL 

   CLOSE FPG-FPOS
END-EXEC. 

EVALUATE SQLCODE 
     WHEN 0                                                        
          MOVE 0 TO FPG-RECORD-NUMBER                              
      WHEN OTHER                                                    
          MOVE 'FPG-FPOS-CLOSE '             TO WS-PARA-NAME       
          MOVE 'GLOBAL TEMP TABLE CLOSE ERR' TO WS-PARA-MSG        
          PERFORM EXCEPTION-ROUTINE 
END-EVALUATE. 

*********************************************************** 
***** GLOBAL TEMPORARY TABLE INSERTS  ***** 
*********************************************************** 

FPG-INSERT. 
ADD 1 TO FPG-RECORD-NUMBER.                                  
EXEC SQL 

   INSERT INTO FILE_POSITION_GTT 
  ( 
    RECORD_NUMBER ,RECORD_DETAIL 
   ) 
VALUES 
 :FPG-RECORD-NUMBER 
,:FPG-RECORD-DETAIL 

END-EXEC. 

EVALUATE SQLCODE 
    WHEN 0                                                        
         CONTINUE 
     WHEN OTHER                                                    
         MOVE 'FPG-INSERT  '              TO WS-PARA-NAME       
         MOVE 'GLOBAL TEMP TABL INSERT ERR' TO WS-PARA-MSG        
         PERFORM EXCEPTION-ROUTINE 
END-EVALUATE. 
RESTART-FILE-REPOSITION. 
OPEN INPUT outputfile-name.                                         
MOVE LENGTH OF output-record TO FPG-RECORD-DETAIL-LEN.             
READ output-file INTO FPG-RECORD-DETAIL-TEXT.                   
PERFORM UNTIL FPG-RECORD-NUMBER >= output record count of last commit
PERFORM FPG-INSERT 
READ output-file INTO FPG-RECORD-DETAIL-TEXT            
END-PERFORM. 
CLOSE output-filename
OPEN OUTPUT outputfile-name.                                        
PERFORM FPG-OPEN. 
PERFORM FPG-FETCH. 
PERFORM UNTIL FPG-RECORD-NUMBER = 0 
WRITE outputfile-record FROM FPG-RECORD-DETAIL-TEXT 
PERFORM FPG-FETCH 
END-PERFORM. 
PERFORM FPG-CLOSE. 
---------skip input file until the last commit------------------ 
DISPLAY '  *** ALREADY ' COMMIT-KEY ' RECORDS PROCESSED ***'. 
DISPLAY ' '
DISPLAY ' '.
 / 
*********************************************************** 
************** E X C E P T I O N  R O U T I N E **************
*********************************************************** 

EXCEPTION-ROUTINE. 

MOVE SQLCODE TO WS-SQL-RET-CODE.                          
            DISPLAY '*************************************************'.  
            DISPLAY '****  E R R O R   M E S S A G E S  ****'.  
            DISPLAY '*************************************************'.  
            DISPLAY '* ERROR IN PARA.....: ' WS-PARA-NAME.              
            DISPLAY '*        MESSAGES.....: ' WS-PARA-MSG.               
            DISPLAY '*'.
            DISPLAY '* SQL RETURN  CODE..: ' WS-SQL-RET-CODE.           
            DISPLAY '*************************************************'.  

Output file Disposition in JCL: 

♦  In JCL, disposition must be given as DISP=(NEW,CATLG,CATLG) or DISP=(OLD,KEEP,KEEP) 
♦  Override statement is needed for the output files if job abended: 
        1.  GDG with DISP=(NEW,CATLG,CATLG) 
              Override stmt: 
                 •  Change +1 generation to 0 (current) generation 
                 •  DISP=(OLD,KEEP,KEEP) 
       2.  GDG with DISP=(OLD,KEEP,KEEP) 
             Override stmt: 
                •  Change +1 generation to 0 (current) generation 

Output file with Disposition MOD: 

•  If output file is already existing, and program is appending records to that, then the File re-positioning must be handled in different way according to the requirements. 

Internal Sort: 

If any Commit-Restart program has Internal Sort, remove it and have an External Sort. 

POINTS TO REMEMBER 
  • All the update programs must use COMMIT frequency from the CHECKPOINT_RESTART table only 
  • Avoid – Internal Sorts 
  • Avoid – Mass updates (Instead, use cursor with FOR 
  • UPDATE clause and update one record at a time) 
  • On-call analyst should back-up all the output files before 
  • restart (The procedure should be documented in APCDOC) 
  • Reports to dispatch should be sent to a flat file; send the file 
  • to dispatch up on successful completion of the job 
  • Save only the working storage variables that are required 
  • for RESTART in the CHECKPOINT_RESTART table 
  • RESET the RESTART_IND flag at the end of the program 
  • If COMMIT-RESTART logic is introduced in an existing 
  • program then make relevant changes to the PROCJCL.                    
                                                   

Checkpoint Restart in DB2 Part - I



>> Click here for Part II <<

Before getting into what is check point restart and how to implement it,you need to understand when and why you should go for it.

HERE’S THE SCENARIO:

Suppose, a batch program that basically reads an input file and posts the updates/inserts/deletes to DB2 tables in the database was abended before the end of the job because of some reasons; Is it possible to tell - How many input records were processed? Were any of the updates committed to the database or can the job be started from the beginning?

Assume that COMMIT logic was not coded for large batch jobs that process millions of records. If an ABEND occurs all database updates will be rolled back and the job can be resubmitted from the beginning. If an ABEND occurs near the end of the process, the rollback of all the updates is performed. Also, DB2 will maintain a large number of locks for a long period of time, reducing concurrency in the system. In fact, the program may ABEND if it tries to acquire more than the installation-defined maximum number of locks.

This involves setting up a batch-restart control table (CHECKPOINT_RESTART in our case) to store the last input record processed and other control information. The restart control table can also be used as an instrumentation table to control the execution, commit frequency, locking protocol and termination of batch
jobs.

One of the problems with restart is synchronizing DB2 tables and output files(If your program updates some output file). DB2 will rollback all work on DB2 tables to the last commit point; but for output files we have to delete all the records up to the last commit point. (One option to do this would be via a global temporary table, FILE_POSITION_GTT, See FILE REPOSITIONING section for further details.).

COMMIT Function

The COMMIT statement ends a unit of recovery and commits the relational database changes that were made in that unit of recovery. If relational databases are the only recoverable resources used by the application process, COMMIT also ends the unit of work. The unit of recovery in which the statement is executed is ended and a new unit of recovery is effectively started for the process. All changes made by ALTER, COMMENT ON, CREATE, DELETE, DROP, EXPLAIN, GRANT, INSERT, LABEL ON SQL connections are ended when any of the following apply:

► The connection is in the release pending state
► The connection is not in the release pending state but it is a remote connection and:

  • The DISCONNECT(AUTOMATIC) bind option is in effect, or
  • The DISCONNECT (CONDITIONAL) bind option is in effect and an open WITH HOLD cursor is not associated with the connection.
For existing connections,

► All open cursors that were declared with the WITH HOLD option are preserved, along with any 
SELECT statements that were prepared for those cursors.

► All other prepared statements are destroyed unless dynamic caching is enabled.

► If dynamic caching is enabled, then all prepared SELECT, INSERT, UPDATE 
and DELETE statements
 that are bound with KEEPDYNAMIC (YES) are kept past the commit.
 Prepared statements cannot be kept past a commit if:

      * SQL RELEASE has been issued for that site, or
      * Bind option DISCONNECT(AUTOMATIC) was used, or 
      * Bind option DISCONNECT (CONDITIONAL) was used and there are no hold cursors.

All implicitly acquired locks are released, except for those required for the cursors that were
  • A place to store the details (CHECKPOINT-COMMIT record) pertaining to the current execution of the program, like various counts (number of inserts/deletes/updates/selects), number of processed, processing dates, and other details which are needed in the program after a RESTART. 
  • A reliable FILE RE-POSITIONING logic with minimal changes to the existing PROCJCL.
  • Flexibility, to modify the commit frequency without changing the program code.
Where we can store this CHECKPOINT-COMMIT record?

We can store the CHECKPOINT-COMMIT record, COMMIT-FREQUENCY and other relevant
information in a DB2 table .

CHECKPOINT_RESTART TABLE DESCRIPTION

FILE RE-POSITIONING
At restart, all records written to the output file since the last commit will have to be removed. To accomplish this, FILE_POSITION_GTT global temporary table is used. SQL statements that use global temporary tables can run faster because:


• DB2 does not log changes to global temporary tables
• Global temporary tables do not experience lock contention
• DB2 creates an instance of the temp table OPEN/SELECT/INSERT/DELETE stmts. only occur:

• An instance of a temporary table exists at the current server until one of the following actions occur:

• The remove server connection under which the instance was created terminates
     For ROLLBACK stmt, DB2 deletes the instance of the temporary table.
     For COMMIT stmt, DB2 deletes the instance of the temporary table unless cursor
     for accessing temporary table is defined WITH HOLD and is open.
 •  The application process ends.



In next part we will discuss step by step process of check point restart installation along with sample code.


    

Is it a PLAN? Is it a PACKAGE? NO, It's a CONFUSION !!!

No matter how long programmers have worked with DB2 for z/OS , they still have the confusion on difference between a plan and a package and what the heck is that collection anyway.

In this post i will be specific on the PLAN/PACKAGE difference and the things related to it (Such as COLLECTION and PKLIST). If you want to know from basic and detailing of what is PLAN/PACKAGE and how it get formed and why we need it then ,THIS is the great place to start up with,rather i would recommend you to read it before going further with this post.

OK so here it goes.

The PLAN/PACKAGE are nothing but the container which contains the logic of how your SQL statement in the program should get executed.[Remember when we write any query in SQL, we say what we want and NOT how it is to be done (We don't care how DB2 internally does it)].So PLAN/PACKAGE contains the LOGIC how it is to be done.Then this PLAN/PACKAGE is used with load module (For COBOL) to run your application program smoothly.

Now why two things?? PLAN and  PACKAGE , if both are the same (Contains the LOGIC for executing your SQL) and what exactly is the difference between the two.


Binding DBRM to a PLAN :

  • The PLAN contains the (Logic of) one or DBRM OR one or more PACKAGES OR the combination of both.

  • The PLAN is executable.When i say executable it means ,it can be executed with COBOL load module to run your application program.

  • Now let's consider there are 10 programs with names A TO J. So there will be 10 DBRM ( 1 DBRM for each program). I bind all 10 DBRM's to a PLAN (with name 'PLANA').Now let's say i got a requirement to change one of the Program ( let's say Program 'B') then i NOT ONLY need to bind this DBRM again to plan PLANA but all remaining 9 DBRM too (even though they didn't went under any change) and that is the biggest disadvantage of a binding a DBRM to a PLAN.NOT only this, if in my system there are various other PLAN's which uses this DBRM (The one which corresponds to Program 'B') then all those PLAN'S need to rebind along with all the other existing DBRM'S they may be having.

  • There are various parameters called as Bind parameter are available, when you bind your DBRM to a PLAN. These Bind parameters helps in deciding various important aspects for you program such as when to acquire a lock for the particular resource in your program,when to release the lock,what qualifier to use for the objects refered in your application program,the owner of plan etc. So when you bind your all DBRM's (10 DBRM in this example) to PLAN you have only one set of combination of these bind parameters for all 10 DBRM (For example you can not set different isolation level for different Program/DBRM , it has to be same)

Binding DBRM to a PACKAGE:

  • The PACKAGE contains the (Logic of) ONLY one DBRM.It's a single bound DBRM.
  • The PACKAGE is not executable that means it can not be use with COBOL load module to run your Application program.In order to make your PACKAGE to be executed it has to bonded to PLAN again.So you bind your DBRM to PACKAGE and then in turn this PACKAGE to PLAN.I know what you must be thinking now,why to go this way,  isn't it better to bind DBRM to straight away to PLAN.NO IT IS NOT, i will explain you why it is so.When i say you have to bind the DBRM to PACKAGE and then this PACKAGE to PLAN again,you need not to do it for ever package. What i mean is, let's say i have 50 PACKAGES to bind to a PLAN no need to BIND it 50 times instead just bind a single package to PLAN using PKLIST(Package List) that's it ,you are done and remaining 49 PACKAGES will be automatically added to your plan (For this all your 49 Packages need to be in same PKLIST what we have bonded with PLAN earlier)

  • Let's take the same example what we use above for PLAN. I have 10 programs A TO J with 10 DBRM. Now in case of PACKAGE , I will bind each DBRM to one PACKAGE so i will have 10 Packages then include all 10 PACKAGES to single PKLIST (Package List) bind the Package list to a PLAN.Now if one of my Program changes , then i have to rebind only the DBRM corresponding to the Program changing and NOT the remaining DBRM'S/PROGRAM (Which didn't went under the change)

  • Regarding Bind parameter, since i can bind 10 DBRM to 10 different PACKAGES, I can have various combination of Bind parameters for all PACKAGES.(For example Isolation level of CS for 1 PACKAGE and RR to other etc.)

Having said that, you will wonder,you have only the advantages of binding DBRM to PACKAGE and don't have any reason to go for binding DBRM to a PLAN.

That is the reason , IBM is going to remove this facility of binding DBRM to PLAN in DB2 V9 Onwards
For more detail on the same refer HERE.

JQEUME5S4HKU

Lock escalation in DB2.

Lock escalation as it's name suggest is nothing but escalating the lock on the object from lower level to the higher.Lock escalation is controlled by the parameter LOCKMAX of CREATE TABLESPACE and ALTER TABLESPACE.

The purpose of lock escalation is to reduce the impact of low level locks, both in terms of CPU and storage, at the possible expense of concurrency. When the number of low level locks exceeds a specified threshold, DB2 attempts to promote the high level intent locks, either IS or IX, to S or X, respectively. If this succeeds, then the low level locks are all released and no more low level locks are requested. The lock promotion request may itself fail if other processes hold incompatible locks, either low or high level, for longer than the timeout period. If the higher level lock cannot be acquired before the timeout value has been reached, then the process times out, all updates are rolled back, and all locks are released.

LOCKMAX :

The parameter LOCKMAX of CREATE TABLESPACE and ALTER TABLESPACE controls the maximum number of page or row locks that an application process can hold simultaneously in a single table space. In segmented table spaces, a single table can be locked. Segmented table space organization allows some tables to perform page or row locking and other tables in the same table space to perform table locking. If the number of locked pages or rows for a table exceeds the limit, a table lock is acquired for that table.

Lock escalation is controlled by the LOCKMAX option. If a process requests more than the number specified in LOCKMAX, the intent lock on the table space, partition, or table is promoted to S or X and the page or row child locks are released.


Lock escalation occurs when the number of locks held by a single process exceeds the LOCKMAX value on a table space. Lock escalation involves obtaining a table or table space lock, then releasing all of the page or row locks. Lock escalation is a safety valve that DB2 provides in case an application overuses system resources. LOCKMAX 0 means that you are disabling this useful feature. Loss of concurrency is less likely to be incurred with lock escalation if the majority of the rows or pages in a table space are already locked, as long as RELEASE(COMMIT) is used. Lock escalation in such a situation reduces storage use, but more importantly, reduces the CPU time needed to traverse a lock hash synonym chain. Lock escalation also reduces the Internal Resource Lock Manager (IRLM) latch suspension time, which also reduces CPU time.

Lock escalation for a partitioned table space will cause DB2 to promote the intent locks for every partition where child locks are currently held. Any subsequent access to a new partition will cause DB2 to request gross locks on that partition from the outset. The LOCKMAX figure applies to all locks held at the table space level when it is a multi-table table space.

The column LOCKMAX on the catalog table SYSIBM.SYSTABLESPACE stores the maximum number of locks per user on the table or table space before escalating to the next locking level. A value of -1 indicates LOCKMAX=SYSTEM. Any escalation process is suspended during the execution of SQL statements for ALTER, CREATE, DROP, GRANT, and REVOKE.

Lock escalation is an expensive process, both in terms of lock management and concurrency. In today’s high availability environment, you should design the application to allow no reason for lock escalations to occur. The benefit of lock escalation is in protecting DB2 from an excessive number of locks held in the IRLM on behalf of poorly designed applications that are not releasing their locks.

Lock escalation also prevents poorly designed applications from reaching the maximum number of locks per user specified by the NUMLKUS DSNZPARM. From an individual application perspective, this may seem like a good thing, but from an overall availability and performance standpoint, it is merely rewarding bad design. The focus should be on identifying and correcting those poorly designed applications that take an excessive number of locks.

For critical large table spaces that are accessed by many application processes concurrently, choose the LOCKMAX value so that there will not be any lock escalation within a unit of work, if NUMLKTS DSNZPARM is not large enough.

Row Identifier in DB2 (V9 Onwards)

I use to wonder why we don't have something called as row identifier in DB2 as it is there in oracle or MySQL.Finally IBM added it in V9 and continued in V10.

Let me explain what row identifier is and what purpose it serves.

The RID function returns the record ID (RID) of a row. The RID is used to uniquely identify a row.

The function might return a different value when it is invoked multiple times for a row. For example, after the REORG utility is run, the RID function might return a different value for a row than would have been returned prior to the REORG utility being run. The RID function is not deterministic.

DB2 might reuse RID numbers when a REORG operation is performed. If the RID function is used to obtain a value for a row and an application depends on that value remaining the same as long as the row exists, consider the following alternatives:
  • Add a ROW ID column to the table to provide a value that can be associated with each row, rather than invoking the RID function to generate a value for a row.
  • Define a primary key for the table, using the columns of the primary key to ensure uniqueness, rather than invoking the RID function to generate a value for a row.
Example : Return the RID and last name of employees who are in department '20':
 
 SELECT RID(EMP), LASTNAME
     FROM EMP
     WHERE DEPTNO = '20';
Let me give you the practical example of use of row identifier in my shop.
We had one table in our database which didn't had any primary key to it and hence no facility to identify any unique row in the table (Agree, the table was badly designed by my DBA)
There was one program which use to insert rows in this table, but sometimes it inserts the duplicate and we didn't had any alarm for it as there is no primary key and hence no unique index.So while fetching row from this table in the program,it used to give -811 sqlcode.So to resolve this abend we have to delete two rows(duplicate) and insert one (we can't delete only one duplicate row as we don't have anything unique to identify the row ) So here it was the RID which helped us to delete the duplicate row straight away (no need to insert two rows and then insert one).

Isolation Level in DB2.

As i mentioned earlier, i wanted to take this concept in a separate post.Isolation level parameter is the one you specify while binding your Application program's DBRM in to Plan/Package.

The problem with Isolation level is no matter how many times you read it or from which source you read it you won't understand it.I mean you just read it but when you have to actually use it in you shop, you wonder how to use it and this is because you don't have any practical example of it .once you have any real world example of each isolation level, you most probably will remember and most importantly use wisely in your instances.

So let's just understand each isolation level in detail with the real world example.

First of all understand what is isolation level and why we need it??

Several different users can access and modify data stored in a DB2 database at the same time, the DB2 Database Manager must be able to allow users to make necessary changes while ensuring that data integrity is never compromised.The sharing of resources by multiple interactive users or application programs at the same time is known as concurrency.
One of the ways DB2 enforces concurrency is through the use of isolation levels, which determine how data accessed and/or modified by one transaction is "isolated from" other transactions.

DB2 recognizes and supports the following isolation levels:
  • Repeatable Read     (RR)
  • Read Stability          (RS)
  • Cursor Stability       (CS)
  • Uncommitted Read (UR)


The Repeatable Read Isolation Level 

The Repeatable Read isolation level is the most restrictive isolation level available.

When it's used, the effects of one transaction are completely isolated from the effects of other concurrent transactions.

When this isolation level is used, every row that's referenced in any manner by the owning transaction is locked for the duration of that transaction. As a result, if the same SELECT SQL statement is issued multiple times within the same transaction, the result data sets produced are guaranteed to be identical. In fact, transactions running under this isolation level can retrieve the same set of rows any number of times and perform any number of operations on them until terminated, either by a commit or a rollback operation. However, other transactions are prohibited from performing insert, update, or delete operations that would affect any row that has been accessed by the owning transaction as long as that transaction remains active.
To ensure that the data being accessed by a transaction running under the Repeatable Read isolation level is not adversely affected by other transactions, each row referenced by the isolating transaction is locked—not just the rows that are actually retrieved or modified.

Thus, if a transaction scans 1,000 rows in order to retrieve 10, locks are acquired and held on all 1,000 rows scanned—not just on the 10 rows retrieved.

            

Real World Example : So how does this isolation level work in a real-world situation? lets's see..

Suppose you use a DB2 database to keep track of hotel records that consist of reservation and room rate information, and you have a Web-based application that allows individuals to book rooms on a first-come, first-served basis.
If your reservation application runs under the Repeatable Read isolation level, a customer scanning the database for a list of rooms available for a given date range will prevent you (the manager) from changing the room rate for any of the room records that were scanned to resolve the customer's query. Similarly, other customers won't be able to make or cancel reservations that would cause the first customer's list of available rooms to change if the same query were to be run again (provided the first customer's transaction remained active). However, you would be allowed to change room rates for any room record that was not read when the first customer's list was produced; likewise, other customers would be able to make or cancel room reservations for any room whose record was not read in order to produce a response to the first customer's query.

The Read Stability Isolation Level

The Read Stability isolation level is not quite as restrictive as the Repeatable Read isolation level; therefore, it does not completely isolate one transaction from the effects of other, concurrent transactions.

When the Read Stability isolation level is used, only rows that are actually retrieved or modified by the owning transaction are locked.

Thus, if a transaction scans 1,000 rows in order to retrieve 10, locks are only acquired and held on the 10 rows retrieved, not on the 1,000 rows scanned.

Because fewer locks are acquired, more transactions can run concurrently. As a result, if the same SELECT SQL statement is issued two or more times within the same transaction, the result data set produced may not be the same each time.

As with the Repeatable Read isolation level, transactions running under the Read Stability isolation level can retrieve a set of rows and perform any number of operations on them until terminated. Other transactions are prohibited from performing update or delete operations that would affect the set of rows retrieved by the owning transaction as long as that transaction exists; however, other transactions can perform insert operations.

Real World Example : We will take the same example and see how read stability change it.

Now, when a customer scans the database to obtain a list of rooms available for a given date range, you (the manager) will be able to change the rate for any room that does not appear on the customer's list. Likewise, other customers will be able to make or cancel reservations that would cause the first customer's list of available rooms to change if the same query were to be run again. As a result, if the first customer queries the database for available rooms for the same date range again, the list produced may contain new room rates and/or rooms that were not available the first time the list was generated.


The Cursor Stability Isolation Level

The Cursor Stability isolation level is even more relaxed than the Read Stability isolation level in the way it isolates one transaction from the effects of other concurrent transactions.

The cursor stability isolation level only locks the row that is currently referenced by a cursor that was declared and opened by the owning transaction. (The moment a record is retrieved from a result data set, a pointer—known as a cursor—will be positioned on the corresponding row in the underlying table, and that row will be locked. The lock acquired will remain in effect until the cursor is repositioned—more often than not by executing the FETCH SQL statement—or until the owning transaction terminates.) And because only one row-level lock is acquired, more transactions can run concurrently.

The Cursor Stability isolation level is the isolation level used by default.

When a transaction using the Cursor Stability isolation level retrieves a row from a table via a cursor, no other transaction is allowed to update or delete that row while the cursor is positioned on it. Other transactions, however, can add new rows to the table as well as perform update and/or delete operations on rows positioned on either side of the locked row—provided the locked row itself wasn't accessed using an index. Once acquired, the lock remains in effect until the cursor is repositioned or until the owning transaction is terminated. (If the cursor is repositioned, the lock being held is released and a new lock is acquired for the row to which the cursor is moved.) Furthermore, if the owning transaction modifies any row it retrieves, no other transaction is allowed to update or delete that row until the owning transaction is terminated, even though the cursor may no longer be positioned on the modified row.

As you might imagine, when the Cursor Stability isolation level is used, if the same SELECT SQL statement is issued two or more times within the same transaction, the results returned may not always be the same. In addition, transactions using the Cursor Stability isolation level will not see changes made to other rows by other transactions until those changes have been committed.

Real world example: Let's reserve the hotel again by using Cursor Stability

When a customer scans the database for a list of rooms available for a given date range and then views information about each room on the list produced (one room at a time), you (the manager) will be able to change the room rates for any room in the hotel except the room the customer is currently looking at (for the date range specified). Likewise, other customers will be able to make or cancel reservations for any room in the hotel except the room the customer is currently looking at (for the date range specified). However, neither you nor other customers will be able to do anything with the room record the first customer is currently looking at. When the first customer views information about another room in the list, you and other customers will be able to modify the room record the first customer was just looking at (provided the customer did not reserve it for himself). Again, neither you nor other customers will be able to do anything with the room record at which the first customer is currently looking.

The Uncommitted Read Isolation Level

The Uncommitted Read isolation level is the least restrictive isolation level available. In fact, when the Uncommitted Read isolation level is used, rows retrieved by a transaction are only locked if the transaction modifies data associated with one or more rows retrieved or if another transaction attempts to drop or alter the table the rows were retrieved from.

This isolation level is typically used for transactions that access read-only tables and views and for transactions that execute SELECT SQL statements for which uncommitted data from other transactions will have no adverse affect.

As the name implies, transactions running under the uncommitted read isolation level can see changes made to rows by other transactions before those changes have been committed. However, such transactions can neither see nor access tables, views, and indexes that are created by other transactions until those transactions themselves have been committed. The same applies to existing tables, views, or indexes that have been dropped; transactions using the uncommitted read will learn that these objects no longer exist only when the transaction that dropped them is committed. (It's important to note that when a transaction running under this isolation level uses an updatable cursor, the transaction will behave as if it is running under the Cursor Stability isolation level, and the constraints of the Cursor Stability isolation level will apply.)


Real World exampleHaven't you reserve the hotel yet?? ooh poor boy..

When a customer scans the database to obtain a list of available rooms for a given date range, you (the manager) will be able to change the room rates for any room in the hotel over any date range. Likewise, other customers will be able to make or cancel reservations for any room in the hotel, including the room at which the customer is currently looking. In addition, the list of rooms produced for the first customer may contain records for rooms for which other customers are in the processing of reserving or canceling reservations.

Choosing the Proper Isolation Level

So how do you decide which isolation level to use? The best way is to identify which types of phenomena are unacceptable, and then select an isolation level that will prevent those phenomena from occurring. A good rule of thumb is:
  • Use the Repeatable Read isolation level if you're executing large queries and you don't want concurrent transactions to have the ability to make changes that could cause the query to return different results if run more than once.
  • Use the Read Stability isolation level when you want some level of concurrency between applications, yet you also want qualified rows to remain stable for the duration of an individual transaction.
  • Use the Cursor Stability isolation level when you want maximum concurrency between applications, yet you don't want queries to see uncommitted data.
  • Use the Uncommitted Read isolation level if you're executing queries on read-only tables/views/databases or if it doesn't matter whether a query returns uncommitted data values.
Always keep in mind that choosing the wrong isolation level for a given situation can have a significant negative impact on both concurrency and performance—performance for some applications may be degraded as they wait for locks on resources to be released.

Specifying the Isolation Level to Use

Although isolation levels control concurrency at the transaction level, they are actually set at the application level. Therefore in most cases, the isolation level specified for a particular application is applicable to every transaction initiated by that application. (It is important to note that an application can be constructed in several different parts, and each part can be assigned a different isolation level, in which case the isolation level specified for a particular part is applicable to every transaction that is created within that part.)

For embedded SQL applications, the isolation level is specified at precompile time or when the application is bound to a database (if deferred binding is used). In this case, the isolation level is set using the ISOLATION [RR | RS | CS | UR] option of the PRECOMPILE and BIND commands.

DB2 Version 8.1 and later provides a WITH clause (WITH [RR | RS | CS | UR]) that can be appended to a SELECT statement to set a specific query's isolation level to Repeatable Read (RR), Read Stability (RS), Cursor Stability (CS), or Uncommitted Read (UR). A simple SELECT statement that uses this clause looks something like this:
SELECT * FROM employee WHERE empid = '001' WITH RR
 
I hope after reading this article you will no more have any problems on isolation level.

Understanding Locking in DB2 - IV

Read Part-I 

Read Part-II

Read Part-III

Lock duration

Lock Duration is a very important concept in locking when the lock states are not compatible with each other.This is the reason, you get those famous deadlock and timeout issues.So you should have sound knowledge of this concept before designing your program.

The duration of a lock is defined as the length of time a lock is held and varies according to the type of lock. Page or row locks are acquired when individual rows are accessed and are released at different times, depending on whether the data is accessed for read-only or for update and on the isolation level specified.High level locks are typically acquired implicitly, based on options specified at BIND or REBIND time.
Because DB2 does implicit high level locking, it is critical that the application designer understand the options that influence when high level locks are acquired, when they are released, and the state of the lock.

Duration of table space, partition, and table locks

Every plan executed in DB2 must acquire a high level lock that can either be a table space or partition lock and, if the table space is segmented, then a table lock as well. The exception this rule is that locks are not acquired on the table space or table if uncommitted read isolation is specified. However, DB2 does acquire a mass delete lock. A mass delete is when a DELETE statement is executed with no WHERE clause such that all rows are deleted. DB2 uses the mass delete lock to serialize a mass delete process with an uncommitted reader.

DB2 provides options to control the point at which the high level locks are requested.

The BIND and REBIND options ACQUIRE and RELEASE is used to control when the high level locks are taken and released.

The possible values for the ACQUIRE option are ALLOCATE and USE. If you use ACQUIRE(ALLOCATE), all of the needed locks on all of the table spaces, partitions, and tables used by the plan are acquired at the first SQL call. If you use ACQUIRE(USE), only the needed locks on the table spaces, partitions, and tables are acquired for a shorter time, as and when the objects are accessed.

DB2 packages and dynamic SQL always use ACQUIRE(USE).

The time when the high level locks are released is influenced by the RELEASE option. The possible values for this option are DEALLOCATE and COMMIT. If you use RELEASE(DEALLOCATE), the locks are released only when the thread terminates. If you use RELEASE(COMMIT), the locks are released at commit time unless there are held cursors.


Duration of page or row locks

The duration of page or row locks depends on whether they are acquired and released in a read-only or read and write environment.

Read-only

BIND and REBIND ISOLATION option controls when page or row locks are acquired and released in a read-only environment.

The duration and number of page or row locks depend on the nature of the process and the value for the ISOLATION option specified at BIND or REBIND. The most common ISOLATION option is cursor stability (CS), which is the default as of DB2 9 for z/OS. With cursor stability as the ISOLATION option, DB2 returns committed data and provides stability of data under updateable cursors. If a query is answered through a work file,

For example, a concurrent thread may update the data in the database that generated the data in the work
file.The repeatable read (RR) isolation retains page locks even though the process logic moves through several pages in the same table during a unit of work. If the pages are accessed again, repeatable read provides repeatability: the data cannot be changed by any other concurrent process. This is true for read stability isolation also.

Read and write

A process that uses a cursor adds another dimension to locking. Pages read acquire a U-lock, initially indicating update intent. But before a column can be updated or a row can be deleted, the U-lock on the page is changed to an X-lock and is released at commit.


Here we come to an end of this Understanding of locking series.I will explain Isolation level in detail in different post as this topic deserve it, due to its immense importance to the programmer.Then there are various other important concept in terms of locking as lock avoidance,timeout,deadlock etc. which will be covered in coming posts.