Help using Base SAS procedures

Longitudinal data question using proc SQL

Accepted Solution Solved
Reply
Super Contributor
Posts: 338
Accepted Solution

Longitudinal data question using proc SQL

Hi Colleagues,

Attached longitudinal data set shows how 5 bank accounts are swinging from

one delinquency bucket to the other  (original data set has over million

records).

SQL code provided at the very end of this returns the sum of variable 'balance' for each arrears band

grouped by Current_date subject to this condition.


  • Balances corresponding to NPNA Arrears_Band have to be summed up for

only those accounts that have fallen into NPNA status after

28FEB2010. NPNA balances in 28FEB2010 and in all subsequent months of

those accounts that were already in NPNA status

in 28FEB2010 should be omitted from summing up.


E.g. Account 1111111111's

NPNA balance should be excluded from summing up

in 28FEB2010 and in all subsequent months because it was already in NPNA

status when we begin our modeling period in 28FEB2010.

QUESTION:

I need to incorporate another condition with regard to NPNA arrers bands.

E.g. A/C 4444444444 has fallen into delq. bucket permanenelty in

28FEB2011. So, its NPNA balances in all subsequent months (in 31MAR2011

and beyond) should be omitted from summing up. Note that A/C 4444444444 has fallen

into NPNA in 31DEC2010 too, but it was not permanenet, so 31DEC2010

NPNA balance should not be excluded.

Likewise, in A/C 5555555555 NPNA balances in 31MAR2011 and beyond

should be omitted from summing up.

Final answer should be like this.

Current_date      current      1-30      30-60      60-90      90+     NPNA

28FEB2010             24           0           0           0          0           .             

31MAR2010            25            23         0            0          0           .    

30APR2010             25            0          23          0          0           .

31MAY2010            25            0            0          23        0            .

30JUN2010               0            0            0           0        23           .

31JUL2010               0            0            0           0          0         85

30NOV2010             25          0            0           0           0           .

31DEC2010              0            0           0           0           0            25

31JAN2011              25            18         0           0           0              .

28FEB2011             0              0            0          0          0            43

31MAR2011            0               0           0          0          0            .

30APR2011             0               0          0           0           0           .

31MAY2011            85             0          0            0            0           .

Could anyone help me to modify this SQL code?

Thanks, Mirisage

proc sql;

create table LOSS1 as

select a1.*,a2.npna

from(

select

Current_date      ,

sum(case when Arrears_Band IN ('Current') then balance else 0 end )           as current,

sum(case when Arrears_Band IN ('1 - 30') then balance else 0 end )           as One_to_30,

sum(case when Arrears_Band IN ('30 - 60') then balance else 0 end )           as Thirty_to_60,

sum(case when Arrears_Band IN ('60 - 90') then balance else 0 end )           as sixty_to_90,

sum(case when Arrears_Band IN ('90 +') then balance else 0 end)               as Ninety_plus

from b

where  Product IN ('Personal OD','Personal Loan','Res. Mortgage') and

Bank_number = 10

group by Current_date, Bank_number

) a1

left join

(

select

Current_date   ,

sum(balance)as npna

from b

where  Product IN ('Personal OD','Personal Loan','Res. Mortgage') and

arrears_Band = "NPNA" and

Bank_number = 10 and Account_number not in (select Account_number from b

where

current_date =mdy(02,28,2010) and arrears_Band = "NPNA")

group by Current_date, Bank_number) a2

on

a1.Current_date = a2.Current_date;

quit;

Attachment

Accepted Solutions
Solution
‎06-15-2012 01:36 AM
Frequent Contributor
Posts: 95

Re: Longitudinal data question using proc SQL

Does it have to be SQL code? Using hash object along with a hash iterator in a data step can be an alternative way to address this issue.

Read records by Account_Number and check out whether Arrears_Band = 'NPNA' or not.

If so and the value of NPNA_Start_Month is missing then set it equal to Current_Date. Else, set NPNA_Start_Month value to missing.

Load the record into a hash object. After the last record for given account_number read traverse the records for it using hash iterator object.

Inıtıalize your sum varaiblles

        Current = 0;

        One_to_30 = 0;

        Thirty_to_60 = 0;

        Sixty_to_90 = 0;

        Ninety_plus = 0;

        NPNA = .;

Update the value of your sum variables based on the value of Arrears_Band and output the record. After all records read clear the hash object and start over for the next account_number.

After all records for all account_numbers have been read summarize your temp data set.

data b;

informat Current_date date9.;

Input Bank_number 1-2   Current_date   Account_number $14-23  Balance 25-

26   product $28-40 Arrears_Band $42-48 ;

format Current_date date9.;

cards;

10 28FEB2010 1111111111 45 Personal Loan NPNA

10 31MAR2010 1111111111 45 Personal Loan NPNA

10 30APR2010 1111111111 45 Personal Loan NPNA

10 31MAY2010 1111111111 45 Personal Loan NPNA

10 31MAR2010 2222222222 25 Personal Loan Current

10 30APR2010 2222222222 25 Personal Loan Current

10 31MAY2010 2222222222 25 Personal Loan Current

10 28FEB2010 3333333333 24 Personal Loan Current

10 31MAR2010 3333333333 23 Personal Loan 1 - 30

10 30APR2010 3333333333 23 Personal Loan 30 - 60

10 31MAY2010 3333333333 23 Personal Loan 60 - 90

10 30JUN2010 3333333333 23 Personal Loan 90 +

10 31JUL2010 3333333333 85 Personal Loan NPNA

10 30NOV2010 4444444444 25 Personal Loan Current

10 31DEC2010 4444444444 25 Personal Loan NPNA

10 31JAN2011 4444444444 18 Personal Loan 1 - 30

10 28FEB2011 4444444444 18 Personal Loan NPNA

10 31MAR2011 4444444444 18 Personal Loan NPNA

10 30APR2011 4444444444 18 Personal Loan NPNA

10 31MAY2011 4444444444 85 Personal Loan Current

10 31JAN2011 5555555555 25 Personal Loan Current

10 28FEB2011 5555555555 25 Personal Loan NPNA

10 31MAR2011 5555555555 22 Personal Loan NPNA

10 30APR2011 5555555555 21 Personal Loan NPNA

;

run;

data temp;

    if _n_ = 1 then do;

        declare hash h(ordered:'a');

        h.defineKey('Current_Date');

        h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');

        h.defineDone();

        declare hiter hi('h');

    end;

    NPNA_Start_Month = .;

    do until(last.Account_Number);

        set b;

        where  Product IN ('Personal OD','Personal Loan','Res. Mortgage') and

        Bank_number = 10;

        by Account_Number;

        if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;

        else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;

        h.replace();

    end;

    rc = hi.first();

    do while(rc=0);

        Current = 0;

        One_to_30 = 0;

        Thirty_to_60 = 0;

        Sixty_to_90 = 0;

        Ninety_plus = 0;

        NPNA = .;

        if NPNA_Start_Month = . or (

                NPNA_Start_Month = Current_Date and NPNA_Start_Month > '28Feb2010'd) then do;

            select (Arrears_Band);

                when ('Current') do;Current = Balance;end;

                when ('1 - 30') do;One_to_30 = Balance;end;

                when ('30 - 60') do;Thirty_to_60 = Balance;end;

                when ('60 - 90') do;Sixty_to_90 = Balance;end;

                when ('90 +') do;Ninety_plus = Balance;end;

                when ('NPNA') do;NPNA = Balance;end;

            end;

        end;

        output;

        rc = hi.next();

    end;

    h.clear();

run;

proc summary data=temp nway missing;

    class Current_Date;

    var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;

    output out=want(drop=_Smiley Happy

    sum=;

run;

View solution in original post


All Replies
Solution
‎06-15-2012 01:36 AM
Frequent Contributor
Posts: 95

Re: Longitudinal data question using proc SQL

Does it have to be SQL code? Using hash object along with a hash iterator in a data step can be an alternative way to address this issue.

Read records by Account_Number and check out whether Arrears_Band = 'NPNA' or not.

If so and the value of NPNA_Start_Month is missing then set it equal to Current_Date. Else, set NPNA_Start_Month value to missing.

Load the record into a hash object. After the last record for given account_number read traverse the records for it using hash iterator object.

Inıtıalize your sum varaiblles

        Current = 0;

        One_to_30 = 0;

        Thirty_to_60 = 0;

        Sixty_to_90 = 0;

        Ninety_plus = 0;

        NPNA = .;

Update the value of your sum variables based on the value of Arrears_Band and output the record. After all records read clear the hash object and start over for the next account_number.

After all records for all account_numbers have been read summarize your temp data set.

data b;

informat Current_date date9.;

Input Bank_number 1-2   Current_date   Account_number $14-23  Balance 25-

26   product $28-40 Arrears_Band $42-48 ;

format Current_date date9.;

cards;

10 28FEB2010 1111111111 45 Personal Loan NPNA

10 31MAR2010 1111111111 45 Personal Loan NPNA

10 30APR2010 1111111111 45 Personal Loan NPNA

10 31MAY2010 1111111111 45 Personal Loan NPNA

10 31MAR2010 2222222222 25 Personal Loan Current

10 30APR2010 2222222222 25 Personal Loan Current

10 31MAY2010 2222222222 25 Personal Loan Current

10 28FEB2010 3333333333 24 Personal Loan Current

10 31MAR2010 3333333333 23 Personal Loan 1 - 30

10 30APR2010 3333333333 23 Personal Loan 30 - 60

10 31MAY2010 3333333333 23 Personal Loan 60 - 90

10 30JUN2010 3333333333 23 Personal Loan 90 +

10 31JUL2010 3333333333 85 Personal Loan NPNA

10 30NOV2010 4444444444 25 Personal Loan Current

10 31DEC2010 4444444444 25 Personal Loan NPNA

10 31JAN2011 4444444444 18 Personal Loan 1 - 30

10 28FEB2011 4444444444 18 Personal Loan NPNA

10 31MAR2011 4444444444 18 Personal Loan NPNA

10 30APR2011 4444444444 18 Personal Loan NPNA

10 31MAY2011 4444444444 85 Personal Loan Current

10 31JAN2011 5555555555 25 Personal Loan Current

10 28FEB2011 5555555555 25 Personal Loan NPNA

10 31MAR2011 5555555555 22 Personal Loan NPNA

10 30APR2011 5555555555 21 Personal Loan NPNA

;

run;

data temp;

    if _n_ = 1 then do;

        declare hash h(ordered:'a');

        h.defineKey('Current_Date');

        h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');

        h.defineDone();

        declare hiter hi('h');

    end;

    NPNA_Start_Month = .;

    do until(last.Account_Number);

        set b;

        where  Product IN ('Personal OD','Personal Loan','Res. Mortgage') and

        Bank_number = 10;

        by Account_Number;

        if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;

        else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;

        h.replace();

    end;

    rc = hi.first();

    do while(rc=0);

        Current = 0;

        One_to_30 = 0;

        Thirty_to_60 = 0;

        Sixty_to_90 = 0;

        Ninety_plus = 0;

        NPNA = .;

        if NPNA_Start_Month = . or (

                NPNA_Start_Month = Current_Date and NPNA_Start_Month > '28Feb2010'd) then do;

            select (Arrears_Band);

                when ('Current') do;Current = Balance;end;

                when ('1 - 30') do;One_to_30 = Balance;end;

                when ('30 - 60') do;Thirty_to_60 = Balance;end;

                when ('60 - 90') do;Sixty_to_90 = Balance;end;

                when ('90 +') do;Ninety_plus = Balance;end;

                when ('NPNA') do;NPNA = Balance;end;

            end;

        end;

        output;

        rc = hi.next();

    end;

    h.clear();

run;

proc summary data=temp nway missing;

    class Current_Date;

    var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;

    output out=want(drop=_Smiley Happy

    sum=;

run;

Super Contributor
Posts: 338

Re: Longitudinal data question using proc SQL

Hi Alpay,

This is splendid! It worked very well and many thanks.

However, I have found several exceptional accounts in my data set which is not captured by the hash code. One such is reproduced below. 

data b;
informat Current_date date9.;
Input Bank_number 1-2   Current_date   Account_number $14-23 
Balance 25-31   product $33-45 Arrears_Band $47-53 ;
format Current_date date9.;
cards;
10 28FEB2010 1111111111 6999.99 Personal Loan NPNA
10 31MAR2010 1111111111 6999.99 Personal Loan NPNA
10 30APR2010 1111111111 6999.99 Personal Loan NPNA
10 31MAY2010 1111111111 6999.99 Personal Loan NPNA
10 30JUN2010 1111111111 4444.44 Personal Loan Current
10 31JUL2010 1111111111 4444.44 Personal Loan 30 - 60
10 31AUG2010 1111111111 4444.44 Personal Loan 30 - 60
10 30SEP2010 1111111111 4444.44 Personal Loan 60 - 90
10 31OCT2010 1111111111 4444.44 Personal Loan NPNA
10 30NOV2010 1111111111 4444.44 Personal Loan NPNA
10 31DEC2010 1111111111 4444.44 Personal Loan NPNA
10 31JAN2011 1111111111 3333.21 Personal Loan NPNA
10 28FEB2011 1111111111 2222.22 Personal Loan Current
10 31MAR2011 1111111111 2222.22 Personal Loan 1 - 30
10 30APR2011 1111111111 2222.22 Personal Loan 30 - 60
10 31MAY2011 1111111111 2222.22 Personal Loan 1 - 30
10 30JUN2011 1111111111 2222.22 Personal Loan Current
10 31JUL2011 1111111111 2222.22 Personal Loan 30 - 60
10 31AUG2011 1111111111 2222.22 Personal Loan 60 - 90
10 30SEP2011 1111111111 7777.77 Personal Loan NPNA
10 31OCT2011 1111111111 7777.77 Personal Loan NPNA
10 30NOV2011 1111111111 7777.77 Personal Loan NPNA
10 31DEC2011 1111111111 7777.77 Personal Loan NPNA
;
run;

When I apply the hash object code for this account, the answer is like below. However, for 28FEB2010, NPNA
should not be . (i.e. should not be "period"), it should be  6999.99. All others are correct.

Current_date  current      1-30      30-60      60-90      90+     NPNA

28FEB2010        0             0          0            0            0          .
31MAR2010       0             0           0             0           0        .
30APR2010       0             0          0               0          0           .
31MAY2010       0             0          0            0             0        .
30JUN2010    4444.44        0          0             0            0        .
31JUL2010       0           0          4444.44      0          0           .
31AUG2010      0          0          4444.44      0            0         .
30SEP2010       0         0            0           4444.44     0         .
31OCT2010       0         0            0              0           0     4444.44
30NOV2010      0         0            0              0            0         .
31DEC2010       0        0            0              0            0         .
31JAN2011        0        0            0              0            0         .
28FEB2011     2222.22      0      0                0           0           .
31MAR2011      0      2222.22      0           0                0           .
30APR2011      0           0      2222.22           0           0           .
31MAY2011      0      2222.22      0                0           0           .
30JUN2011      2222.22      0      0                0           0           .
31JUL2011      0           0           2222.22      0           0           .
31AUG2011      0           0           0           2222.22      0           .
30SEP2011      0           0           0           0                0      7777.77
31OCT2011      0           0           0           0                0           .
30NOV2011      0           0           0           0                0           .
31DEC2011      0           0           0           0                0           .

The hash code omitted the calculations for first four records because we have incorporated the following condition.


Condition 1:
"Balances corresponding to NPNA Arrears_Band have to be summed up for
only those accounts that have fallen into NPNA status after
28FEB2010. NPNA balances in 28FEB2010 and in all subsequent months of
those accounts that were already in NPNA status
in 28FEB2010 should be omitted from summing up".

It is true this special account has started with "NPNA" status from 28FEB2010 but it has not continued its NPNA
status throughout. So, these type of accounts should be excluded from the condition 1 above although
they have started with "NPNA" status from 28FEB2010.

Question: I wonder if there is a possiblity to incorporate  this business reality into hash code.

Thanks for your time
Mirisage

Frequent Contributor
Posts: 95

Re: Longitudinal data question using proc SQL

Hi Minisage,

We can capture the last NPNA_Start_Month for a given account and check whether it is equal to 28Feb2010 or not instead of NPNA_Start_Month greater than 28Feb2010.

This way, we can capture accounts gone into NPNA status on 28Feb2010 and come out of NPNA status in subsequent months.

data b;

informat Current_date date9.;

Input Bank_number 1-2   Current_date   Account_number $14-23

Balance 25-31   product $33-45 Arrears_Band $47-53 ;

format Current_date date9.;

cards;

10 28FEB2010 1111111111 6999.99 Personal Loan NPNA

10 31MAR2010 1111111111 6999.99 Personal Loan NPNA

10 30APR2010 1111111111 6999.99 Personal Loan NPNA

10 31MAY2010 1111111111 6999.99 Personal Loan NPNA

10 30JUN2010 1111111111 4444.44 Personal Loan Current

10 31JUL2010 1111111111 4444.44 Personal Loan 30 - 60

10 31AUG2010 1111111111 4444.44 Personal Loan 30 - 60

10 30SEP2010 1111111111 4444.44 Personal Loan 60 - 90

10 31OCT2010 1111111111 4444.44 Personal Loan NPNA

10 30NOV2010 1111111111 4444.44 Personal Loan NPNA

10 31DEC2010 1111111111 4444.44 Personal Loan NPNA

10 31JAN2011 1111111111 3333.21 Personal Loan NPNA

10 28FEB2011 1111111111 2222.22 Personal Loan Current

10 31MAR2011 1111111111 2222.22 Personal Loan 1 - 30

10 30APR2011 1111111111 2222.22 Personal Loan 30 - 60

10 31MAY2011 1111111111 2222.22 Personal Loan 1 - 30

10 30JUN2011 1111111111 2222.22 Personal Loan Current

10 31JUL2011 1111111111 2222.22 Personal Loan 30 - 60

10 31AUG2011 1111111111 2222.22 Personal Loan 60 - 90

10 30SEP2011 1111111111 7777.77 Personal Loan NPNA

10 31OCT2011 1111111111 7777.77 Personal Loan NPNA

10 30NOV2011 1111111111 7777.77 Personal Loan NPNA

10 31DEC2011 1111111111 7777.77 Personal Loan NPNA

;

run;

data temp;

    if _n_ = 1 then do;

        declare hash h(ordered:'a');

        h.defineKey('Current_Date');

        h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');

        h.defineDone();

        declare hiter hi('h');

    end;

    NPNA_Start_Month = .;

    do until(last.Account_Number);

        set b;

        where  Product IN ('Personal OD','Personal Loan','Res. Mortgage') and

        Bank_number = 10;

        by Account_Number;

        if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;

        else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;

        h.replace();

    end;

    Last_NPNA_Start_Month = NPNA_Start_Month; /* Added 6/18/2012 */

    rc = hi.first();

    do while(rc=0);

        Current = 0;

        One_to_30 = 0;

        Thirty_to_60 = 0;

        Sixty_to_90 = 0;

        Ninety_plus = 0;

        NPNA = .;

        if NPNA_Start_Month = . or (

                NPNA_Start_Month = Current_Date and /* NPNA_Start_Month > '28Feb2010'd */

                Last_NPNA_Start_Month ne '28Feb2010'd) then do;

            select (Arrears_Band);

                when ('Current') do;Current = Balance;end;

                when ('1 - 30') do;One_to_30 = Balance;end;

                when ('30 - 60') do;Thirty_to_60 = Balance;end;

                when ('60 - 90') do;Sixty_to_90 = Balance;end;

                when ('90 +') do;Ninety_plus = Balance;end;

                when ('NPNA') do;NPNA = Balance;end;

            end;

        end;

        output;

        rc = hi.next();

    end;

    h.clear();

run;

proc summary data=temp nway missing;

    class Current_Date;

    var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;

    output out=want(drop=_Smiley Happy

    sum=;

run;

Frequent Contributor
Posts: 95

Re: Longitudinal data question using proc SQL

Hi Mirisage,

I realized I mixed up your name in previous reply. Sorry about that.

Super Contributor
Posts: 338

Re: Longitudinal data question using proc SQL

Hi Alpey,

Your revised code is great!

It worked very well. Many thanks indeed.

I should try to digest hash object approach which I have never been exposed before.

As it will take sometime, I might ask your help to incorporate 1 or 2 other pressing business needs around the same problem as I am moving forward, in next few days, and please help me in them too.

Thanks again for your help!

Mirisage


🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 398 views
  • 0 likes
  • 2 in conversation