BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

Hi SAS Community,

Thank every one of you for all supports.

I am asking a help again.

In this data set, I have 8 distinct accounts in several banks (colored for clarity). 

data have;

informat current_date date9.;

input Bank_number    Account_number  $ 5-7 Current_date   Arrears_Band $22-28;

format current_date date9.;

cards;

10  111    30SEP2010 NPNA

10  111    31OCT2010 Current

10  111    30NOV2010 NPNA

10  111    30JUN2011 NPNA

10  111    01JAN2012 writoff

70  111    31DEC2011 NPNA

70  111    31JAN2012 NPNA

40  333    30NOV2010 Current

40  333    31DEC2010 NPNA

50  333    31JAN2011 NPNA

50  333    28FEB2011 writoff

20  333    31MAR2011 NPNA

20  333    30APR2011 NPNA

20  333    31MAY2011 NPNA

20  333    31AUG2011 NPNA

100 111    30APR2011 90 +

100 111    31MAY2011 90 +

100 111    31JUL2011 NPNA

100 111    31AUG2011 NPNA

100 111    04JAN2012 NPNA

80  111    31DEC2011 NPNA

40  555    08FEB2010 30 - 60

40  555    31MAR2010 Current

40  555    30APR2010 60 - 90

40  555    31MAY2010 Current

40  666    03JAN2012 1 - 30

;

run;

Question 1:

First I need to identify what are the accounts whose arrears_band = ‘NPNA’ throughout their life.

/*Answer to question 1*/

Lets call this table as Table_for_Q1

70  111    31DEC2011 NPNA

70  111    31JAN2012 NPNA

20  333    31MAR2011 NPNA

20  333    30APR2011 NPNA

20  333    31MAY2011 NPNA

20  333    31AUG2011 NPNA

80  111    31DEC2011 NPNA

Question 2:

Then I need to get a table having removed all of those account_numbers identified under Question 1 from original "have" table.

/*Answer to question 2*/

Lets call this table as Table_for_Q2

10  111    30SEP2010 NPNA

10  111    31OCT2010 Current

10  111    30NOV2010 NPNA

10  111    30JUN2011 NPNA

10  111    01JAN2012 writoff

40  333    30NOV2010 Current

40  333    31DEC2010 NPNA

50  333    31JAN2011 NPNA

50  333    28FEB2011 writoff

100 111    30APR2011 90 +

100 111    31MAY2011 90 +

100 111    31JUL2011 NPNA

100 111    31AUG2011 NPNA

100 111    04JAN2012 NPNA

40  555    08FEB2010 30 - 60

40  555    31MAR2010 Current

40  555    30APR2010 60 - 90

40  555    31MAY2010 Current

40  666    03JAN2012 1 - 30

What I have tried

/*Sorting by 3 variables that makes a unique record*/

proc sort data=have out=have_sorted;

  by bank_number account_number current_date;

run;

/*flagging*/

data want;

set have_sorted;

  if arrears_band ='NPNA' then flag=1;

else flag=0;

run;

Actually I cannot imagine where to go from here……

Just I only know I can do the following merging step to get the answer to question 2 which is Table_for_Q2, if I can find Table_for_Q1.

/*Merging*/

Proc sort data = have;

   by bank_number account_number current_date;

run;

Proc sort data = Table_for_Q1;

   by bank_number account_number current_date;

run;

Data  Table_for_Q2;

    Merge       have                                (in=a)

                   Table_for_Q1    (in=b)

;

By bank_number account_number current_date;

If a and not Table_for_Q1

;

Run;

I would really appreciate if anyone of you could help me.

Thanks

Mirisage

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Seems to me it can be done using straightforward SQL:

data have;

informat current_date date9.;

input Bank_number Account_number $ 5-7 Current_date Arrears_Band $22-28;

format current_date date9.;

cards;

10 111 30SEP2010 NPNA

10 111 31OCT2010 Current

10 111 30NOV2010 NPNA

10 111 30JUN2011 NPNA

10 111 01JAN2012 writoff

70 111 31DEC2011 NPNA

70 111 31JAN2012 NPNA

40 333 30NOV2010 Current

40 333 31DEC2010 NPNA

50 333 31JAN2011 NPNA

50 333 28FEB2011 writoff

20 333 31MAR2011 NPNA

20 333 30APR2011 NPNA

20 333 31MAY2011 NPNA

20 333 31AUG2011 NPNA

100 111 30APR2011 90 +

100 111 31MAY2011 90 +

100 111 31JUL2011 NPNA

100 111 31AUG2011 NPNA

100 111 04JAN2012 NPNA

80 111 31DEC2011 NPNA

40 555 08FEB2010 30 - 60

40 555 31MAR2010 Current

40 555 30APR2010 60 - 90

40 555 31MAY2010 Current

40 666 03JAN2012 1 - 30

;

/*Q1*/

proc sql;

  select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0;quit;

  /*Q2*/

  proc sql;

  select * from have

  except

  (select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0)

  order by Bank_number;

  quit;

Haikuo

View solution in original post

5 REPLIES 5
Haikuo
Onyx | Level 15

Seems to me it can be done using straightforward SQL:

data have;

informat current_date date9.;

input Bank_number Account_number $ 5-7 Current_date Arrears_Band $22-28;

format current_date date9.;

cards;

10 111 30SEP2010 NPNA

10 111 31OCT2010 Current

10 111 30NOV2010 NPNA

10 111 30JUN2011 NPNA

10 111 01JAN2012 writoff

70 111 31DEC2011 NPNA

70 111 31JAN2012 NPNA

40 333 30NOV2010 Current

40 333 31DEC2010 NPNA

50 333 31JAN2011 NPNA

50 333 28FEB2011 writoff

20 333 31MAR2011 NPNA

20 333 30APR2011 NPNA

20 333 31MAY2011 NPNA

20 333 31AUG2011 NPNA

100 111 30APR2011 90 +

100 111 31MAY2011 90 +

100 111 31JUL2011 NPNA

100 111 31AUG2011 NPNA

100 111 04JAN2012 NPNA

80 111 31DEC2011 NPNA

40 555 08FEB2010 30 - 60

40 555 31MAR2010 Current

40 555 30APR2010 60 - 90

40 555 31MAY2010 Current

40 666 03JAN2012 1 - 30

;

/*Q1*/

proc sql;

  select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0;quit;

  /*Q2*/

  proc sql;

  select * from have

  except

  (select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0)

  order by Bank_number;

  quit;

Haikuo

Linlin
Lapis Lazuli | Level 10

Haikuo,

How are you doing? I like your code. I added a group variable "account_number" to your code.

/*Q1*/

proc sql;

  create table q1 as

          select * from have group by Bank_number,account_number

             having sum(Arrears_Band ne 'NPNA')=0;quit;

  /*Q2*/

   proc sql;

  create table q2 as

  select * from have

  except

  (select * from have group by Bank_number,account_number having sum(Arrears_Band ne 'NPNA')=0)

  order by Bank_number;

  quit;

Mirisage
Obsidian | Level 7

Hi Haikuo and Linlin,

I have applied the two codes to the follwing test data set named 'one' but it is strange that follwoing 3 accoutns are not identified as continuing NPNA accounts.

/*

Is there any reason?

10 28Feb2010 2222 9476 Personal OD   NPNA

10 31Mar2010 2222 9635 Personal OD   NPNA

10 30Apr2010 2222 9791 Personal OD   NPNA*/

data one;

informat Current_date date9.;

input Bank_number   Current_date    Account_number $ 14 -17    Balance   Product     $ 24 -36 Arrears_Band $ 38-44;

format Current_date date9.;

cards;

10 28Feb2010 1111 2216 Personal Loan NPNA

10 31Mar2010 1111 2216 Personal Loan NPNA

10 30Apr2010 1111 2216 Personal Loan NPNA

10 31May2010 1111 2216 Personal Loan NPNA

10 30Jun2010 1111 2216 Personal Loan NPNA

10 06Jul2010 1111 2216 Personal Loan writoff

10 28Feb2010 2222 9476 Personal OD   NPNA

10 31Mar2010 2222 9635 Personal OD   NPNA

10 30Apr2010 2222 9791 Personal OD   NPNA

30 28Feb2010 1111 3420 Res. Mortgage NPNA

30 31Mar2010 1111 3308 Res. Mortgage NPNA

30 30Jun2011 1111 2614 Res. Mortgage NPNA

30 31Jul2011 1111 2599 Res. Mortgage 1 - 30

30 31Aug2011 1111 2599 Res. Mortgage Current

30 30Sep2011 1111 2599 Res. Mortgage Current

30 31Oct2011 1111 2599 Res. Mortgage 1 - 30

30 30Nov2011 1111 2591 Res. Mortgage Current

30 31Dec2011 1111 2584 Res. Mortgage Current

30 31Jan2012 1111 2584 Res. Mortgage Current

40 28Feb2010 1111 1492 Personal Loan NPNA

40 31Mar2010 1111 1487 Personal Loan NPNA

40 31Aug2010 1111 1425 Personal Loan NPNA

40 30Sep2010 1111 7226 Personal Loan writoff

40 28Feb2011 1111 3510 Personal Loan NPNA

40 31Mar2011 1111 3510 Personal Loan NPNA

40 30Apr2011 1111 3510 Personal Loan NPNA

40 31May2011 1111 3510 Personal Loan NPNA

;

run;

/*identification of continuing NPNA accounts*/

proc sql;

  create table q1 as

          select * from one group by Bank_number,account_number

             having sum(Arrears_Band ne 'NPNA')=0;quit;

/*Removing of continuing NPNA accounts*/

proc sql;

  create table temp_1 as

  select * from one

  except

  (select * from one group by Bank_number,account_number having sum(Arrears_Band ne 'NPNA')=0)

  order by Bank_number;

quit;

Thanks

Mirisage

Linlin
Lapis Lazuli | Level 10

Hi Mirisage,

I ran the code. Below are the output and log file:

           Current_     Bank_    Account_                              Arrears_

   Obs         date    number     number     Balance      Product        Band

    1     28FEB2010      10        2222        9476     Personal OD      NPNA

    2     30APR2010      10        2222        9791     Personal OD      NPNA

    3     31MAR2010      10        2222        9635     Personal OD      NPNA

1    data one;

2    informat Current_date date9.;

3    input Bank_number   Current_date    Account_number $ 14 -17    Balance

3  ! Product     $ 24 -36 Arrears_Band $ 38-44;

4    format Current_date date9.;

5    cards;

NOTE: The data set WORK.ONE has 27 observations and 6 variables.

NOTE: DATA statement used (Total process time):

      real time           0.02 seconds

      cpu time            0.01 seconds

33   ;

34   run;

35

36   /*identification of continuing NPNA accounts*/

37   proc sql;

38     create table q1 as

39             select * from one group by Bank_number,account_number

40                having sum(Arrears_Band ne 'NPNA')=0;

NOTE: The query requires remerging summary statistics back with the original data.

NOTE: Table WORK.Q1 created, with 3 rows and 6 columns.

40 !                                                   quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.04 seconds

      cpu time            0.04 seconds

41

42   proc print data=q1;run;

NOTE: There were 3 observations read from the data set WORK.Q1.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           0.19 seconds

      cpu time            0.04 seconds

Mirisage
Obsidian | Level 7

Hi Linlin and Haikuo,

My bad. I apologize. Just after you (linlin) posted your results of running the code on my new data set, I have tried it. It worked like anything.

I really do not know what happened to me in my previous occasion. 

Many thanks to both of you for this great code help.

Regards

Mirisage

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 831 views
  • 4 likes
  • 3 in conversation