BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

Hi SAS Forum,

I tried to understand what is going on in this dow loop (Soren is acknowledged) by applying it in a single account dataset which has multiple months.

Objective of the dow was to create a new variable which identifies the last date where the account changes to NPNA.

Answer:

Obs          Bank_number          Account_number     Last_NPNA_Start_Month

1              10                            4444444444            28FEB2011

This is correct because the “last date where the account changes to NPNA is 28FEB2011.

But how it happened.

Data set (see the order of the dates are mixed).

Data set_3;

informat Current_date date9.;

Input Bank_number 1-2   Current_date   Account_number $14-23  Balance 25-

26   product $28-40 Arrears_Band $42-48 ;

format Current_date date9.;

cards;

10 31DEC2010 4444444444 25 Personal Loan NPNA

10 31MAR2011 4444444444 18 Personal Loan NPNA

10 31JAN2011 4444444444 18 Personal Loan 1 – 30

10 31MAY2011 4444444444 85 Personal Loan Current

10 30APR2011 4444444444 18 Personal Loan NPNA

10 28FEB2011 4444444444 18 Personal Loan NPNA

10 28FEB2010 4444444444 25 Personal Loan NPNA

;

run;

I first sorted the data set by a combination of 3 variables that would make a unique ID.

proc sort data=set_3 out = c;

  by bank_number account_number  current_date;

run;

This is the sorted data set. We should use this to answer our question, I think,  because the  order of current_date in the unsorted data set is mixed.

Obs Current_date Bank_number Account_number Balance product Arrears_Band (variable headings)

1

28FEB2010

10

4444444444

25

Personal Loan

NPNA

2

31DEC2010

10

4444444444

25

Personal Loan

NPNA

3

31JAN2011

10

4444444444

18

Personal Loan

1 – 30

4

28FEB2011

10

4444444444

18

Personal Loan

NPNA

5

31MAR2011

10

4444444444

18

Personal Loan

NPNA

6

30APR2011

10

4444444444

18

Personal Loan

NPNA

7

31MAY2011

10

4444444444

85

Personal Loan

Current

Then the dow loop was applied.

data last_start_npna;

do until(last.account_number);

    set c ;

    by bank_number account_number arrears_band notsorted;

    if first.Arrears_band and Arrears_band='NPNA'

      then Last_NPNA_Start_Month = Current_Date;

       putlog "(1)"  _all_;

    end;

     format Last_NPNA_Start_Month date9.;

  keep bank_number account_number Last_NPNA_Start_month;

run;

proc print; run;

Output generated is this.

Obs          Bank_number             Account_number         Last_NPNA_Start_Month

1              10                             4444444444                    28FEB2011

This is correct because the “last date where the account changes to NPNA is 28FEB2011.

But how it happened.

I tried with introducing putlog "(1)"  _all_ which generated this log.

I intentionally numbered them as (1) to (7) to lead my question.

(1)last.account_number=0 Current_date=28FEB2010 Bank_number=10 Account_number=4444444444 Balance=25 product=Personal Loan

Arrears_Band=NPNA FIRST.Bank_number=1 LAST.Bank_number=0 FIRST.Account_number=1 FIRST.Arrears_Band=1 LAST.Arrears_Band=0

Last_NPNA_Start_Month=28FEB2010 _ERROR_=0 _N_=1

(2)last.account_number=0 Current_date=31DEC2010 Bank_number=10 Account_number=4444444444 Balance=25 product=Personal Loan

Arrears_Band=NPNA FIRST.Bank_number=0 LAST.Bank_number=0 FIRST.Account_number=0 FIRST.Arrears_Band=0 LAST.Arrears_Band=1

Last_NPNA_Start_Month=28FEB2010 _ERROR_=0 _N_=1

(3)last.account_number=0 Current_date=31JAN2011 Bank_number=10 Account_number=4444444444 Balance=18 product=Personal Loan

Arrears_Band=1 - 30 FIRST.Bank_number=0 LAST.Bank_number=0 FIRST.Account_number=0 FIRST.Arrears_Band=1 LAST.Arrears_Band=1

Last_NPNA_Start_Month=28FEB2010 _ERROR_=0 _N_=1

(4)last.account_number=0 Current_date=28FEB2011 Bank_number=10 Account_number=4444444444 Balance=18 product=Personal Loan

Arrears_Band=NPNA FIRST.Bank_number=0 LAST.Bank_number=0 FIRST.Account_number=0 FIRST.Arrears_Band=1 LAST.Arrears_Band=0

Last_NPNA_Start_Month=28FEB2011 _ERROR_=0 _N_=1

(5)last.account_number=0 Current_date=31MAR2011 Bank_number=10 Account_number=4444444444 Balance=18 product=Personal Loan

Arrears_Band=NPNA FIRST.Bank_number=0 LAST.Bank_number=0 FIRST.Account_number=0 FIRST.Arrears_Band=0 LAST.Arrears_Band=0

Last_NPNA_Start_Month=28FEB2011 _ERROR_=0 _N_=1

(6)last.account_number=0 Current_date=30APR2011 Bank_number=10 Account_number=4444444444 Balance=18 product=Personal Loan

Arrears_Band=NPNA FIRST.Bank_number=0 LAST.Bank_number=0 FIRST.Account_number=0 FIRST.Arrears_Band=0 LAST.Arrears_Band=1

Last_NPNA_Start_Month=28FEB2011 _ERROR_=0 _N_=1

(7)last.account_number=1 Current_date=31MAY2011 Bank_number=10 Account_number=4444444444 Balance=85 product=Personal Loan

Arrears_Band=Current FIRST.Bank_number=0 LAST.Bank_number=1 FIRST.Account_number=0 FIRST.Arrears_Band=1 LAST.Arrears_Band=1

Last_NPNA_Start_Month=28FEB2011 _ERROR_=0 _N_=1

Question:

See log (1)

When DOW hits first record, dow condition is satisfied.

Because first.Arrears_band = 1 and Arrears_band='NPNA'

So, the Last_NPNA_Start_Month= Current_date   which is 28FEB2010

This is correct and I understand it.

But see log (2)

When DoW hits the second record, it doesn’t satisfy the condition.

Because first.Arrears_band = 0 and Arrears_band='NPNA'

But still the log says Last_NPNA_Start_Month=28FEB2010  for the second record too.

How come?

see log (4) this is the turning point.

first.Arrears_band = 1 and Arrears_band='NPNA' (satisfy our condition).

So, the Last_NPNA_Start_Month= Current_date   which is 28FEB2011

This is correct and I understand it.

But in subsequent records, even when our condition is not satisfied log continues to say

Last_NPNA_Start_Month=28FEB2011

How come?

Last question: Now SAS has two dates that satisfy our condition, 28FEB2010 and 28FEB2011.

But how has SAS decided to output 28FEB2011 for us which is our correct answer.

Your help to understand this is greatly appreciated.

Mirisage

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Mirisage,

You don't quite understand how the DOW loop works.  What you appear to be missing is that the value assigned to the newly created variable is retained across all of the records.  Records 2, 3, 5, 6 and 7 don't meet your condition, thus the pdv will only reflect the value that had been assigned when either record 2 or record 4 was processed.

When record 4 is processed, the value of 28feb2010 is replaced with 28feb2011.  Since you include an output statement in the loop, only the last record triggers a record to be output.  In your case, it is record 7 along with the value of Last_NPNA_Start_Month that was assigned when record 4 was processed.

View solution in original post

5 REPLIES 5
art297
Opal | Level 21

Mirisage,

You don't quite understand how the DOW loop works.  What you appear to be missing is that the value assigned to the newly created variable is retained across all of the records.  Records 2, 3, 5, 6 and 7 don't meet your condition, thus the pdv will only reflect the value that had been assigned when either record 2 or record 4 was processed.

When record 4 is processed, the value of 28feb2010 is replaced with 28feb2011.  Since you include an output statement in the loop, only the last record triggers a record to be output.  In your case, it is record 7 along with the value of Last_NPNA_Start_Month that was assigned when record 4 was processed.

chang_y_chung_hotmail_com
Obsidian | Level 7

You don't have to use the DoW or any other programming patterns until you feel comfortable with the data step. Here is a non-DoW version.

ods _all_ close;
ods listing;
options nocenter;
 
/* test data */
data one;
  input bank date :date9. acc $ bal prod :& $13. band :& $7.;
  format date date10.;
cards;
10 31DEC2010 4444444444 25 Personal Loan  NPNA
10 31MAR2011 4444444444 18 Personal Loan  NPNA
10 31JAN2011 4444444444 18 Personal Loan  1 - 30
10 31MAY2011 4444444444 85 Personal Loan  Current
10 30APR2011 4444444444 18 Personal Loan  NPNA
10 28FEB2011 4444444444 18 Personal Loan  NPNA
10 28FEB2010 4444444444 25 Personal Loan  NPNA
;
run;
 
proc sort data=one out=two;
  by bank acc date;
run;
 
/* flag the last date where the account changes to npna */
data three;

  set two;
  by bank acc band notsorted;
 
  retain since .;
 
  if first.acc then call missing(since);
 
  if first.band and band = "NPNA" then since = date;
 
  if last.acc then output;
 
  keep bank acc since;
  format since date10.;
run;
 
proc print data=three;
run;
/* on lst
Obs    bank      acc            since
1      10     44444444     28FEB2011
*/

Mirisage
Obsidian | Level 7

Hi Art and Chang,

Many thanks to both of you.

Hi Art,

I think I am now much better than where I was originally. Thanks again for your nice explanation.

Hi Chang,

Thank you very much for your nice alternative way of doing the same thing,

Could I ask an additional question.

I always have this confusion.

1) In the follwing code, Yellow highlighted place is the first time SAS hits the variable called “since”.

We have never said SAS what is “since” and how it was derived such as since = bank + 8 or anything like that.

But SAS understands before hand what is since. How come?

2). Blue highlighted place is the second place SAS hits the variable called “since”.

Still SAS was not told by us what is “since” and how “since” was created.

But SAS never questions you asking “Hey first define what is “since” before you ask me to do anything involving the variable since”

If you have time, could you please elaborate a bit.

data three;

   set two;

  by bank acc band notsorted;

 

  retain since .;

   if first.acc then call missing(since);

   if first.band and band = "NPNA" then since = date;

   if last.acc then output;

   keep bank acc since;

  format since date10.;

run;

Thank you for your time and expertise.

Mirisage

Tom
Super User Tom
Super User

(NOTE: We are taking as given that the variable SINCE is NOT in the dataset TWO referenced in the second line of the data step.)

SAS will define the variable type for you based on how you use it.  If it cannot figure it out it will make it numeric.  It will also try to assign the length of a character variable for you.  Most of the time this will do what you want. This makes it much easier to write a program than in languages that require definition of everything.

In this case because you told it to initialize SINCE to numeric missing in the RETAIN statement in knows 100% that you meant for SINCE to be numeric.

If you do not include the initial value in the RETAIN statement then the data step compiler does not know whether SINCE is numeric or character.  So the next time it appears as an argument to the MISSING() function.  That does not help as the missing function can take both numeric and character arguments.  So then it sees it assigned the value of the DATE variable in the next line.  So SINCE would be given the same type and length as the DATE variable.

Mirisage
Obsidian | Level 7

Hi Tom,

Thank you very much for this clarification.

This is very helpful.

Regards

Mirisage

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1002 views
  • 7 likes
  • 4 in conversation