DATA Step, Macro, Functions and more

Creating Variable based on reading data across the row.

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

Creating Variable based on reading data across the row.

I currently have a data set that has columns with different dates for when actions have occured.  I need to create a column with the stauts based on which column is the last to have a date.

 

Example Data:

 

OBS     Account #     open_dt       act1_dt       act2_dt      act3_dt       act4_dt    close_dt   

      1          123456           3/1/16

      2           23456           3/1/16        3/3/16

      3           34567           3/13/16     3/15/16      3/17/16       3/18/16                        3/20/16

 

I would like to create a column called status that sets a status based on which one is filled:

 

output would be all the data from above with one extra column for Status

 

OBS    ......    status

1                     open

2                      act1

3                      close

 

In explaning the logic it would go:

 

If open_dt not null then status = 'open'

if act1_dt not null then status = 'act1'

if act2_dt not null then status = 'act2'

if act3_dt not null then status = 'act3'

if act4_dt not null then status = 'act4'

if close_dt_dt not null then status = 'closed'

 

I feel like it would work like the above where it goes through them in order and just changes the status as it finds the next action.

 

I am just not sure how to actual right this in SAS


Accepted Solutions
Solution
‎04-10-2016 10:16 PM
Super User
Posts: 11,343

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

This works partially because the values you want are in the names of the variables and are separated by the _;

The main part is the array and looping through it backwards. Note the order of variables on the ARRAY statement is important.

The LEAVE statement says to stop looping when the first non-missing value is found. If there are no values for any of the _dt variables then status will be blank.

 

data have;
   informat account $6.;
   informat   open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
   format   open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy10.;
   input  Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;   
datalines;
123456  3/1/16    .        .          .         .     .
 23456  3/1/16    3/3/16   .          .         .     .
 34567  3/13/16   3/15/16   3/17/16    3/18/16  .     3/20/16
;
run;

data want;
   set have;
   length status $ 6;
   array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
   do i = dim(dt) to 1 by (-1);
      if not missing( dt[i] ) then do;
         status = scan(vname(dt[i]),1,'_');
         leave;
      end;
   end;
   drop i;
run;
 

View solution in original post


All Replies
Solution
‎04-10-2016 10:16 PM
Super User
Posts: 11,343

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

This works partially because the values you want are in the names of the variables and are separated by the _;

The main part is the array and looping through it backwards. Note the order of variables on the ARRAY statement is important.

The LEAVE statement says to stop looping when the first non-missing value is found. If there are no values for any of the _dt variables then status will be blank.

 

data have;
   informat account $6.;
   informat   open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
   format   open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy10.;
   input  Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;   
datalines;
123456  3/1/16    .        .          .         .     .
 23456  3/1/16    3/3/16   .          .         .     .
 34567  3/13/16   3/15/16   3/17/16    3/18/16  .     3/20/16
;
run;

data want;
   set have;
   length status $ 6;
   array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
   do i = dim(dt) to 1 by (-1);
      if not missing( dt[i] ) then do;
         status = scan(vname(dt[i]),1,'_');
         leave;
      end;
   end;
   drop i;
run;
 
Super User
Super User
Posts: 7,961

Re: Creating Variable based on reading data across the row.

Hi,

 

Could I just add a small function I found recently that could help shrink the code somewhat.  I recently discovered that you can do a do loop over an array:

data want;
  length status $6;
  set have;
  array dt close_dt act4_dt act3dt act2_dt act1_dt open_dt;
  do over dt;
    if not missing(dt) and missing(status) then status=scan(vname(dt),1,"_");
  end;
run;

You will note that I have reversed the array setup to get the last in the list otherwise more or less the same, but without the incrementor.  Just wanted to point it out as the do over command is quite cool.

Super User
Posts: 11,343

Re: Creating Variable based on reading data across the row.


RW9 wrote:

Hi,

 

You will note that I have reversed the array setup to get the last in the list otherwise more or less the same, but without the incrementor.  Just wanted to point it out as the do over command is quite cool.


Quite cool, quite old, I learned it in SAS 5.1 Smiley Happy

 

And as any loop there are many ways. I picked one that directly (as close as I understand the problem) that would emulate the paper and pencil approach as simple to understand.

Contributor
Posts: 21

Re: Creating Variable based on reading data across the row.

Thanks for the replys all.  I am working on understanding the array.  It worked the first run through, I have one question/alteration.  There can be instances where all the dates are empty bc no action has been taking yet.  When this happened it returned the status as open since thats the first column it reads.  Is there a way for it to leave status blank if none of them have any dates?

Super User
Posts: 11,343

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

Sotarkadin wrote:

Thanks for the replys all.  I am working on understanding the array.  It worked the first run through, I have one question/alteration.  There can be instances where all the dates are empty bc no action has been taking yet.  When this happened it returned the status as open since thats the first column it reads.  Is there a way for it to leave status blank if none of them have any dates?


 

 

 

The solution I posted leaves status blank when all of Open through close are missing.

Super User
Super User
Posts: 7,050

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

If you want to build that type of conditional logic you need to include the ELSE statements.

IF A then action1; ELSE IF B then action2; ELSE IF C ....

Super User
Super User
Posts: 7,050

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

Is the status determined by which variable has the latest date?

If so then you can use functions like MAX() and WHICHN() to figure out which variable has the maximum value.  If you want to control which status wins when there is a tie for maximum date then set the order of the variables in the ARRAY statement.

data test;
  length account $6 status $6;
  informat open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
  format   open_dt act1_dt act2_dt act3_dt act4_dt close_dt yymmdd10.;
  input  Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;  
  array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt ;
  if n(of dt(*)) then status=upcase(scan(vname(dt(whichn(max(of dt(*)),of dt(*)))),1,'_'));
datalines;
Open  3/1/16    .        .          .         .     .
ACT1  3/1/16    3/3/16   .          .         .     .
CLOSE 3/13/16   3/15/16   3/17/16    3/18/16  .     3/20/16
TIE   3/13/16   3/15/16   3/16/16    3/16/16  .     . 
None . . . . . .
;

proc print; 
run;

Capture.PNG

Contributor
Posts: 21

Re: Creating Variable based on reading data across the row.

[ Edited ]

The status is not controlled by the date but by which action has a date.  Each of those actions could happen on the same date but they do happen in order.  It is opened, then act1, act2, etc.. until it's closed.  The last action to occur tells me where it is at in the process.  The close_dt is the last one as not all of the actions have to occur before it's closed.  

Super User
Super User
Posts: 7,050

Re: Creating Variable based on reading data across the row.

Posted in reply to Sotarkadin

Sotarkadin wrote:

The status is not controlled by the date but by which action has a date.  Each of those actions could happen on the same date but they do happen in order.  It is opened, then act1, act2, etc.. until it's closed.  The last action to occur tells me where it is at in the process.  The close_dt is the last one as not all of the actions have to occur before it's closed.  


In that case the order of the variables in the array matter. So place them in decreasing order of status. So ties will go to the highest status.  closed,...act2,act1,opened.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 337 views
  • 0 likes
  • 4 in conversation