I currently have a data set that has columns with different dates for when actions have occured. I need to create a column with the stauts based on which column is the last to have a date.
Example Data:
OBS Account # open_dt act1_dt act2_dt act3_dt act4_dt close_dt
1 123456 3/1/16
2 23456 3/1/16 3/3/16
3 34567 3/13/16 3/15/16 3/17/16 3/18/16 3/20/16
I would like to create a column called status that sets a status based on which one is filled:
output would be all the data from above with one extra column for Status
OBS ...... status
1 open
2 act1
3 close
In explaning the logic it would go:
If open_dt not null then status = 'open'
if act1_dt not null then status = 'act1'
if act2_dt not null then status = 'act2'
if act3_dt not null then status = 'act3'
if act4_dt not null then status = 'act4'
if close_dt_dt not null then status = 'closed'
I feel like it would work like the above where it goes through them in order and just changes the status as it finds the next action.
I am just not sure how to actual right this in SAS
This works partially because the values you want are in the names of the variables and are separated by the _;
The main part is the array and looping through it backwards. Note the order of variables on the ARRAY statement is important.
The LEAVE statement says to stop looping when the first non-missing value is found. If there are no values for any of the _dt variables then status will be blank.
data have;
informat account $6.;
informat open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
format open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy10.;
input Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
datalines;
123456 3/1/16 . . . . .
23456 3/1/16 3/3/16 . . . .
34567 3/13/16 3/15/16 3/17/16 3/18/16 . 3/20/16
;
run;
data want;
set have;
length status $ 6;
array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
do i = dim(dt) to 1 by (-1);
if not missing( dt[i] ) then do;
status = scan(vname(dt[i]),1,'_');
leave;
end;
end;
drop i;
run;
This works partially because the values you want are in the names of the variables and are separated by the _;
The main part is the array and looping through it backwards. Note the order of variables on the ARRAY statement is important.
The LEAVE statement says to stop looping when the first non-missing value is found. If there are no values for any of the _dt variables then status will be blank.
data have;
informat account $6.;
informat open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
format open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy10.;
input Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
datalines;
123456 3/1/16 . . . . .
23456 3/1/16 3/3/16 . . . .
34567 3/13/16 3/15/16 3/17/16 3/18/16 . 3/20/16
;
run;
data want;
set have;
length status $ 6;
array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
do i = dim(dt) to 1 by (-1);
if not missing( dt[i] ) then do;
status = scan(vname(dt[i]),1,'_');
leave;
end;
end;
drop i;
run;
Hi,
Could I just add a small function I found recently that could help shrink the code somewhat. I recently discovered that you can do a do loop over an array:
data want; length status $6; set have; array dt close_dt act4_dt act3dt act2_dt act1_dt open_dt; do over dt; if not missing(dt) and missing(status) then status=scan(vname(dt),1,"_"); end; run;
You will note that I have reversed the array setup to get the last in the list otherwise more or less the same, but without the incrementor. Just wanted to point it out as the do over command is quite cool.
@RW9 wrote:
Hi,
You will note that I have reversed the array setup to get the last in the list otherwise more or less the same, but without the incrementor. Just wanted to point it out as the do over command is quite cool.
Quite cool, quite old, I learned it in SAS 5.1 🙂
And as any loop there are many ways. I picked one that directly (as close as I understand the problem) that would emulate the paper and pencil approach as simple to understand.
Thanks for the replys all. I am working on understanding the array. It worked the first run through, I have one question/alteration. There can be instances where all the dates are empty bc no action has been taking yet. When this happened it returned the status as open since thats the first column it reads. Is there a way for it to leave status blank if none of them have any dates?
@Sotarkadin wrote:
Thanks for the replys all. I am working on understanding the array. It worked the first run through, I have one question/alteration. There can be instances where all the dates are empty bc no action has been taking yet. When this happened it returned the status as open since thats the first column it reads. Is there a way for it to leave status blank if none of them have any dates?
The solution I posted leaves status blank when all of Open through close are missing.
If you want to build that type of conditional logic you need to include the ELSE statements.
IF A then action1; ELSE IF B then action2; ELSE IF C ....
Is the status determined by which variable has the latest date?
If so then you can use functions like MAX() and WHICHN() to figure out which variable has the maximum value. If you want to control which status wins when there is a tie for maximum date then set the order of the variables in the ARRAY statement.
data test;
length account $6 status $6;
informat open_dt act1_dt act2_dt act3_dt act4_dt close_dt mmddyy8.;
format open_dt act1_dt act2_dt act3_dt act4_dt close_dt yymmdd10.;
input Account open_dt act1_dt act2_dt act3_dt act4_dt close_dt;
array dt open_dt act1_dt act2_dt act3_dt act4_dt close_dt ;
if n(of dt(*)) then status=upcase(scan(vname(dt(whichn(max(of dt(*)),of dt(*)))),1,'_'));
datalines;
Open 3/1/16 . . . . .
ACT1 3/1/16 3/3/16 . . . .
CLOSE 3/13/16 3/15/16 3/17/16 3/18/16 . 3/20/16
TIE 3/13/16 3/15/16 3/16/16 3/16/16 . .
None . . . . . .
;
proc print;
run;
The status is not controlled by the date but by which action has a date. Each of those actions could happen on the same date but they do happen in order. It is opened, then act1, act2, etc.. until it's closed. The last action to occur tells me where it is at in the process. The close_dt is the last one as not all of the actions have to occur before it's closed.
@Sotarkadin wrote:
The status is not controlled by the date but by which action has a date. Each of those actions could happen on the same date but they do happen in order. It is opened, then act1, act2, etc.. until it's closed. The last action to occur tells me where it is at in the process. The close_dt is the last one as not all of the actions have to occur before it's closed.
In that case the order of the variables in the array matter. So place them in decreasing order of status. So ties will go to the highest status. closed,...act2,act1,opened.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.