Hello SAS community,
I have a dataset that looks like the below, basically an id variable to flag different entities the then another variable that takes different values (can be one of four values: missing, Open,Char or Clos). For each id variable that data are sorted by time (i have a "proper" date variable, but here i include it as a number for simplicity).
data have;
input id $ x $ time ;
cards;
1 Open 1
1 Open 2
1 Clos 3
1 Clos 4
1 Clos 5
1 Open 6
1 Open 7
2 Open 1
2 Clos 2
2 Char 3
2 Char 4
;;;;
run;
I would like to create another variable y that tells me (within each id) what is the next value of x when x is equal to "Clos", that is not "Clos", like below. So this variable takes a value when x is clos, telling you what is the next non-clos value within a given id.
Thank you all very much.
data want;
input id $ x $ time y $ ;
cards;
1 Open 1 .
1 Open 2 .
1 Clos 3 open
1 Clos 4 open
1 Clos 5 open
1 Open 6 .
1 Open 7 .
2 Open 1 .
2 Clos 2 char
2 Char 3 .
2 Char 4 .
;;;;
run;
Great idea, @Tom, and so much easier than other look-ahead techniques.
Small correction: I think call missing(last) in observations where first.id should be restricted to cases with x='Clos' to avoid losing a few values for variable want (example: the sample dataset HAVE with the last observation removed).
if x ne 'Clos' then last=x; else do; if first.id then call missing(last); want=last; end;
Or, alternatively, set last to missing in the last observation of each BY group:
if x ne 'Clos' then last=x; else want=last; if last.id then call missing(last);
I cannot figure out what real world problem would want something like this. Can you explain what you are trying to do?
In general it is much easier to REMEMBER something than it is to predict the FUTURE.
Sort the data in descending order and use a RETAINed variable to remember the latest non CLOS value. You will want to a second non-retained variable to output.
data have;
input id $ x $ time ;
row+1;
cards4;
1 Open 1
1 Open 2
1 Clos 3
1 Clos 4
1 Clos 5
1 Open 6
1 Open 7
2 Open 1
2 Clos 2
2 Char 3
2 Char 4
;;;;
proc sort;
by id descending row ;
run;
data want;
set have;
by id;
if x ne 'Clos' then last=X;
retain last;
if first.id then call missing(last);
if x = 'Clos' then want=last;
drop last;
run;
proc sort;
by id row;
run;
Result
Obs id x time row want 1 1 Open 1 1 2 1 Open 2 2 3 1 Clos 3 3 Open 4 1 Clos 4 4 Open 5 1 Clos 5 5 Open 6 1 Open 6 6 7 1 Open 7 7 8 2 Open 1 8 9 2 Clos 2 9 Char 10 2 Char 3 10 11 2 Char 4 11
Great idea, @Tom, and so much easier than other look-ahead techniques.
Small correction: I think call missing(last) in observations where first.id should be restricted to cases with x='Clos' to avoid losing a few values for variable want (example: the sample dataset HAVE with the last observation removed).
if x ne 'Clos' then last=x; else do; if first.id then call missing(last); want=last; end;
Or, alternatively, set last to missing in the last observation of each BY group:
if x ne 'Clos' then last=x; else want=last; if last.id then call missing(last);
Thanks. I was trying to avoid having to define the type and/or length of LAST by having the first place it appears be the assignment statement and missed that potential erasure.
Thanks everyone for the codes, they all work perfectly. Regards
/*
I think Tom's code is the best way to do.
But for having some fun.
*/
data have;
input id $ x $ time ;
cards4;
1 Open 1
1 Open 2
1 Clos 3
1 Clos 4
1 Clos 5
1 Open 6
1 Open 7
2 Open 1
2 Clos 2
2 Char 3
2 Char 4
;;;;
data want ;
if 0 then set have;
if 0 then set have(rename=(x=y));
if 0 then set have(rename=(x=want));
declare hash h(ordered:'y');
declare hiter hi('h');
h.definekey('time');
h.definedata('y');
h.definedone();
do until(last.id);
set have;
by id;
h.add(key:time,data:x);
end;
do until(last.id);
set have;
by id;
call missing(want);
if x='Clos' then do;
rc=hi.setcur(key:time);
do while(rc=0);
if y ne 'Clos' then do;want=y;leave;end;
rc=hi.next();
end;
end;
output;
end;
h.delete();hi.delete();
drop y rc;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.