Hi, I have a variable with a series 1 and 0 in dataset, most of the time they are like "111011011" for 9 observations for example. Most importantly I want to change the first 0 to 1 and keep the second 0. There are more specific rules and I will display it via an example:
data have;
input ID time var;
cards;
1 1 1
1 2 1
1 3 0
1 4 1
1 5 1
1 6 1
1 7 0
1 8 1
2 1 1
2 2 1
2 3 0
2 4 0
2 5 1
2 6 1
2 7 0
2 8 1
;
run;
In this example, I have two IDs with different pattern of the variable "var" over time. I wanted to replace the first 0 by 1 only when this 0 is followed by 1 (e.g., var = 0 for ID 1 at time 3). I do not replace 0 for ID 2 at time 3 because it is also 0 for time 4. The 0 at time 7 for ID 2 will also not replaced by 1 although the pattern is like "101", because it is not the first 0 for this ID.
The final dataset I want is:
data want;
input ID time var;
cards;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
1 6 1
1 7 0
1 8 1
2 1 1
2 2 1
2 3 0
2 4 0
2 5 1
2 6 1
2 7 0
2 8 1
;
run;
Is there any way to accomplish that? Thanks!
Looking ahead is difficult. Here is a method using a second SET statement that re-reads the data starting from the second observation. Tack on an extra empty observation to have the number of observations match.
You will also need to keep track of whether or not you have found a zero to be able to detect the first one.
data want;
set have ;
by id;
set have(keep=var rename=(var=next) firstobs=2) have(obs=1 drop=_all_);
retain found;
if last.id then call missing(next);
if first.id then found=0;
if var=0 and not found then do;
found=1;
if next=1 then var=1;
end;
run;
Result
Obs ID time var next found 1 1 1 1 1 0 2 1 2 1 0 0 3 1 3 1 1 1 4 1 4 1 1 1 5 1 5 1 1 1 6 1 6 1 0 1 7 1 7 0 1 1 8 1 8 1 . 1 9 2 1 1 1 0 10 2 2 1 0 0 11 2 3 0 0 1 12 2 4 0 1 1 13 2 5 1 1 1 14 2 6 1 0 1 15 2 7 0 1 1 16 2 8 1 . 1
Looking ahead is difficult. Here is a method using a second SET statement that re-reads the data starting from the second observation. Tack on an extra empty observation to have the number of observations match.
You will also need to keep track of whether or not you have found a zero to be able to detect the first one.
data want;
set have ;
by id;
set have(keep=var rename=(var=next) firstobs=2) have(obs=1 drop=_all_);
retain found;
if last.id then call missing(next);
if first.id then found=0;
if var=0 and not found then do;
found=1;
if next=1 then var=1;
end;
run;
Result
Obs ID time var next found 1 1 1 1 1 0 2 1 2 1 0 0 3 1 3 1 1 1 4 1 4 1 1 1 5 1 5 1 1 1 6 1 6 1 0 1 7 1 7 0 1 1 8 1 8 1 . 1 9 2 1 1 1 0 10 2 2 1 0 0 11 2 3 0 0 1 12 2 4 0 1 1 13 2 5 1 1 1 14 2 6 1 0 1 15 2 7 0 1 1 16 2 8 1 . 1
Hi @Chaupak,
If dataset HAVE is sorted by ID TIME and variable VAR contains only 0s and 1s, you can also use the implied look-ahead enabled by a BY statement:
data want(drop=f);
set have;
by id var notsorted;
if first.id then f=.;
if ~f & var=0 & ~last.id then do;
var=last.var;
f+1;
end;
run;
data have; input ID time var; cards; 1 1 1 1 2 1 1 3 0 1 4 1 1 5 1 1 6 1 1 7 0 1 8 1 2 1 1 2 2 1 2 3 0 2 4 0 2 5 1 2 6 1 2 7 0 2 8 1 ; run; data want; merge have have(keep=id var rename=(id=_id var=_var) firstobs=2); retain found .; if id ne lag(id) then found=.; if not found and var=0 and id=_id and _var=1 and (id ne lag(id) or (id=lag(id) and lag(var)=1)) then do;found=1;var=1;end; if var=0 then found=1; drop found _:; run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.