I need to change the value of variable VarY if Flag=1. This needs to be done within each unique value under variable VarY. Each VarX could have multiple values of VarY corresponding to it. Here VarY takes 734 and 6011 when VarX is 11.
For example in the below example, When Flag=1, VarY needs to change to 6011_1 at the first occurrence of 1 in Flag and it needs to become 6011_2 at the second occurrence of 1 in Flag. All the remaining rows under that value of VarY(or until the next occurrence of Flag=1) also need to take the new value of VarY.
Could someone please guide me?
Input dataset temp:
VarX | VarY | Flag |
11 | 734 | . |
11 | 734 | . |
11 | 734 | . |
11 | 734 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | 1 |
11 | 6011 | . |
11 | 6011 | 1 |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
12 | 123 | . |
12 | 123 | 1 |
12 | 123 | . |
Desired output:
VarX | VarY | Flag |
11 | 734 | . |
11 | 734 | . |
11 | 734 | . |
11 | 734 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011 | . |
11 | 6011_1 | 1 |
11 | 6011_1 | . |
11 | 6011_2 | 1 |
11 | 6011_2 | . |
11 | 6011_2 | . |
11 | 6011_2 | . |
11 | 6011_2 | . |
11 | 6011_2 | . |
12 | 123 | . |
12 | 123_1 | 1 |
12 | 123_1 | . |
Thank you!
something like this?
data have;
input VarX VarY Flag;
cards;
11 734 .
11 734 .
11 734 .
11 734 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 1
11 6011 .
11 6011 1
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
12 123 .
12 123 1
12 123 .
;
data want;
set have(rename=(VarY=_VarY));
by _VarY notsorted;
length VarY $10;
retain VarY f;
if first._VarY then do; c=.;f=.;end;
if not f then VarY=put(_VarY,8. -l);
if flag then do;
c+1;
f=1;
VarY=catx('_',_VarY,c);
end;
drop c f;
run;
This example is well suited to SET within DO UNTIL.
data want;
suffix = 0;
do until(last.vary);
set have;
by varx vary;
suffix + flag;
length newY $16;
if suffix gt 0
then newy = catx('_',vary,suffix);
else newy = left(vvalue(vary));
output;
end;
drop suffix;
run;
If vary is only integers (which is probably the case given the task definition) and you know the minimum and maximum expected values (perhaps you don't), then a temporary array is the most efficient:
data have;
input VarX VarY Flag;
cards;
11 734 .
11 734 .
11 734 .
11 734 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 1
11 6011 .
11 6011 1
11 6011 .
11 6011 .
11 6011 .
11 6011 .
11 6011 .
12 123 .
12 123 1
12 123 .
;
data want;
set have;
array vy {1:7000} _temporary_;
if flag=1 then do;
if vy{vary}=. then vy{vary}=vary;
vary=vy{vary}+.1;
vy{vary}=vary;
end;
else vary=coalesce(vy{vary},vary);
run;
The _temporary_ array attribute tells sas to preserve values from obs to obs, but do not write the array elements to the output data set(s). The coalesce function says to take the first non-missing value from the list.
Now if you don't know any limits that will contain all vary values, the analogous approach is to use a hash object:
data want2 (drop=_:);
set have;
if _n_=1 then do;
_newvary=.;
declare hash h ();
h.definekey('vary');
h.definedata('_newvary');
h.definedone();
end;
if flag=1 then do;
if h.find()^=0 then h.add(key:vary,data:vary+.1);
else h.replace(key:vary,data:_newvary+.1);
end;
if h.find()=0 then vary=_newvary;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.