I have a table that is being created, we will say that column 1 is Id, column2 is YearMonth and column3 is Flag.
data have; input
Id YearMonth Flag; datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
I want to add 1 to flag if current flag doesn't match with previous row value along with two major conditions (explained after output for a better explanation).
The output should be:
Id | YearMonth | Flag | Stage |
1 | 200001 | 1 | 1 |
1 | 200002 | 1 | 1 |
1 | 200003 | 0 | 2 |
1 | 200004 | 1 | 3 |
1 | 200005 | 1 | 3 |
1 | 200006 | 0 | 4 |
1 | 200007 | 1 | 3 |
1 | 200008 | 0 | 4 |
2 | 200001 | 0 | 2 |
2 | 200002 | 0 | 2 |
2 | 200003 | 1 | 3 |
2 | 200004 | 1 | 3 |
2 | 200005 | 0 | 4 |
2 | 200006 | 0 | 4 |
2 | 200007 | 1 | 3 |
2 | 200008 | 0 | 4 |
Condition-1: There are only 4 stages. Hence if a flag is repeated after stage 4, then it should not increment and should give output as either stage=3 if flag=1 or stage=4 if flag=0.
Condition-2: If first entry of an Id is 0 then the stage should start with 2 not 1.
I am trying something like this (helped by Richard user:1249962 (stackoverflow))
data want(drop=maxstage_reached);
set have;
if flag ne lag(flag) then stage+1; * increment per rules;
if stage = 4 then maxstage_reached=1; * tracking flag (retained);
if maxstage_reached then stage = 4 - flag; * special rule;
retain maxstage_reached;
run;
Currently I am struggling with:
1)group by Id; I am trying to use "by" but it doesnt work
2)Condition-2
Thank you!
Hi @John04 Seems nothing more than a pretty straight forward BY GROUP processing and counter logic i.e. assuming I correctly understand your requirement, which I can believe I have?-
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
_n_=0;
do until(last.id);
set have;
by id flag notsorted;
if first.id and first.flag and not flag then _n_=2;
else if first.flag and _n_<4 then _n_=sum(_n_,1);
if _n_=4 then stage=ifn(flag,3,4);
else stage=_n_;
output;
end;
run;
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
do until(last.id);
lagflag=_N_;
set have;
by id;
_N_=flag;
if first.id then
do;
lagflag=flag;
cnt=id;
end;
else if flag ne lagflag then
do;
cnt+1;
end;
if cnt>4 and lag(cnt)=4 and flag=1 then
do;
cnt=3;
end;
else if cnt>4 and lag(cnt)=4 and flag=0 then
do;
cnt=4;
end;
output;
end;
run;
Hi @John04 Seems nothing more than a pretty straight forward BY GROUP processing and counter logic i.e. assuming I correctly understand your requirement, which I can believe I have?-
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
_n_=0;
do until(last.id);
set have;
by id flag notsorted;
if first.id and first.flag and not flag then _n_=2;
else if first.flag and _n_<4 then _n_=sum(_n_,1);
if _n_=4 then stage=ifn(flag,3,4);
else stage=_n_;
output;
end;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.