I have a table that is being created, we will say that column 1 is Id, column2 is YearMonth and column3 is Flag.
data have; input
Id YearMonth Flag; datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
I want to add 1 to flag if current flag doesn't match with previous row value along with two major conditions (explained after output for a better explanation).
The output should be:
Id | YearMonth | Flag | Stage |
1 | 200001 | 1 | 1 |
1 | 200002 | 1 | 1 |
1 | 200003 | 0 | 2 |
1 | 200004 | 1 | 3 |
1 | 200005 | 1 | 3 |
1 | 200006 | 0 | 4 |
1 | 200007 | 1 | 3 |
1 | 200008 | 0 | 4 |
2 | 200001 | 0 | 2 |
2 | 200002 | 0 | 2 |
2 | 200003 | 1 | 3 |
2 | 200004 | 1 | 3 |
2 | 200005 | 0 | 4 |
2 | 200006 | 0 | 4 |
2 | 200007 | 1 | 3 |
2 | 200008 | 0 | 4 |
Condition-1: There are only 4 stages. Hence if a flag is repeated after stage 4, then it should not increment and should give output as either stage=3 if flag=1 or stage=4 if flag=0.
Condition-2: If first entry of an Id is 0 then the stage should start with 2 not 1.
I am trying something like this (helped by Richard user:1249962 (stackoverflow))
data want(drop=maxstage_reached);
set have;
if flag ne lag(flag) then stage+1; * increment per rules;
if stage = 4 then maxstage_reached=1; * tracking flag (retained);
if maxstage_reached then stage = 4 - flag; * special rule;
retain maxstage_reached;
run;
Currently I am struggling with:
1)group by Id; I am trying to use "by" but it doesnt work
2)Condition-2
Thank you!
Hi @John04 Seems nothing more than a pretty straight forward BY GROUP processing and counter logic i.e. assuming I correctly understand your requirement, which I can believe I have?-
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
_n_=0;
do until(last.id);
set have;
by id flag notsorted;
if first.id and first.flag and not flag then _n_=2;
else if first.flag and _n_<4 then _n_=sum(_n_,1);
if _n_=4 then stage=ifn(flag,3,4);
else stage=_n_;
output;
end;
run;
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
do until(last.id);
lagflag=_N_;
set have;
by id;
_N_=flag;
if first.id then
do;
lagflag=flag;
cnt=id;
end;
else if flag ne lagflag then
do;
cnt+1;
end;
if cnt>4 and lag(cnt)=4 and flag=1 then
do;
cnt=3;
end;
else if cnt>4 and lag(cnt)=4 and flag=0 then
do;
cnt=4;
end;
output;
end;
run;
Hi @John04 Seems nothing more than a pretty straight forward BY GROUP processing and counter logic i.e. assuming I correctly understand your requirement, which I can believe I have?-
data have;
input Id YearMonth Flag;
datalines;
1 200101 1
1 200102 1
1 200103 0
1 200104 1
1 200105 1
1 200106 0
1 200107 1
1 200108 0
2 200101 0
2 200102 0
2 200103 1
2 200104 1
2 200105 0
2 200106 0
2 200107 1
2 200108 0
;
run;
data want;
_n_=0;
do until(last.id);
set have;
by id flag notsorted;
if first.id and first.flag and not flag then _n_=2;
else if first.flag and _n_<4 then _n_=sum(_n_,1);
if _n_=4 then stage=ifn(flag,3,4);
else stage=_n_;
output;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.