Hello all,
I would like to create a binary indicator variable (0/1) that tells me when/if another variable 'status' changes from year to year within it's groupid. I've created an example below of what I'm thinking:
Year_GroupID Status Indicator
1997-01 0 0
1998-01 0 --------------> 0
1999-01 0 0
2000-01 0 0
1996-02 0 0
1997-02 0 0
1998-02 1 1
1999-02 1 0
2002-03 0 0
2003-03 1 1
2004-03 1 0
2005-03 1 0
2006-03 1 0
2007-03 0 1
2008-03 0 0
Ideally, I would like an indicator variable that tells me if the status column changed at all (i.e. 0 -> 1 or 1 ->0), but also it would be nice to have two other indicators: (1) one that just tells me if the status column changed (0-> 1), and (2) one that just tells me if the status column changed (1 -> 0).
I appreciate the community' help as always.
Thank you in advance!
Hello @r4321,
Try this:
/* Create test data */
data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
l=i;
do j=0 to 3;
Year_GroupID=cats(1997+j,'-',put(i,z2.));
k=mod(l,3);
Status=ifn(k=2,.,k);
output;
l=(l-k)/3;
end;
end;
run;
/* Temporarily provide suitable BY variables */
data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;
/* Create indicators */
data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ ( 0 , 1 , 0 , 0 , 0 , . , . , 0 , . );
array i_dn[9] _temporary_ ( 0 , 0 , 1 , 0 , . , 0 , 0 , . , . );
array i_ch[9] _temporary_ ( 0 , 1 , 1 , 0 , . , . , . , . , . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
ind_up=0;
ind_dn=0;
ind_ch=0;
end;
else do;
_j=whichc(cats(_ls,status), of c[*]);
ind_up=i_up[_j];
ind_dn=i_dn[_j];
ind_ch=i_ch[_j];
end;
run;
The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.
Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.
All three indicators are set to 0 for the first record of each GroupID.
Please feel free to adapt the rules as needed.
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want(drop=lagStatus);
set have;
lagStatus=lag1(Status);
Indicator1=0;Indicator2=0;
if lagStatus=1 & Status=0 then Indicator1=1;
else if lagStatus=0 & Status=1 then Indicator2=1;
run;
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want;
set have;
if lag1(status) ^= status then indicator=1;
else indicator=0;
run;
You can make it even shorter:
data want;
set have;
indicator = (lag1(status) ^= status);
run;
Hello @r4321,
Try this:
/* Create test data */
data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
l=i;
do j=0 to 3;
Year_GroupID=cats(1997+j,'-',put(i,z2.));
k=mod(l,3);
Status=ifn(k=2,.,k);
output;
l=(l-k)/3;
end;
end;
run;
/* Temporarily provide suitable BY variables */
data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;
/* Create indicators */
data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ ( 0 , 1 , 0 , 0 , 0 , . , . , 0 , . );
array i_dn[9] _temporary_ ( 0 , 0 , 1 , 0 , . , 0 , 0 , . , . );
array i_ch[9] _temporary_ ( 0 , 1 , 1 , 0 , . , . , . , . , . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
ind_up=0;
ind_dn=0;
ind_ch=0;
end;
else do;
_j=whichc(cats(_ls,status), of c[*]);
ind_up=i_up[_j];
ind_dn=i_dn[_j];
ind_ch=i_ch[_j];
end;
run;
The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.
Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.
All three indicators are set to 0 for the first record of each GroupID.
Please feel free to adapt the rules as needed.
If the status only takes values of 1 and 0 as shown perhaps this is sufficient:
data have; input Year_GroupID $ Status; datalines; 1997-01 0 1998-01 0 1999-01 0 2000-01 0 1996-02 0 1997-02 0 1998-02 1 1999-02 1 2002-03 0 2003-03 1 2004-03 1 2005-03 1 2006-03 1 2007-03 0 2008-03 0 ; data want; set have; d = (dif(status) ne 0); d1 = (dif(status) = 1); d2 = (dif(status) = -1); if _n_=1 then d=0; label d = 'Any change' d1 = 'Change 0 to 1' d2 = 'Change 1 to 0' ; run;
Let's dissect the year_groupid into its relevant parts, so we can use by-processing to simplify things:
data have;
input Year_GroupID $ Status;
length
group $2
year 4
;
group = scan(Year_GroupID,2,'-');
year = input(scan(Year_GroupID,1,'-'),4.);
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
run;
proc print data=want noobs;
var year_groupid status indicator;
run;
The result matches your expected result exactly:
Year_ GroupID Status indicator 1997-01 0 0 1998-01 0 0 1999-01 0 0 2000-01 0 0 1996-02 0 0 1997-02 0 0 1998-02 1 1 1999-02 1 0 2002-03 0 0 2003-03 1 1 2004-03 1 0 2005-03 1 0 2006-03 1 0 2007-03 0 1 2008-03 0 0
The additional indicators can be derived with two additional statements:
data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
ind_up = (indicator and status);
ind_down = (indicator and not status);
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.