Hello all,
I would like to create a binary indicator variable (0/1) that tells me when/if another variable 'status' changes from year to year within it's groupid. I've created an example below of what I'm thinking:
Year_GroupID Status Indicator
1997-01 0 0
1998-01 0 --------------> 0
1999-01 0 0
2000-01 0 0
1996-02 0 0
1997-02 0 0
1998-02 1 1
1999-02 1 0
2002-03 0 0
2003-03 1 1
2004-03 1 0
2005-03 1 0
2006-03 1 0
2007-03 0 1
2008-03 0 0
Ideally, I would like an indicator variable that tells me if the status column changed at all (i.e. 0 -> 1 or 1 ->0), but also it would be nice to have two other indicators: (1) one that just tells me if the status column changed (0-> 1), and (2) one that just tells me if the status column changed (1 -> 0).
I appreciate the community' help as always.
Thank you in advance!
Hello @r4321,
Try this:
/* Create test data */
data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
l=i;
do j=0 to 3;
Year_GroupID=cats(1997+j,'-',put(i,z2.));
k=mod(l,3);
Status=ifn(k=2,.,k);
output;
l=(l-k)/3;
end;
end;
run;
/* Temporarily provide suitable BY variables */
data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;
/* Create indicators */
data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ ( 0 , 1 , 0 , 0 , 0 , . , . , 0 , . );
array i_dn[9] _temporary_ ( 0 , 0 , 1 , 0 , . , 0 , 0 , . , . );
array i_ch[9] _temporary_ ( 0 , 1 , 1 , 0 , . , . , . , . , . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
ind_up=0;
ind_dn=0;
ind_ch=0;
end;
else do;
_j=whichc(cats(_ls,status), of c[*]);
ind_up=i_up[_j];
ind_dn=i_dn[_j];
ind_ch=i_ch[_j];
end;
run;
The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.
Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.
All three indicators are set to 0 for the first record of each GroupID.
Please feel free to adapt the rules as needed.
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want(drop=lagStatus);
set have;
lagStatus=lag1(Status);
Indicator1=0;Indicator2=0;
if lagStatus=1 & Status=0 then Indicator1=1;
else if lagStatus=0 & Status=1 then Indicator2=1;
run;
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want;
set have;
if lag1(status) ^= status then indicator=1;
else indicator=0;
run;
You can make it even shorter:
data want;
set have;
indicator = (lag1(status) ^= status);
run;
Hello @r4321,
Try this:
/* Create test data */
data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
l=i;
do j=0 to 3;
Year_GroupID=cats(1997+j,'-',put(i,z2.));
k=mod(l,3);
Status=ifn(k=2,.,k);
output;
l=(l-k)/3;
end;
end;
run;
/* Temporarily provide suitable BY variables */
data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;
/* Create indicators */
data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ ( 0 , 1 , 0 , 0 , 0 , . , . , 0 , . );
array i_dn[9] _temporary_ ( 0 , 0 , 1 , 0 , . , 0 , 0 , . , . );
array i_ch[9] _temporary_ ( 0 , 1 , 1 , 0 , . , . , . , . , . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
ind_up=0;
ind_dn=0;
ind_ch=0;
end;
else do;
_j=whichc(cats(_ls,status), of c[*]);
ind_up=i_up[_j];
ind_dn=i_dn[_j];
ind_ch=i_ch[_j];
end;
run;
The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.
Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.
All three indicators are set to 0 for the first record of each GroupID.
Please feel free to adapt the rules as needed.
If the status only takes values of 1 and 0 as shown perhaps this is sufficient:
data have; input Year_GroupID $ Status; datalines; 1997-01 0 1998-01 0 1999-01 0 2000-01 0 1996-02 0 1997-02 0 1998-02 1 1999-02 1 2002-03 0 2003-03 1 2004-03 1 2005-03 1 2006-03 1 2007-03 0 2008-03 0 ; data want; set have; d = (dif(status) ne 0); d1 = (dif(status) = 1); d2 = (dif(status) = -1); if _n_=1 then d=0; label d = 'Any change' d1 = 'Change 0 to 1' d2 = 'Change 1 to 0' ; run;
Let's dissect the year_groupid into its relevant parts, so we can use by-processing to simplify things:
data have;
input Year_GroupID $ Status;
length
group $2
year 4
;
group = scan(Year_GroupID,2,'-');
year = input(scan(Year_GroupID,1,'-'),4.);
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;
data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
run;
proc print data=want noobs;
var year_groupid status indicator;
run;
The result matches your expected result exactly:
Year_ GroupID Status indicator 1997-01 0 0 1998-01 0 0 1999-01 0 0 2000-01 0 0 1996-02 0 0 1997-02 0 0 1998-02 1 1 1999-02 1 0 2002-03 0 0 2003-03 1 1 2004-03 1 0 2005-03 1 0 2006-03 1 0 2007-03 0 1 2008-03 0 0
The additional indicators can be derived with two additional statements:
data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
ind_up = (indicator and status);
ind_down = (indicator and not status);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.