BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello all, 

 

I would like to create a binary indicator variable (0/1) that tells me when/if another variable 'status' changes from year to year within it's groupid. I've created an example below of what I'm thinking: 

 

Year_GroupID    Status                                           Indicator

1997-01                 0                                                     0

1998-01                 0                        -------------->          0

1999-01                 0                                                     0

2000-01                 0                                                     0

1996-02                 0                                                     0

1997-02                 0                                                     0

1998-02                 1                                                     1

1999-02                 1                                                     0

2002-03                 0                                                     0

2003-03                 1                                                     1

2004-03                 1                                                     0

2005-03                 1                                                     0

2006-03                 1                                                     0

2007-03                 0                                                     1

2008-03                 0                                                     0

 

 

Ideally, I would like an indicator variable that tells me if the status column changed at all (i.e. 0 -> 1 or 1 ->0), but also it would be nice to have two other indicators: (1) one that just tells me if the status column changed (0-> 1), and (2) one that just tells me if the status column changed (1 -> 0). 

 

I appreciate the community' help as always. 

 

Thank you in advance!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @r4321,

 

Try this:

/* Create test data */

data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
  l=i;
  do j=0 to 3;
    Year_GroupID=cats(1997+j,'-',put(i,z2.));
    k=mod(l,3);
    Status=ifn(k=2,.,k);
    output;
    l=(l-k)/3;
  end;
end;
run;

/* Temporarily provide suitable BY variables */

data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;

/* Create indicators */

data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ (  0 ,  1 ,  0 ,  0 ,  0 ,  . ,  . ,  0 ,  . );
array i_dn[9] _temporary_ (  0 ,  0 ,  1 ,  0 ,  . ,  0 ,  0 ,  . ,  . );
array i_ch[9] _temporary_ (  0 ,  1 ,  1 ,  0 ,  . ,  . ,  . ,  . ,  . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
  ind_up=0;
  ind_dn=0;
  ind_ch=0;
end;
else do;
  _j=whichc(cats(_ls,status), of c[*]);
  ind_up=i_up[_j];
  ind_dn=i_dn[_j];
  ind_ch=i_ch[_j];
end;
run;

The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.

 

Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.

 

All three indicators are set to 0 for the first record of each GroupID.

 

Please feel free to adapt the rules as needed.

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want(drop=lagStatus);
   set have;
   lagStatus=lag1(Status);
   Indicator1=0;Indicator2=0;
   if lagStatus=1 & Status=0 then Indicator1=1;
   else if lagStatus=0 & Status=1 then Indicator2=1;
run;
r4321
Pyrite | Level 9
Thank you.

But unfortunately... this code is not quite giving me what I want. It doesn't account for year_groupid and also It continues to give 1s for the duration of the switch. I only want the indicator to = 1 when the change happens.
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;


data want;
	set have;
	if lag1(status) ^= status then indicator=1;
	else indicator=0;
run;
r4321
Pyrite | Level 9
Thank you for the response! This doesnt seem to work unfortunately. Seems to arbitrarily put 1s and it also doesn't disregard missing data (e.g., status=.)

r4321
Pyrite | Level 9
Thank you for your help -

Hm... This doesnt seem to work unfortunately. Seems to arbitrarily put 1s and it also doesn't disregard missing data (e.g., status=.)

FreelanceReinh
Jade | Level 19

Hello @r4321,

 

Try this:

/* Create test data */

data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
  l=i;
  do j=0 to 3;
    Year_GroupID=cats(1997+j,'-',put(i,z2.));
    k=mod(l,3);
    Status=ifn(k=2,.,k);
    output;
    l=(l-k)/3;
  end;
end;
run;

/* Temporarily provide suitable BY variables */

data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;

/* Create indicators */

data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ (  0 ,  1 ,  0 ,  0 ,  0 ,  . ,  . ,  0 ,  . );
array i_dn[9] _temporary_ (  0 ,  0 ,  1 ,  0 ,  . ,  0 ,  0 ,  . ,  . );
array i_ch[9] _temporary_ (  0 ,  1 ,  1 ,  0 ,  . ,  . ,  . ,  . ,  . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
  ind_up=0;
  ind_dn=0;
  ind_ch=0;
end;
else do;
  _j=whichc(cats(_ls,status), of c[*]);
  ind_up=i_up[_j];
  ind_dn=i_dn[_j];
  ind_ch=i_ch[_j];
end;
run;

The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.

 

Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.

 

All three indicators are set to 0 for the first record of each GroupID.

 

Please feel free to adapt the rules as needed.

r4321
Pyrite | Level 9
This works great! Thank you for the code and explanation.
ballardw
Super User

If the status only takes values of 1 and 0 as shown perhaps this is sufficient:

data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want;
   set have;
   d = (dif(status) ne 0);
   d1 = (dif(status) = 1);
   d2 = (dif(status) = -1);
   if _n_=1 then d=0;
   label
      d  = 'Any change'
      d1 = 'Change 0 to 1'
      d2 = 'Change 1 to 0'
   ;
run;

 

Kurt_Bremser
Super User

Let's dissect the year_groupid into its relevant parts, so we can use by-processing to simplify things:

data have;
input Year_GroupID $ Status;
length
  group $2
  year 4
;
group = scan(Year_GroupID,2,'-');
year = input(scan(Year_GroupID,1,'-'),4.);
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
run;

proc print data=want noobs;
var year_groupid status indicator;
run;

The result matches your expected result exactly:

 Year_
GroupID    Status    indicator

1997-01       0          0    
1998-01       0          0    
1999-01       0          0    
2000-01       0          0    
1996-02       0          0    
1997-02       0          0    
1998-02       1          1    
1999-02       1          0    
2002-03       0          0    
2003-03       1          1    
2004-03       1          0    
2005-03       1          0    
2006-03       1          0    
2007-03       0          1    
2008-03       0          0    

The additional indicators can be derived with two additional statements:

data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
ind_up = (indicator and status);
ind_down = (indicator and not status);
run;
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2979 views
  • 4 likes
  • 6 in conversation