BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello all, 

 

I would like to create a binary indicator variable (0/1) that tells me when/if another variable 'status' changes from year to year within it's groupid. I've created an example below of what I'm thinking: 

 

Year_GroupID    Status                                           Indicator

1997-01                 0                                                     0

1998-01                 0                        -------------->          0

1999-01                 0                                                     0

2000-01                 0                                                     0

1996-02                 0                                                     0

1997-02                 0                                                     0

1998-02                 1                                                     1

1999-02                 1                                                     0

2002-03                 0                                                     0

2003-03                 1                                                     1

2004-03                 1                                                     0

2005-03                 1                                                     0

2006-03                 1                                                     0

2007-03                 0                                                     1

2008-03                 0                                                     0

 

 

Ideally, I would like an indicator variable that tells me if the status column changed at all (i.e. 0 -> 1 or 1 ->0), but also it would be nice to have two other indicators: (1) one that just tells me if the status column changed (0-> 1), and (2) one that just tells me if the status column changed (1 -> 0). 

 

I appreciate the community' help as always. 

 

Thank you in advance!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @r4321,

 

Try this:

/* Create test data */

data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
  l=i;
  do j=0 to 3;
    Year_GroupID=cats(1997+j,'-',put(i,z2.));
    k=mod(l,3);
    Status=ifn(k=2,.,k);
    output;
    l=(l-k)/3;
  end;
end;
run;

/* Temporarily provide suitable BY variables */

data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;

/* Create indicators */

data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ (  0 ,  1 ,  0 ,  0 ,  0 ,  . ,  . ,  0 ,  . );
array i_dn[9] _temporary_ (  0 ,  0 ,  1 ,  0 ,  . ,  0 ,  0 ,  . ,  . );
array i_ch[9] _temporary_ (  0 ,  1 ,  1 ,  0 ,  . ,  . ,  . ,  . ,  . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
  ind_up=0;
  ind_dn=0;
  ind_ch=0;
end;
else do;
  _j=whichc(cats(_ls,status), of c[*]);
  ind_up=i_up[_j];
  ind_dn=i_dn[_j];
  ind_ch=i_ch[_j];
end;
run;

The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.

 

Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.

 

All three indicators are set to 0 for the first record of each GroupID.

 

Please feel free to adapt the rules as needed.

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want(drop=lagStatus);
   set have;
   lagStatus=lag1(Status);
   Indicator1=0;Indicator2=0;
   if lagStatus=1 & Status=0 then Indicator1=1;
   else if lagStatus=0 & Status=1 then Indicator2=1;
run;
r4321
Pyrite | Level 9
Thank you.

But unfortunately... this code is not quite giving me what I want. It doesn't account for year_groupid and also It continues to give 1s for the duration of the switch. I only want the indicator to = 1 when the change happens.
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;


data want;
	set have;
	if lag1(status) ^= status then indicator=1;
	else indicator=0;
run;
r4321
Pyrite | Level 9
Thank you for the response! This doesnt seem to work unfortunately. Seems to arbitrarily put 1s and it also doesn't disregard missing data (e.g., status=.)

r4321
Pyrite | Level 9
Thank you for your help -

Hm... This doesnt seem to work unfortunately. Seems to arbitrarily put 1s and it also doesn't disregard missing data (e.g., status=.)

FreelanceReinh
Jade | Level 19

Hello @r4321,

 

Try this:

/* Create test data */

data have(drop=i j k l);
length Year_GroupID $8;
do i=0 to 80;
  l=i;
  do j=0 to 3;
    Year_GroupID=cats(1997+j,'-',put(i,z2.));
    k=mod(l,3);
    Status=ifn(k=2,.,k);
    output;
    l=(l-k)/3;
  end;
end;
run;

/* Temporarily provide suitable BY variables */

data tmp / view=tmp;
set have;
length _groupID $8;
_year=input(scan(year_groupID,1,'-'),8.);
_groupID=scan(year_groupID,2,'-');
run;

/* Create indicators */

data want(drop=_:);
array c[9] $2 _temporary_ ('00','01','10','11','.0','.1','0.','1.','..');
array i_up[9] _temporary_ (  0 ,  1 ,  0 ,  0 ,  0 ,  . ,  . ,  0 ,  . );
array i_dn[9] _temporary_ (  0 ,  0 ,  1 ,  0 ,  . ,  0 ,  0 ,  . ,  . );
array i_ch[9] _temporary_ (  0 ,  1 ,  1 ,  0 ,  . ,  . ,  . ,  . ,  . );
set tmp;
by _groupID _year;
_ls=lag(status);
if first._groupID then do;
  ind_up=0;
  ind_dn=0;
  ind_ch=0;
end;
else do;
  _j=whichc(cats(_ls,status), of c[*]);
  ind_up=i_up[_j];
  ind_dn=i_dn[_j];
  ind_ch=i_ch[_j];
end;
run;

The definitions of the three indicators (IND_UP for status increase, IND_DN for status decrease, IND_CH for status change) are provided as array elements (arrays I_UP, I_DN and I_CH, respectively) corresponding to the (presumably) 9 possible combinations of previous status and current status in array C.

 

Example: '.0' in array C means "previous status missing and current status 0." So, if the missing status is just unknown, the status might have changed (from 1 to 0), but we don't know, hence IND_DN=IND_CH=., whereas IND_UP=0 is for sure.

 

All three indicators are set to 0 for the first record of each GroupID.

 

Please feel free to adapt the rules as needed.

r4321
Pyrite | Level 9
This works great! Thank you for the code and explanation.
ballardw
Super User

If the status only takes values of 1 and 0 as shown perhaps this is sufficient:

data have;
input Year_GroupID $ Status;
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want;
   set have;
   d = (dif(status) ne 0);
   d1 = (dif(status) = 1);
   d2 = (dif(status) = -1);
   if _n_=1 then d=0;
   label
      d  = 'Any change'
      d1 = 'Change 0 to 1'
      d2 = 'Change 1 to 0'
   ;
run;

 

Kurt_Bremser
Super User

Let's dissect the year_groupid into its relevant parts, so we can use by-processing to simplify things:

data have;
input Year_GroupID $ Status;
length
  group $2
  year 4
;
group = scan(Year_GroupID,2,'-');
year = input(scan(Year_GroupID,1,'-'),4.);
datalines;
1997-01 0
1998-01 0
1999-01 0
2000-01 0
1996-02 0
1997-02 0
1998-02 1
1999-02 1
2002-03 0
2003-03 1
2004-03 1
2005-03 1
2006-03 1
2007-03 0
2008-03 0
;

data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
run;

proc print data=want noobs;
var year_groupid status indicator;
run;

The result matches your expected result exactly:

 Year_
GroupID    Status    indicator

1997-01       0          0    
1998-01       0          0    
1999-01       0          0    
2000-01       0          0    
1996-02       0          0    
1997-02       0          0    
1998-02       1          1    
1999-02       1          0    
2002-03       0          0    
2003-03       1          1    
2004-03       1          0    
2005-03       1          0    
2006-03       1          0    
2007-03       0          1    
2008-03       0          0    

The additional indicators can be derived with two additional statements:

data want;
set have;
by group;
indicator = (not first.group and lag1(status) ^= status);
ind_up = (indicator and status);
ind_down = (indicator and not status);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1525 views
  • 4 likes
  • 6 in conversation