Solved: Changing subsequent values based on previous value

elbarto · Posted 10-02-2021 11:48 PM

I have the following dataset:


DATA have;
input year	place_id	firm_id	shock;
DATALINES;
2001	34023	28013	0
2002	34023	28013	1
2003	34023	28013	1
2004	34023	28013	1
2005	34023	28013	0
2006	34023	28013	0
2007	34023	28013	0
2008	34023	28013	0
2009	34023	28013	1
2010	34023	28013	1
1992	46085	28013	1
1993	46085	28013	0
1994	46085	28013	1
1995	46085	28013	1
1996	46085	28013	0
1997	46085	28013	1
1998	46085	28013	0
1999	46085	28013	0
2000	46085	28013	0
2001	46085	28013	1
2002	46085	28013	0
;
RUN;

I want to create another variable called flag to arrive at the following dataset:

DATA want;
input year	place_id	firm_id	shock	flag;
DATALINES;
2001	34023	28013	0	0
2002	34023	28013	1	1
2003	34023	28013	1	1
2004	34023	28013	1	1
2005	34023	28013	0	1
2006	34023	28013	0	1
2007	34023	28013	0	0
2008	34023	28013	0	0
2009	34023	28013	1	1
2010	34023	28013	1	1
1992	46085	28013	1	1
1993	46085	28013	0	1
1994	46085	28013	1	1
1995	46085	28013	1	1
1996	46085	28013	0	1
1997	46085	28013	1	1
1998	46085	28013	0	1
1999	46085	28013	0	1
2000	46085	28013	0	0
2001	46085	28013	1	1
2002	46085	28013	0	1
;
RUN;

The rule is that if for a given place_id and firm_id, if in that year shock=1, then flag=1 for the same year as well as the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also takes on the value of 1 for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and equals 1 for year 2002 as well. Year 2003 does not exist for this year-place_id-firm_id combination so there is no need to create an extra row for year 2003. Furthermore, could the code be general so that the following N years all take a value of 1 for flag. The example shown here is for N=2.

Ksharp · Posted 10-03-2021 07:31 AM

Assuming there is no gap between year.

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;
data want;
 set have;
 want_flag=0;
 if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
    (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
     shock then want_flag=1;
run;

View solution in original post

Patrick · Posted 10-03-2021 01:58 AM

Thank you for the data step with sample data AND the desired result. That helps a lot.

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;

/*The rule is that if for a given place_id and firm_id, */
/*if in that year shock=1, then flag=1 for the same year */
/*as well as the following two years.*/

proc sort data=have out=inter;
  by place_id firm_id year;
run;

data want;
  set have;
  by place_id firm_id;
  retain flag_derived;
  if first.firm_id then flag_derived=0;
  if shock=1 then _year_cnt=3;
  _year_cnt + -1;
  flag_derived= _year_cnt>=0;
  drop _year_cnt;
run;

Ksharp · Posted 10-03-2021 07:31 AM

Assuming there is no gap between year.

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;
data want;
 set have;
 want_flag=0;
 if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
    (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
     shock then want_flag=1;
run;

acordes · Posted 10-03-2021 02:20 PM

Just for fun solving it with IML

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;


proc iml;
use have;
read all var _NUM_  into x [colname=varname];
close;

flag_control=j(nrow(x),1,0);  /* vector for flag results */   
comp=1:nrow(x);      /* compare vector used by loc-element and xsect function */
u = unique(x[,2]);                 /* get unique values for place_id */

do i = 1 to ncol(u);                 /* iterate over unique values */
idx = loc(x[,2]=u[i]);  
u2=unique(x[idx,3]);              /* get unique values for firm_id */
do j=1 to ncol(u2);
idx2=xsect(idx, loc(x[,3]=u2[j]));

shock_idx=xsect(idx2, loc(x[,4]=1));

flag_control[comp[loc (element(comp, (shock_idx//shock_idx+1//shock_idx+2)))]]=1;
end;
end;

x_ext=x||flag_control;

varname=varname||{'flag_control'};

create want from x_ext [colname=varname];
append from x_ext;
close;

Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Re: Changing subsequent values based on previous value

Registration is open

SAS Training: Just a Click Away