I have the following dataset:
DATA have;
input year place_id firm_id shock;
DATALINES;
2001 34023 28013 0
2002 34023 28013 1
2003 34023 28013 1
2004 34023 28013 1
2005 34023 28013 0
2006 34023 28013 0
2007 34023 28013 0
2008 34023 28013 0
2009 34023 28013 1
2010 34023 28013 1
1992 46085 28013 1
1993 46085 28013 0
1994 46085 28013 1
1995 46085 28013 1
1996 46085 28013 0
1997 46085 28013 1
1998 46085 28013 0
1999 46085 28013 0
2000 46085 28013 0
2001 46085 28013 1
2002 46085 28013 0
;
RUN;
I want to create another variable called flag to arrive at the following dataset:
DATA want;
input year place_id firm_id shock flag;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
RUN;
The rule is that if for a given place_id and firm_id, if in that year shock=1, then flag=1 for the same year as well as the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also takes on the value of 1 for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and equals 1 for year 2002 as well. Year 2003 does not exist for this year-place_id-firm_id combination so there is no need to create an extra row for year 2003. Furthermore, could the code be general so that the following N years all take a value of 1 for flag. The example shown here is for N=2.
Assuming there is no gap between year.
DATA have; input year place_id firm_id shock flag_expected; DATALINES; 2001 34023 28013 0 0 2002 34023 28013 1 1 2003 34023 28013 1 1 2004 34023 28013 1 1 2005 34023 28013 0 1 2006 34023 28013 0 1 2007 34023 28013 0 0 2008 34023 28013 0 0 2009 34023 28013 1 1 2010 34023 28013 1 1 1992 46085 28013 1 1 1993 46085 28013 0 1 1994 46085 28013 1 1 1995 46085 28013 1 1 1996 46085 28013 0 1 1997 46085 28013 1 1 1998 46085 28013 0 1 1999 46085 28013 0 1 2000 46085 28013 0 0 2001 46085 28013 1 1 2002 46085 28013 0 1 ; data want; set have; want_flag=0; if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or shock then want_flag=1; run;
Thank you for the data step with sample data AND the desired result. That helps a lot.
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
/*The rule is that if for a given place_id and firm_id, */
/*if in that year shock=1, then flag=1 for the same year */
/*as well as the following two years.*/
proc sort data=have out=inter;
by place_id firm_id year;
run;
data want;
set have;
by place_id firm_id;
retain flag_derived;
if first.firm_id then flag_derived=0;
if shock=1 then _year_cnt=3;
_year_cnt + -1;
flag_derived= _year_cnt>=0;
drop _year_cnt;
run;
Assuming there is no gap between year.
DATA have; input year place_id firm_id shock flag_expected; DATALINES; 2001 34023 28013 0 0 2002 34023 28013 1 1 2003 34023 28013 1 1 2004 34023 28013 1 1 2005 34023 28013 0 1 2006 34023 28013 0 1 2007 34023 28013 0 0 2008 34023 28013 0 0 2009 34023 28013 1 1 2010 34023 28013 1 1 1992 46085 28013 1 1 1993 46085 28013 0 1 1994 46085 28013 1 1 1995 46085 28013 1 1 1996 46085 28013 0 1 1997 46085 28013 1 1 1998 46085 28013 0 1 1999 46085 28013 0 1 2000 46085 28013 0 0 2001 46085 28013 1 1 2002 46085 28013 0 1 ; data want; set have; want_flag=0; if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or shock then want_flag=1; run;
Just for fun solving it with IML
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
proc iml;
use have;
read all var _NUM_ into x [colname=varname];
close;
flag_control=j(nrow(x),1,0); /* vector for flag results */
comp=1:nrow(x); /* compare vector used by loc-element and xsect function */
u = unique(x[,2]); /* get unique values for place_id */
do i = 1 to ncol(u); /* iterate over unique values */
idx = loc(x[,2]=u[i]);
u2=unique(x[idx,3]); /* get unique values for firm_id */
do j=1 to ncol(u2);
idx2=xsect(idx, loc(x[,3]=u2[j]));
shock_idx=xsect(idx2, loc(x[,4]=1));
flag_control[comp[loc (element(comp, (shock_idx//shock_idx+1//shock_idx+2)))]]=1;
end;
end;
x_ext=x||flag_control;
varname=varname||{'flag_control'};
create want from x_ext [colname=varname];
append from x_ext;
close;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.