I have the following dataset:
DATA have;
input year place_id firm_id shock;
DATALINES;
2001 34023 28013 0
2002 34023 28013 1
2003 34023 28013 1
2004 34023 28013 1
2005 34023 28013 0
2006 34023 28013 0
2007 34023 28013 0
2008 34023 28013 0
2009 34023 28013 1
2010 34023 28013 1
1992 46085 28013 1
1993 46085 28013 0
1994 46085 28013 1
1995 46085 28013 1
1996 46085 28013 0
1997 46085 28013 1
1998 46085 28013 0
1999 46085 28013 0
2000 46085 28013 0
2001 46085 28013 1
2002 46085 28013 0
;
RUN;
I want to create another variable called flag to arrive at the following dataset:
DATA want;
input year place_id firm_id shock flag;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
RUN;
The rule is that if for a given place_id and firm_id, if in that year shock=1, then flag=1 for the same year as well as the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also takes on the value of 1 for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and equals 1 for year 2002 as well. Year 2003 does not exist for this year-place_id-firm_id combination so there is no need to create an extra row for year 2003. Furthermore, could the code be general so that the following N years all take a value of 1 for flag. The example shown here is for N=2.
Assuming there is no gap between year.
DATA have; input year place_id firm_id shock flag_expected; DATALINES; 2001 34023 28013 0 0 2002 34023 28013 1 1 2003 34023 28013 1 1 2004 34023 28013 1 1 2005 34023 28013 0 1 2006 34023 28013 0 1 2007 34023 28013 0 0 2008 34023 28013 0 0 2009 34023 28013 1 1 2010 34023 28013 1 1 1992 46085 28013 1 1 1993 46085 28013 0 1 1994 46085 28013 1 1 1995 46085 28013 1 1 1996 46085 28013 0 1 1997 46085 28013 1 1 1998 46085 28013 0 1 1999 46085 28013 0 1 2000 46085 28013 0 0 2001 46085 28013 1 1 2002 46085 28013 0 1 ; data want; set have; want_flag=0; if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or shock then want_flag=1; run;
Thank you for the data step with sample data AND the desired result. That helps a lot.
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
/*The rule is that if for a given place_id and firm_id, */
/*if in that year shock=1, then flag=1 for the same year */
/*as well as the following two years.*/
proc sort data=have out=inter;
by place_id firm_id year;
run;
data want;
set have;
by place_id firm_id;
retain flag_derived;
if first.firm_id then flag_derived=0;
if shock=1 then _year_cnt=3;
_year_cnt + -1;
flag_derived= _year_cnt>=0;
drop _year_cnt;
run;
Assuming there is no gap between year.
DATA have; input year place_id firm_id shock flag_expected; DATALINES; 2001 34023 28013 0 0 2002 34023 28013 1 1 2003 34023 28013 1 1 2004 34023 28013 1 1 2005 34023 28013 0 1 2006 34023 28013 0 1 2007 34023 28013 0 0 2008 34023 28013 0 0 2009 34023 28013 1 1 2010 34023 28013 1 1 1992 46085 28013 1 1 1993 46085 28013 0 1 1994 46085 28013 1 1 1995 46085 28013 1 1 1996 46085 28013 0 1 1997 46085 28013 1 1 1998 46085 28013 0 1 1999 46085 28013 0 1 2000 46085 28013 0 0 2001 46085 28013 1 1 2002 46085 28013 0 1 ; data want; set have; want_flag=0; if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or shock then want_flag=1; run;
Just for fun solving it with IML
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
proc iml;
use have;
read all var _NUM_ into x [colname=varname];
close;
flag_control=j(nrow(x),1,0); /* vector for flag results */
comp=1:nrow(x); /* compare vector used by loc-element and xsect function */
u = unique(x[,2]); /* get unique values for place_id */
do i = 1 to ncol(u); /* iterate over unique values */
idx = loc(x[,2]=u[i]);
u2=unique(x[idx,3]); /* get unique values for firm_id */
do j=1 to ncol(u2);
idx2=xsect(idx, loc(x[,3]=u2[j]));
shock_idx=xsect(idx2, loc(x[,4]=1));
flag_control[comp[loc (element(comp, (shock_idx//shock_idx+1//shock_idx+2)))]]=1;
end;
end;
x_ext=x||flag_control;
varname=varname||{'flag_control'};
create want from x_ext [colname=varname];
append from x_ext;
close;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.