I have the following dataset:
DATA have;
input year place_id firm_id shock;
DATALINES;
2001 34023 28013 0
2002 34023 28013 1
2003 34023 28013 1
2004 34023 28013 1
2005 34023 28013 0
2006 34023 28013 0
2007 34023 28013 0
2008 34023 28013 0
2009 34023 28013 1
2010 34023 28013 1
1992 46085 28013 1
1993 46085 28013 0
1994 46085 28013 1
1995 46085 28013 1
1996 46085 28013 0
1997 46085 28013 1
1998 46085 28013 0
1999 46085 28013 0
2000 46085 28013 0
2001 46085 28013 1
2002 46085 28013 0
;
RUN;I want to create another variable called flag to arrive at the following dataset:
DATA want;
input year place_id firm_id shock flag;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
RUN;The rule is that if for a given place_id and firm_id, if in that year shock=1, then flag=1 for the same year as well as the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also takes on the value of 1 for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and equals 1 for year 2002 as well. Year 2003 does not exist for this year-place_id-firm_id combination so there is no need to create an extra row for year 2003. Furthermore, could the code be general so that the following N years all take a value of 1 for flag. The example shown here is for N=2.
Assuming there is no gap between year.
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
data want;
set have;
want_flag=0;
if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
(lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
shock then want_flag=1;
run;
Thank you for the data step with sample data AND the desired result. That helps a lot.
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
/*The rule is that if for a given place_id and firm_id, */
/*if in that year shock=1, then flag=1 for the same year */
/*as well as the following two years.*/
proc sort data=have out=inter;
by place_id firm_id year;
run;
data want;
set have;
by place_id firm_id;
retain flag_derived;
if first.firm_id then flag_derived=0;
if shock=1 then _year_cnt=3;
_year_cnt + -1;
flag_derived= _year_cnt>=0;
drop _year_cnt;
run;
Assuming there is no gap between year.
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
data want;
set have;
want_flag=0;
if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
(lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
shock then want_flag=1;
run;
Just for fun solving it with IML
DATA have;
input year place_id firm_id shock flag_expected;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
proc iml;
use have;
read all var _NUM_ into x [colname=varname];
close;
flag_control=j(nrow(x),1,0); /* vector for flag results */
comp=1:nrow(x); /* compare vector used by loc-element and xsect function */
u = unique(x[,2]); /* get unique values for place_id */
do i = 1 to ncol(u); /* iterate over unique values */
idx = loc(x[,2]=u[i]);
u2=unique(x[idx,3]); /* get unique values for firm_id */
do j=1 to ncol(u2);
idx2=xsect(idx, loc(x[,3]=u2[j]));
shock_idx=xsect(idx2, loc(x[,4]=1));
flag_control[comp[loc (element(comp, (shock_idx//shock_idx+1//shock_idx+2)))]]=1;
end;
end;
x_ext=x||flag_control;
varname=varname||{'flag_control'};
create want from x_ext [colname=varname];
append from x_ext;
close;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.