BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have the following dataset:

 


DATA have;
input year	place_id	firm_id	shock;
DATALINES;
2001	34023	28013	0
2002	34023	28013	1
2003	34023	28013	1
2004	34023	28013	1
2005	34023	28013	0
2006	34023	28013	0
2007	34023	28013	0
2008	34023	28013	0
2009	34023	28013	1
2010	34023	28013	1
1992	46085	28013	1
1993	46085	28013	0
1994	46085	28013	1
1995	46085	28013	1
1996	46085	28013	0
1997	46085	28013	1
1998	46085	28013	0
1999	46085	28013	0
2000	46085	28013	0
2001	46085	28013	1
2002	46085	28013	0
;
RUN;

I want to create another variable called flag to arrive at the following dataset:

 

DATA want;
input year	place_id	firm_id	shock	flag;
DATALINES;
2001	34023	28013	0	0
2002	34023	28013	1	1
2003	34023	28013	1	1
2004	34023	28013	1	1
2005	34023	28013	0	1
2006	34023	28013	0	1
2007	34023	28013	0	0
2008	34023	28013	0	0
2009	34023	28013	1	1
2010	34023	28013	1	1
1992	46085	28013	1	1
1993	46085	28013	0	1
1994	46085	28013	1	1
1995	46085	28013	1	1
1996	46085	28013	0	1
1997	46085	28013	1	1
1998	46085	28013	0	1
1999	46085	28013	0	1
2000	46085	28013	0	0
2001	46085	28013	1	1
2002	46085	28013	0	1
;
RUN;

The rule is that if for a given place_id and firm_id, if in that year shock=1, then flag=1 for the same year as well as the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also takes on the value of 1 for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and equals 1 for year 2002 as well. Year 2003 does not exist for this year-place_id-firm_id combination so there is no need to create an extra row for year 2003. Furthermore, could the code be general so that the following N years all take a value of 1 for flag. The example shown here is for N=2.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Assuming there is no gap between year.

 

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;
data want;
 set have;
 want_flag=0;
 if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
    (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
     shock then want_flag=1;
run;

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

Thank you for the data step with sample data AND the desired result. That helps a lot.

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;

/*The rule is that if for a given place_id and firm_id, */
/*if in that year shock=1, then flag=1 for the same year */
/*as well as the following two years.*/

proc sort data=have out=inter;
  by place_id firm_id year;
run;

data want;
  set have;
  by place_id firm_id;
  retain flag_derived;
  if first.firm_id then flag_derived=0;
  if shock=1 then _year_cnt=3;
  _year_cnt + -1;
  flag_derived= _year_cnt>=0;
  drop _year_cnt;
run;
Ksharp
Super User

Assuming there is no gap between year.

 

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;
data want;
 set have;
 want_flag=0;
 if (lag(shock) and place_id=lag(place_id) and firm_id=lag(firm_id)) or
    (lag2(shock) and place_id=lag2(place_id) and firm_id=lag2(firm_id)) or
     shock then want_flag=1;
run;
acordes
Rhodochrosite | Level 12

Just for fun solving it with IML

 

DATA have;
  input year  place_id  firm_id shock flag_expected;
  DATALINES;
2001  34023 28013 0 0
2002  34023 28013 1 1
2003  34023 28013 1 1
2004  34023 28013 1 1
2005  34023 28013 0 1
2006  34023 28013 0 1
2007  34023 28013 0 0
2008  34023 28013 0 0
2009  34023 28013 1 1
2010  34023 28013 1 1
1992  46085 28013 1 1
1993  46085 28013 0 1
1994  46085 28013 1 1
1995  46085 28013 1 1
1996  46085 28013 0 1
1997  46085 28013 1 1
1998  46085 28013 0 1
1999  46085 28013 0 1
2000  46085 28013 0 0
2001  46085 28013 1 1
2002  46085 28013 0 1
;


proc iml;
use have;
read all var _NUM_  into x [colname=varname];
close;

flag_control=j(nrow(x),1,0);  /* vector for flag results */   
comp=1:nrow(x);      /* compare vector used by loc-element and xsect function */
u = unique(x[,2]);                 /* get unique values for place_id */

do i = 1 to ncol(u);                 /* iterate over unique values */
idx = loc(x[,2]=u[i]);  
u2=unique(x[idx,3]);              /* get unique values for firm_id */
do j=1 to ncol(u2);
idx2=xsect(idx, loc(x[,3]=u2[j]));

shock_idx=xsect(idx2, loc(x[,4]=1));

flag_control[comp[loc (element(comp, (shock_idx//shock_idx+1//shock_idx+2)))]]=1;
end;
end;

x_ext=x||flag_control;

varname=varname||{'flag_control'};

create want from x_ext [colname=varname];
append from x_ext;
close;

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 696 views
  • 2 likes
  • 4 in conversation