Dear SAS support community,
I want to keep observations with AT LEAST three years before with dummy variable =0, and AT LEAST three years after with Dummy variable =1. If there are more than three years of observations with Dummy =0 before certain year and more than three years of observations with Dummy =1 after, it is fine as well. The data sample is as follows:
Left: Data Have Right: Data Want
Gvkey | year | Dummy | Gvkey | year | Dummy | |
AAA | 1991 | 0 | AAA | 1991 | 0 | |
AAA | 1992 | 0 | AAA | 1992 | 0 | |
AAA | 1993 | 0 | AAA | 1993 | 0 | |
AAA | 1994 | 0 | AAA | 1994 | 0 | |
AAA | 1995 | 1 | AAA | 1995 | 1 | |
AAA | 1996 | 1 | AAA | 1996 | 1 | |
AAA | 1997 | 1 | AAA | 1997 | 1 | |
AAA | 1998 | 1 | AAA | 1998 | 1 | |
AAA | 1999 | 1 | AAA | 1999 | 1 | |
AAA | 2000 | 0 | ||||
AAA | 2001 | 1 | ||||
AAA | 2002 | 0 | ||||
AAA | 2003 | 0 | ||||
AAA | 2004 | 1 |
I tried lead and lag, and also loop, but doesn't seem to work. Thank you so much! Any suggestion is appreciated!
I think SQL will provide an easy answer to your problem. Something like this should work:
proc sql;
create table want as
select A.*
,B.Year_Min
from have as A
inner join
(select Gvkey
,min(year) as Year_Min
from have
where Dummy = 1
group by Gvkey
) as B
on A.Gvkey = B.Gvkey
where abs(A.year - B.Year_Min) <= 4
;
quit;
I think SQL will provide an easy answer to your problem. Something like this should work:
proc sql;
create table want as
select A.*
,B.Year_Min
from have as A
inner join
(select Gvkey
,min(year) as Year_Min
from have
where Dummy = 1
group by Gvkey
) as B
on A.Gvkey = B.Gvkey
where abs(A.year - B.Year_Min) <= 4
;
quit;
Thank you! One more question: Is there a way to full fill both conditions? I saw your solution only meets the "at least four years of dummy = 1 observations".
I was trying something like this:
proc sql;
create table want1 as select A.*
from have as A inner join
(select gvkey,min(year) as Year_Min from have where dummy= 0
group by gvkey ) as B
on A.gvkey= B.gvkey
where abs(A.year - B.Year_Min) <= 3;
quit;
Then I can maybe inner join want and want1 tables?
More specifically the data should be something like this:
data have:
key | year | Dummy | key | year | Dummy | |
AAA | 1988 | 0 | AAA | 1991 | 0 | |
AAA | 1989 | 1 | AAA | 1992 | 0 | |
AAA | 1990 | 1 | AAA | 1993 | 0 | |
AAA | 1991 | 0 | AAA | 1994 | 0 | |
AAA | 1992 | 0 | AAA | 1995 | 1 | |
AAA | 1993 | 0 | AAA | 1996 | 1 | |
AAA | 1994 | 0 | AAA | 1997 | 1 | |
AAA | 1995 | 1 | AAA | 1998 | 1 | |
AAA | 1996 | 1 | AAA | 1999 | 1 | |
AAA | 1997 | 1 | ||||
AAA | 1998 | 1 | ||||
AAA | 1999 | 1 | ||||
AAA | 2000 | 0 | ||||
AAA | 2001 | 1 | ||||
AAA | 2002 | 0 | ||||
AAA | 2003 | 0 | ||||
AAA | 2004 | 1 |
How about something like this?
proc sql;
create table want as
select A.*
,B.Year_Min
,C.Year_Max
from have as A
left join
(select Gvkey
,min(year) as Year_Min
from have
where Dummy = 1
group by Gvkey
) as B
on A.Gvkey = B.Gvkey
left join
(select Gvkey
,max(year) as Year_Max
from have
where Dummy = 0
group by Gvkey
) as C
on A.Gvkey = C.Gvkey
where abs(A.year - B.Year_Min) <= 4
and abs(A.year - C.Year_Max) <= 4
;
quit;
data have;
input Gvkey $ year Dummy ;* Gvkey year Dummy ;
cards;
AAA 1991 0 AAA 1991 0
AAA 1992 0 AAA 1992 0
AAA 1993 0 AAA 1993 0
AAA 1994 0 AAA 1994 0
AAA 1995 1 AAA 1995 1
AAA 1996 1 AAA 1996 1
AAA 1997 1 AAA 1997 1
AAA 1998 1 AAA 1998 1
AAA 1999 1 AAA 1999 1
AAA 2000 0
AAA 2001 1
AAA 2002 0
AAA 2003 0
AAA 2004 1
;
data temp;
if _n_=1 then do;
if 0 then set have(rename=(year=_year dummy=_d));
dcl hash H (dataset:'have(rename=(year=_year dummy=_d))',ordered: "A") ;
h.definekey ("gvkey","_year") ;
h.definedata ("_year","_d") ;
h.definedone () ;
dcl hiter hi('h');
end;
n=0;
do until(last.year);
set have;
by gvkey year;
do rc=hi.setcur(key: gvkey,key:year) by 0 while(rc=0);
n+1;
if dummy=0 then do;
rc=hi.prev();
if dummy ne _d then leave;
end;
else do;
rc=hi.next();
if dummy ne _d then leave;
end;
end;
end;
do until(last.year);
set have;
by gvkey year;
if n>=4 then do;
do rc=hi.setcur(key: gvkey,key:year) by 0 while(rc=0);
output;
if dummy=0 then do;
rc=hi.prev();
if dummy ne _d then leave;
end;
else do;
rc=hi.next();
if dummy ne _d then leave;
end;
end;
end;
end;
keep gvkey _year dummy;
run;
proc sort data=temp out =want(rename=_year=year) nodupkey;
by gvkey _year dummy;
run;
data have;
input Gvkey $ year Dummy ;* Gvkey year Dummy ;
cards;
AAA 1991 0 AAA 1991 0
AAA 1992 0 AAA 1992 0
AAA 1993 0 AAA 1993 0
AAA 1994 0 AAA 1994 0
AAA 1995 1 AAA 1995 1
AAA 1996 1 AAA 1996 1
AAA 1997 1 AAA 1997 1
AAA 1998 1 AAA 1998 1
AAA 1999 1 AAA 1999 1
AAA 2000 0
AAA 2001 1
AAA 2002 0
AAA 2003 0
AAA 2004 1
;
proc sort data=have out=temp1;
by gvkey dummy year;
run;
data temp2;
set temp1;
by gvkey dummy ;
if dif(year) ne 1 the grp+1;
if first.dummy then grp=1;
run;
proc sql;
create table want(drop=grp) as
select *
from temp2
group by gvkey,dummy ,grp
having count(grp)>=4
order by gvkey,dummy, year;
quit;
data have;
input Gvkey $ year Dummy ;* Gvkey year Dummy ;
cards;
AAA 1988 0 AAA 1991 0
AAA 1989 1 AAA 1992 0
AAA 1990 1 AAA 1993 0
AAA 1991 0 AAA 1994 0
AAA 1992 0 AAA 1995 1
AAA 1993 0 AAA 1996 1
AAA 1994 0 AAA 1997 1
AAA 1995 1 AAA 1998 1
AAA 1996 1 AAA 1999 1
AAA 1997 1 AAA 2002 0
AAA 1998 1 AAA 2003 0
AAA 1999 1 AAA 2004 0
AAA 2000 0 AAA 2005 0
AAA 2001 1 AAA 2006 1
AAA 2002 0 AAA 2007 1
AAA 2003 0 AAA 2008 1
AAA 2004 0 AAA 2009 1
AAA 2005 0 AAA 2010 1
AAA 2006 1
AAA 2007 1
AAA 2008 1
AAA 2009 1
AAA 2010 1
AAA 2011 0
;
proc sort data=have out=temp1;
by gvkey dummy year;
run;
data want;
if _n_=1 then do;
dcl hash H (ordered: "A") ;
h.definekey ("gvkey","year") ;
h.definedata ("gvkey","year", "Dummy") ;
h.definedone () ;
dcl hiter hi('h');
end;
do until(last.gvkey);
set temp1;
by gvkey dummy ;
array t(9999) _temporary_;
_year=year;
if dif(_year) ne 1 then do;
grp+1;
count=1;
end;
else count+1;
if count=1 and n(of t(*))>=4 then do;
do year=min(of t(*)) to max(of t(*));
rc=h.add();
end;
call missing(of t(*));
end;
t(count)=_year;
if first.dummy then grp=1;
end;
if count>=4 then do year=min(of t(*)) to max(of t(*));
rc=h.add();
end;
do while(hi.next()=0);
output;
end;
h.clear();
call missing(of t(*),count,grp);
keep gvkey year Dummy;
run;
All of your suggestions are very helpful and I appreciate it!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.