Hello SAS support community,
I want to have a data set where there are at least three consecutive years of data before one certain year and at least three consecutive years of data after this certain year. The sample data looks like this:
key | year | Dummy | key | year | Dummy | |
AAA | 1988 | 0 | AAA | 1991 | 0 | |
AAA | 1989 | 1 | AAA | 1992 | 0 | |
AAA | 1990 | 1 | AAA | 1993 | 0 | |
AAA | 1991 | 0 | AAA | 1994 | 0 | |
AAA | 1992 | 0 | AAA | 1995 | 1 | |
AAA | 1993 | 0 | AAA | 1996 | 1 | |
AAA | 1994 | 0 | AAA | 1997 | 1 | |
AAA | 1995 | 1 | AAA | 1998 | 1 | |
AAA | 1996 | 1 | AAA | 1999 | 1 | |
AAA | 1997 | 1 | AAA | 2002 | 0 | |
AAA | 1998 | 1 | AAA | 2003 | 0 | |
AAA | 1999 | 1 | AAA | 2004 | 0 | |
AAA | 2000 | 0 | AAA | 2005 | 0 | |
AAA | 2001 | 1 | AAA | 2006 | 1 | |
AAA | 2002 | 0 | AAA | 2007 | 1 | |
AAA | 2003 | 0 | AAA | 2008 | 1 | |
AAA | 2004 | 0 | AAA | 2009 | 1 | |
AAA | 2005 | 0 | AAA | 2010 | 1 | |
AAA | 2006 | 1 | ||||
AAA | 2007 | 1 | ||||
AAA | 2008 | 1 | ||||
AAA | 2009 | 1 | ||||
AAA | 2010 | 1 | ||||
AAA | 2011 | 0 |
Is this the input data or the output data? If it's the input data, please show us the desired output. Or vice versa.
Hello,
Thanks for responding. The left hand side shows the input data and right hand side shows output data. Please let me know if it is not clear.
data have;
input Gvkey $ year Dummy ;* Gvkey year Dummy ;
cards;
AAA 1988 0 AAA 1991 0
AAA 1989 1 AAA 1992 0
AAA 1990 1 AAA 1993 0
AAA 1991 0 AAA 1994 0
AAA 1992 0 AAA 1995 1
AAA 1993 0 AAA 1996 1
AAA 1994 0 AAA 1997 1
AAA 1995 1 AAA 1998 1
AAA 1996 1 AAA 1999 1
AAA 1997 1 AAA 2002 0
AAA 1998 1 AAA 2003 0
AAA 1999 1 AAA 2004 0
AAA 2000 0 AAA 2005 0
AAA 2001 1 AAA 2006 1
AAA 2002 0 AAA 2007 1
AAA 2003 0 AAA 2008 1
AAA 2004 0 AAA 2009 1
AAA 2005 0 AAA 2010 1
AAA 2006 1
AAA 2007 1
AAA 2008 1
AAA 2009 1
AAA 2010 1
AAA 2011 0
;
proc sort data=have out=temp1;
by gvkey dummy year;
run;
data temp2;
set temp1;
by gvkey dummy ;
if dif(year) ne 1 then grp+1;
if first.dummy then grp=1;
run;
proc sql;
create table want(drop=grp) as
select *
from temp2
group by gvkey,dummy ,grp
having count(grp)>=4
order by gvkey,year, dummy;
quit;
data have;
input Gvkey $ year Dummy ;* Gvkey year Dummy ;
cards;
AAA 1988 0 AAA 1991 0
AAA 1989 1 AAA 1992 0
AAA 1990 1 AAA 1993 0
AAA 1991 0 AAA 1994 0
AAA 1992 0 AAA 1995 1
AAA 1993 0 AAA 1996 1
AAA 1994 0 AAA 1997 1
AAA 1995 1 AAA 1998 1
AAA 1996 1 AAA 1999 1
AAA 1997 1 AAA 2002 0
AAA 1998 1 AAA 2003 0
AAA 1999 1 AAA 2004 0
AAA 2000 0 AAA 2005 0
AAA 2001 1 AAA 2006 1
AAA 2002 0 AAA 2007 1
AAA 2003 0 AAA 2008 1
AAA 2004 0 AAA 2009 1
AAA 2005 0 AAA 2010 1
AAA 2006 1
AAA 2007 1
AAA 2008 1
AAA 2009 1
AAA 2010 1
AAA 2011 0
;
proc sort data=have out=temp1;
by gvkey dummy year;
run;
data want;
if _n_=1 then do;
dcl hash H (ordered: "A") ;
h.definekey ("gvkey","year") ;
h.definedata ("gvkey","year", "Dummy") ;
h.definedone () ;
dcl hiter hi('h');
end;
do until(last.gvkey);
set temp1;
by gvkey dummy ;
array t(9999) _temporary_;
_year=year;
if dif(_year) ne 1 then do;
grp+1;
count=1;
end;
else count+1;
if count=1 and n(of t(*))>=4 then do;
do year=min(of t(*)) to max(of t(*));
rc=h.add();
end;
call missing(of t(*));
end;
t(count)=_year;
if first.dummy then grp=1;
end;
if count>=4 then do year=min(of t(*)) to max(of t(*));
rc=h.add();
end;
do while(hi.next()=0);
output;
end;
h.clear();
call missing(of t(*),count,grp);
keep gvkey year Dummy;
run;
I have an additonal questions regarding the previous post
data want;
set have;
by key;
if lagdummy - dummy =1 then do
(dummy = 0 and count > 2)or (dummy = 1 and count > 3) ;end;
output;
run;
SAS keeps giving me error messages. Please help! Thanks!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.