BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mandyosu01
Obsidian | Level 7

Dear SAS support community,

 

I want to keep observations with AT LEAST three years before with dummy variable =0, and AT LEAST three years after with Dummy variable =1. If there are more than three years of observations with Dummy =0 before certain year and more than three years of observations with Dummy =1 after, it is fine as well. The data sample is as follows:

 

Left: Data Have                                                 Right: Data Want 

Gvkey yearDummy  Gvkey yearDummy 
AAA19910 AAA19910
AAA19920 AAA19920
AAA19930 AAA19930
AAA19940 AAA19940
AAA19951 AAA19951
AAA19961 AAA19961
AAA19971 AAA19971
AAA19981 AAA19981
AAA19991 AAA19991
AAA20000    
AAA20011    
AAA20020    
AAA20030    
AAA20041    

 

I tried lead and lag, and also loop, but  doesn't seem to work. Thank you so much! Any suggestion is appreciated!

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

I think SQL will provide an easy answer to your problem. Something like this should work:

 

proc sql;
  create table want as 
  select A.*
        ,B.Year_Min
  from have as A
  inner join
  (select Gvkey
         ,min(year) as Year_Min
   from have
   where Dummy = 1
   group by Gvkey
  ) as B
  on A.Gvkey = B.Gvkey
  where abs(A.year - B.Year_Min) <= 4
  ;
quit;

View solution in original post

8 REPLIES 8
SASKiwi
PROC Star

I think SQL will provide an easy answer to your problem. Something like this should work:

 

proc sql;
  create table want as 
  select A.*
        ,B.Year_Min
  from have as A
  inner join
  (select Gvkey
         ,min(year) as Year_Min
   from have
   where Dummy = 1
   group by Gvkey
  ) as B
  on A.Gvkey = B.Gvkey
  where abs(A.year - B.Year_Min) <= 4
  ;
quit;
mandyosu01
Obsidian | Level 7

Thank you! One more question: Is there a way to full fill both conditions? I saw your solution only meets the "at least four years of dummy = 1 observations". 

 

I was trying something like this:


proc sql;
create table want1 as select A.*
from have as A inner join
(select gvkey,min(year) as Year_Min from have where dummy= 0
group by gvkey ) as B
on A.gvkey= B.gvkey
where abs(A.year - B.Year_Min) <= 3;
quit;

 

Then I can maybe inner join want and want1 tables? 

mandyosu01
Obsidian | Level 7

More specifically the data should be something like this:

 

data have:

keyyearDummy  keyyearDummy 
AAA19880 AAA19910
AAA19891 AAA19920
AAA19901 AAA19930
AAA19910 AAA19940
AAA19920 AAA19951
AAA19930 AAA19961
AAA19940 AAA19971
AAA19951 AAA19981
AAA19961 AAA19991
AAA19971    
AAA19981    
AAA19991    
AAA20000    
AAA20011    
AAA20020    
AAA20030    
AAA20041    
SASKiwi
PROC Star

How about something like this?

 

proc sql;
  create table want as 
  select A.*
        ,B.Year_Min
       ,C.Year_Max
  from have as A

  left join
  (select Gvkey
         ,min(year) as Year_Min
   from have
   where Dummy = 1
   group by Gvkey
  ) as B
  on A.Gvkey = B.Gvkey

  left join
  (select Gvkey
         ,max(year) as Year_Max
   from have
   where Dummy = 0
   group by Gvkey
  ) as C
  on A.Gvkey = C.Gvkey

  where abs(A.year - B.Year_Min) <= 4
  and abs(A.year - C.Year_Max) <= 4
  ;
quit;
novinosrin
Tourmaline | Level 20

data have;
input Gvkey $ 	year	Dummy 	;* 	Gvkey 	year	Dummy  ;
cards;
AAA	1991	0	 	AAA	1991	0
AAA	1992	0	 	AAA	1992	0
AAA	1993	0	 	AAA	1993	0
AAA	1994	0	 	AAA	1994	0
AAA	1995	1	 	AAA	1995	1
AAA	1996	1	 	AAA	1996	1
AAA	1997	1	 	AAA	1997	1
AAA	1998	1	 	AAA	1998	1
AAA	1999	1	 	AAA	1999	1
AAA	2000	0	 	 	 	 
AAA	2001	1	 	 	 	 
AAA	2002	0	 	 	 	 
AAA	2003	0	 	 	 	 
AAA	2004	1	 	 	 	 
;

data temp;
if _n_=1 then do;
if 0 then set have(rename=(year=_year dummy=_d));
   dcl hash H (dataset:'have(rename=(year=_year dummy=_d))',ordered: "A") ;
   h.definekey  ("gvkey","_year") ;
   h.definedata ("_year","_d") ;
   h.definedone () ;
   dcl hiter hi('h');
end;
n=0;
do until(last.year);
 set have;
 by gvkey year;
 do rc=hi.setcur(key: gvkey,key:year) by 0 while(rc=0);
   n+1;
 if dummy=0 then do;
	rc=hi.prev();
	if dummy ne _d then leave;
	end;
 else do;
    rc=hi.next();
    if dummy ne _d then leave;
	end;
  end;
end;
do until(last.year);
 set have;
 by gvkey year;
 if n>=4 then  do;
 do rc=hi.setcur(key: gvkey,key:year) by 0 while(rc=0);
      output;
   if dummy=0 then do;
  	rc=hi.prev();
	if dummy ne _d then leave;
    end;
    else do;
    rc=hi.next();
    if dummy ne _d then leave;
    end;
  end;
  end;
end;
keep gvkey _year dummy;
run;

proc sort data=temp out =want(rename=_year=year) nodupkey;
by gvkey _year dummy;
run;
novinosrin
Tourmaline | Level 20
data have;
input Gvkey $ 	year	Dummy 	;* 	Gvkey 	year	Dummy  ;
cards;
AAA	1991	0	 	AAA	1991	0
AAA	1992	0	 	AAA	1992	0
AAA	1993	0	 	AAA	1993	0
AAA	1994	0	 	AAA	1994	0
AAA	1995	1	 	AAA	1995	1
AAA	1996	1	 	AAA	1996	1
AAA	1997	1	 	AAA	1997	1
AAA	1998	1	 	AAA	1998	1
AAA	1999	1	 	AAA	1999	1
AAA	2000	0	 	 	 	 
AAA	2001	1	 	 	 	 
AAA	2002	0	 	 	 	 
AAA	2003	0	 	 	 	 
AAA	2004	1	 	 	 	 
;

proc sort data=have out=temp1;
by gvkey dummy year;
run;

data temp2;
set temp1;
by gvkey dummy ;
if dif(year) ne 1 the grp+1;
if first.dummy then grp=1;
run;

proc sql;
create table want(drop=grp) as
select *
from temp2
group by gvkey,dummy ,grp
having 	count(grp)>=4
order by gvkey,dummy, year;
quit;
novinosrin
Tourmaline | Level 20

data have;
input Gvkey $ 	year	Dummy 	;* 	Gvkey 	year	Dummy  ;
cards;
AAA	1988	0	 	AAA	1991	0
AAA	1989	1	 	AAA	1992	0
AAA	1990	1	 	AAA	1993	0
AAA	1991	0	 	AAA	1994	0
AAA	1992	0	 	AAA	1995	1
AAA	1993	0	 	AAA	1996	1
AAA	1994	0	 	AAA	1997	1
AAA	1995	1	 	AAA	1998	1
AAA	1996	1	 	AAA	1999	1
AAA	1997	1	 	AAA	2002	0
AAA	1998	1	 	AAA	2003	0
AAA	1999	1	 	AAA	2004	0
AAA	2000	0	 	AAA	2005	0
AAA	2001	1	 	AAA	2006	1
AAA	2002	0	 	AAA	2007	1
AAA	2003	0	 	AAA	2008	1
AAA	2004	0	 	AAA	2009	1
AAA	2005	0	 	AAA	2010	1
AAA	2006	1	 	 	 	 
AAA	2007	1	 	 	 	 
AAA	2008	1	 	 	 	 
AAA	2009	1	 	 	 	 
AAA	2010	1	 	 	 	 
AAA	2011	0
;

proc sort data=have out=temp1;
by gvkey dummy year;
run;
data want;
if _n_=1 then do;
 dcl hash H (ordered: "A") ;
   h.definekey  ("gvkey","year") ;
   h.definedata ("gvkey","year", "Dummy") ;
   h.definedone () ;
   dcl hiter hi('h');
end;
do until(last.gvkey);
 set temp1;
 by gvkey dummy ;
 array t(9999)  _temporary_;
 _year=year;
 if dif(_year) ne 1 then do; 
 grp+1; 
 count=1;
 end;
 else count+1;
 if count=1  and n(of t(*))>=4  then do;
 do year=min(of t(*)) to max(of t(*));
  rc=h.add();
 end;
 call missing(of t(*));
 end;
 t(count)=_year;
 if first.dummy then grp=1;
end;
 if 	count>=4 then  do year=min(of t(*)) to max(of t(*));
 rc=h.add();
end;
do while(hi.next()=0);
output;
end;
h.clear();
call missing(of t(*),count,grp);
keep gvkey year Dummy;
run;
mandyosu01
Obsidian | Level 7

All of your suggestions are very helpful and I appreciate it!

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 615 views
  • 2 likes
  • 3 in conversation