Hi all,
I need help withe following procedure...
Suppose I have the following panel data:
Panelist | Year | Other Variables |
1 | 2003 | |
1 | 2004 | |
1 | 2005 | |
1 | 2006 | |
1 | 2008 | |
1 | 2009 | |
2 | 2003 | |
2 | 2004 | |
2 | 2005 | |
2 | 2007 | |
2 | 2008 | |
2 | 2009 | |
3 | 2003 | |
3 | 2004 |
I want to remove the years that are not consecutive for at least three years such that:
Panelist | Year | Other Variables |
1 | 2003 | |
1 | 2004 | |
1 | 2005 | |
1 | 2006 | |
2 | 2003 | |
2 | 2004 | |
2 | 2005 | |
2 | 2007 | |
2 | 2008 | |
2 | 2009 |
Note that panelist 3 was entirely removed since it only has 2 (<3) consecutive years.
What is an efficient way to do this? I'm a SAS novice and any comments would be greatly appreciated.
Thanks,
Chris
There is probably an easier way to do what you want, but the following does accomplish the task. It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back
data have;
input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
;
proc sort data=have;
by panelist year;
run;
data want (drop=_:);
set have;
by Panelist;
set have ( firstobs = 2 keep = year rename = (year = _year2) )
have ( obs = 1 drop = _all_ );
_year2 = ifn( last.Panelist, (.), _year2 );
set have ( firstobs = 3 keep = year rename = (year = _year3) )
have ( obs = 2 drop = _all_ );
_year3 = ifn( last.Panelist, (.), _year3 );
_prev1 = ifn( first.Panelist, (.), lag(year) );
_prev2 = ifn( first.Panelist, (.), lag2(year) );
if (year-2 eq _prev1-1 eq _prev2) or
(year+2 eq _year2+1 eq _year3) or
(year eq _year2-1 eq _prev1+1) then output;
run;
There is probably an easier way to do what you want, but the following does accomplish the task. It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back
data have;
input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
;
proc sort data=have;
by panelist year;
run;
data want (drop=_:);
set have;
by Panelist;
set have ( firstobs = 2 keep = year rename = (year = _year2) )
have ( obs = 1 drop = _all_ );
_year2 = ifn( last.Panelist, (.), _year2 );
set have ( firstobs = 3 keep = year rename = (year = _year3) )
have ( obs = 2 drop = _all_ );
_year3 = ifn( last.Panelist, (.), _year3 );
_prev1 = ifn( first.Panelist, (.), lag(year) );
_prev2 = ifn( first.Panelist, (.), lag2(year) );
if (year-2 eq _prev1-1 eq _prev2) or
(year+2 eq _year2+1 eq _year3) or
(year eq _year2-1 eq _prev1+1) then output;
run;
Thank you! It worked!
I will still be interested to see if anyone provides an easier solution, but I have found the look ahead look back method to be extremely handy and versitile for addressing such needs.
I can't really think of an easier solution than some kind of a "look-ahead". One could of course transpose stuff and then use array processing.
The one thing I would try to change is the need for a pass through the data and additional logic for every single consecutive year (eg: what if the OP would have asked for 6 consecutive years?).
Below a code variation which should work for any number of consecutive years:
proc delete data=work.want;run;
data have;
input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
4 2004 1
;
run;
proc sql;
create table DistObs as
select distinct panelist,year
from have
order by panelist,year
;
quit;
%let MinConsecutiveYears=3;
data YearSelection (keep=Panelist Year);
do while(1=1);
_pointerStart=sum(_pointerStart,1);
_pointerEnd =sum(_pointerStart,&MinConsecutiveYears,-1);
if _pointerEnd>_nobs then stop;
set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStart Year=_YearStart)) point=_pointerStart nobs=_nobs;
set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStop Year=_YearStop)) point=_pointerEnd;
if _PanelistStart=_PanelistStop and sum(_YearStart,&MinConsecutiveYears,-1)=_YearStop then
do;
_LagPanelist=_PanelistStart;
_LagYear=_YearStart-1;
do _pointerStart=_pointerStart to _nobs;
set DistObs point=_pointerStart;
if _LagPanelist ne Panelist or _LagYear ne Year-1 then leave;
_LagPanelist=Panelist;
_LagYear =Year;
output;
end;
_pointerStart=sum(_pointerStart,-1);
end;
end;
run;
proc sql;
/* create table want as*/
select l.*
from have l, YearSelection r
where l.panelist=r.panelist and l.year=r.year
order by l.panelist, l.year
;
quit;
Art -
I think that it is not working at the end of the file. Sort by descending Panelist.
Here is a similar method with more checking for end of file condition. But it does require that SAS is able to count incoming observations such when using a dataset as input instead of a view.
data have;
input Panelist Year Othervars @@ ;
cards;
1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1
2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1
3 2003 1 3 2004 1
;
data want ;
set have nobs=nobs ;
if (_n_ < nobs) then set
have (firstobs=2 keep=panelist year rename=(panelist=_np year=_ny))
;
if (_n_ < nobs - 1) then set
have (firstobs=3 keep=panelist year rename=(panelist=_np2 year=_ny2))
;
_pp = lag(panelist);
_py = lag(year);
_pp2= lag2(panelist);
_py2= lag2(year);
keep =
(_pp2 = _pp = panelist) and (sum(_py2,2) = sum(_py,1)= year)
or (_pp = panelist = _np) and (sum(_py,1)= year = sum(_ny,-1) )
or (panelist = _np = _np2) and ( year = sum(_ny,-1) = sum(_ny2,-2) )
;
run;
proc print;
by panelist ;
id panelist;
var keep year othervars;
run;
Panelist keep Year Othervars
1 1 2003 1
1 2004 1
1 2005 1
1 2006 1
0 2008 1
0 2009 1
2 1 2003 1
1 2004 1
1 2005 1
1 2007 1
1 2008 1
1 2009 1
3 0 2003 1
0 2004 1
Tom,
The method I proposed did work, I think. Your code doesn't account for the specified criteria, namely the a record should ONLY be output if it is part of an adjacent three year sequence.
Art
If I run it just on panelist 2 I only get 5 records. The last is lost.
I did not see the extra version of have getting pulled in with no variables.
On the second one you need change OBS=1 to OBS=2.
Tom,
You were absolutely correct! I changed the code (i.e., making obs=2) in my original post and, hopefully, the OP has followed our latest exchange.
Art
Hi Art,
I think "obs=1" shoud be "obs=2" in your code.
sethave ( firstobs = 3 keep = year rename =(year = _year3) )
have ( obs = 1 drop= _all_ );
you will see the difference if you add another obsevation
"Panelist=3 Year=2005" to the dataset.
Thank you!
Linlin, I think (hope) I made the correct change at the EXACT moment that you posted your response.
thank you all. i got it revised.
This would make an excellent SGF presentation .. too bad it is too late to submit for 2012. Anyone planning on going to San Francisco in 2013?
Use a Double DoW after pre-processing to create an appropriate BY variable.
data grouped ;
set have ;
by panelist ;
groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;
run ;
data want ;
do many = 1 by 1 until (last.groupnum) ;
set grouped ;
by groupnum ;
end ;
do nn = 1 to many ;
set grouped ;
if many GE 3 then output ;
end ;
run ;
Howard,
Nice solution! I especially liked the preprocessing part. Of course, following that, a fairly simple proc sql segment also would have sufficed. I.e.:
data grouped ;
set have ;
by panelist ;
groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;
run ;
proc sql;
create table want (drop=groupnum) as
select *
from grouped
group by groupnum
having count(*) > 2
order by panelist,year
;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.