Solved
Contributor
Posts: 44

# consecutive observations only

Hi all,

I need help withe following procedure...

Suppose I have the following panel data:

 Panelist Year Other Variables 1 2003 1 2004 1 2005 1 2006 1 2008 1 2009 2 2003 2 2004 2 2005 2 2007 2 2008 2 2009 3 2003 3 2004

I want to remove the years that are not consecutive for at least three years such that:

 Panelist Year Other Variables 1 2003 1 2004 1 2005 1 2006 2 2003 2 2004 2 2005 2 2007 2 2008 2 2009

Note that panelist 3 was entirely removed since it only has 2 (<3) consecutive years.

What is an efficient way to do this? I'm a SAS novice and any comments would be greatly appreciated.

Thanks,

Chris

Accepted Solutions
Solution
‎01-14-2012 05:16 PM
PROC Star
Posts: 8,169

## Re: consecutive observations only

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

input Panelist Year othervars;

cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

by panelist year;

run;

data want (drop=_;

set have;

by Panelist;

set have ( firstobs = 2 keep = year rename = (year = _year2) )

have (      obs = 1 drop = _all_                        );

_year2 = ifn(  last.Panelist, (.), _year2 );

set have ( firstobs = 3 keep = year rename = (year = _year3) )

have (      obs = 2 drop = _all_                        );

_year3 = ifn(  last.Panelist, (.), _year3 );

_prev1 = ifn( first.Panelist, (.), lag(year) );

_prev2 = ifn( first.Panelist, (.), lag2(year) );

if (year-2 eq _prev1-1 eq _prev2) or

(year+2 eq _year2+1 eq _year3) or

(year eq _year2-1 eq _prev1+1) then output;

run;

All Replies
Solution
‎01-14-2012 05:16 PM
PROC Star
Posts: 8,169

## Re: consecutive observations only

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

input Panelist Year othervars;

cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

by panelist year;

run;

data want (drop=_;

set have;

by Panelist;

set have ( firstobs = 2 keep = year rename = (year = _year2) )

have (      obs = 1 drop = _all_                        );

_year2 = ifn(  last.Panelist, (.), _year2 );

set have ( firstobs = 3 keep = year rename = (year = _year3) )

have (      obs = 2 drop = _all_                        );

_year3 = ifn(  last.Panelist, (.), _year3 );

_prev1 = ifn( first.Panelist, (.), lag(year) );

_prev2 = ifn( first.Panelist, (.), lag2(year) );

if (year-2 eq _prev1-1 eq _prev2) or

(year+2 eq _year2+1 eq _year3) or

(year eq _year2-1 eq _prev1+1) then output;

run;

Contributor
Posts: 44

## consecutive observations only

Thank you! It worked!

PROC Star
Posts: 8,169

## consecutive observations only

I will still be interested to see if anyone provides an easier solution, but I have found the look ahead look back method to be extremely handy and versitile for addressing such needs.

Posts: 4,743

## Re: consecutive observations only

I can't really think of an easier solution than some kind of a "look-ahead". One could of course transpose stuff and then use array processing.

The one thing I would try to change is the need for a pass through the data and additional logic for every single consecutive year (eg: what if the OP would have asked for 6 consecutive years?).

Below a code variation which should work for any number of consecutive years:

proc delete data=work.want;run;

data have;
input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
4 2004 1
;
run;

proc sql;
create table DistObs as
select distinct panelist,year
from have
order by panelist,year
;
quit;

%let MinConsecutiveYears=3;
data YearSelection (keep=Panelist Year);

do while(1=1);

_pointerStart=sum(_pointerStart,1);
_pointerEnd  =sum(_pointerStart,&MinConsecutiveYears,-1);

if _pointerEnd>_nobs then stop;

set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStart Year=_YearStart)) point=_pointerStart nobs=_nobs;
set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStop Year=_YearStop)) point=_pointerEnd;

if _PanelistStart=_PanelistStop and sum(_YearStart,&MinConsecutiveYears,-1)=_YearStop then
do;
_LagPanelist=_PanelistStart;
_LagYear=_YearStart-1;
do _pointerStart=_pointerStart to _nobs;
set DistObs point=_pointerStart;
if _LagPanelist ne Panelist or _LagYear ne Year-1 then leave;
_LagPanelist=Panelist;
_LagYear    =Year;
output;
end;
_pointerStart=sum(_pointerStart,-1);
end;

end;

run;

proc sql;
/*  create table want as*/
select l.*
from have l, YearSelection r
where l.panelist=r.panelist and l.year=r.year
order by l.panelist, l.year
;
quit;

Super User
Posts: 8,127

## Re: consecutive observations only

Art -

I think that it is not working at the end of the file.  Sort by descending Panelist.

Here is a similar method with more checking for end of file condition.  But it does require that SAS is able to count incoming observations such when using a dataset as input instead of a view.

`data have;  input Panelist Year Othervars @@ ;  cards;1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 12 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 13 2003 1 3 2004 1;data want ;  set have nobs=nobs ;  if (_n_ < nobs) then set    have (firstobs=2 keep=panelist year rename=(panelist=_np year=_ny))  ;  if (_n_ < nobs - 1) then set    have (firstobs=3 keep=panelist year rename=(panelist=_np2 year=_ny2))  ;  _pp = lag(panelist);  _py = lag(year);  _pp2= lag2(panelist);  _py2= lag2(year);  keep =   (_pp2 = _pp = panelist) and (sum(_py2,2) = sum(_py,1)=  year)  or (_pp = panelist = _np) and (sum(_py,1)=  year = sum(_ny,-1) )  or (panelist = _np = _np2) and ( year = sum(_ny,-1) = sum(_ny2,-2) )  ;run;proc print; by panelist ; id panelist; var keep year othervars;run;`

`Panelist    keep    Year    Othervars    1         1     2003        1              1     2004        1              1     2005        1              1     2006        1              0     2008        1              0     2009        1    2         1     2003        1              1     2004        1              1     2005        1              1     2007        1              1     2008        1              1     2009        1    3         0     2003        1              0     2004        1`
PROC Star
Posts: 8,169

## Re: consecutive observations only

Tom,

The method I proposed did work, I think.  Your code doesn't account for the specified criteria, namely the a record should ONLY be output if it is part of an adjacent three year sequence.

Art

Super User
Posts: 8,127

## Re: consecutive observations only

If I run it just on panelist 2 I only get 5 records. The last is lost.

I did not see the extra version of have getting pulled in with no variables.

On the second one you need change OBS=1 to OBS=2.

PROC Star
Posts: 8,169

## Re: consecutive observations only

Tom,

You were absolutely correct!  I changed the code (i.e., making obs=2) in my original post and, hopefully, the OP has followed our latest exchange.

Art

Super Contributor
Posts: 1,636

## Re: consecutive observations only

Hi Art,

I think "obs=1" shoud be "obs=2" in your code.

sethave ( firstobs = 3 keep = year rename =(year = _year3) )

have (      obs = 1 drop= _all_                      );

you will see the difference if you add another obsevation

"Panelist=3 Year=2005" to the dataset.

Thank you!

PROC Star
Posts: 8,169

## Re: consecutive observations only

Linlin, I think (hope) I made the correct change at the EXACT moment that you posted your response.

Contributor
Posts: 44

## Re: consecutive observations only

thank you all. i got it revised.

PROC Star
Posts: 8,169

## Re: consecutive observations only

This would make an excellent SGF presentation .. too bad it is too late to submit for 2012.  Anyone planning on going to San Francisco in 2013?

Regular Contributor
Posts: 184

## Re: consecutive observations only

Use a Double DoW after pre-processing to create an appropriate BY variable.

data grouped ;

set have ;

by panelist ;

groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

data want ;

do many = 1 by 1 until (last.groupnum) ;

set grouped ;

by groupnum ;

end ;

do nn = 1 to many ;

set grouped ;

if many GE 3 then output ;

end ;

run ;

PROC Star
Posts: 8,169

## Re: consecutive observations only

Howard,

Nice solution!  I especially liked the preprocessing part.  Of course, following that, a fairly simple proc sql segment also would have sufficed. I.e.:

data grouped ;

set have ;

by panelist ;

groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

proc sql;

create table want (drop=groupnum) as

select *

from grouped

group by groupnum

having count(*) > 2

order by panelist,year

;

quit;

🔒 This topic is solved and locked.