Help using Base SAS procedures

consecutive observations only

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

consecutive observations only

Hi all,

I need help withe following procedure...

Suppose I have the following panel data:

PanelistYearOther Variables
12003
12004
12005
12006
12008
12009
22003
22004
22005
22007
22008
22009
32003
32004

I want to remove the years that are not consecutive for at least three years such that:

PanelistYearOther Variables
12003
12004
12005
12006
22003
22004
22005
22007
22008
22009

Note that panelist 3 was entirely removed since it only has 2 (<3) consecutive years.

What is an efficient way to do this? I'm a SAS novice and any comments would be greatly appreciated.

Thanks,

Chris


Accepted Solutions
Solution
‎01-14-2012 05:16 PM
PROC Star
Posts: 7,357

Re: consecutive observations only

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

  input Panelist Year othervars;

  cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

  by panelist year;

run;

data want (drop=_Smiley Happy;

  set have;

  by Panelist;

  set have ( firstobs = 2 keep = year rename = (year = _year2) )

      have (      obs = 1 drop = _all_                        );

  _year2 = ifn(  last.Panelist, (.), _year2 );

  set have ( firstobs = 3 keep = year rename = (year = _year3) )

      have (      obs = 2 drop = _all_                        );

  _year3 = ifn(  last.Panelist, (.), _year3 );

  _prev1 = ifn( first.Panelist, (.), lag(year) );

  _prev2 = ifn( first.Panelist, (.), lag2(year) );

  if (year-2 eq _prev1-1 eq _prev2) or

     (year+2 eq _year2+1 eq _year3) or

     (year eq _year2-1 eq _prev1+1) then output;

run;

View solution in original post


All Replies
Solution
‎01-14-2012 05:16 PM
PROC Star
Posts: 7,357

Re: consecutive observations only

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

  input Panelist Year othervars;

  cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

  by panelist year;

run;

data want (drop=_Smiley Happy;

  set have;

  by Panelist;

  set have ( firstobs = 2 keep = year rename = (year = _year2) )

      have (      obs = 1 drop = _all_                        );

  _year2 = ifn(  last.Panelist, (.), _year2 );

  set have ( firstobs = 3 keep = year rename = (year = _year3) )

      have (      obs = 2 drop = _all_                        );

  _year3 = ifn(  last.Panelist, (.), _year3 );

  _prev1 = ifn( first.Panelist, (.), lag(year) );

  _prev2 = ifn( first.Panelist, (.), lag2(year) );

  if (year-2 eq _prev1-1 eq _prev2) or

     (year+2 eq _year2+1 eq _year3) or

     (year eq _year2-1 eq _prev1+1) then output;

run;

Contributor
Posts: 44

consecutive observations only

Thank you! It worked! Smiley Happy

PROC Star
Posts: 7,357

consecutive observations only

I will still be interested to see if anyone provides an easier solution, but I have found the look ahead look back method to be extremely handy and versitile for addressing such needs.

Respected Advisor
Posts: 3,887

Re: consecutive observations only

I can't really think of an easier solution than some kind of a "look-ahead". One could of course transpose stuff and then use array processing.

The one thing I would try to change is the need for a pass through the data and additional logic for every single consecutive year (eg: what if the OP would have asked for 6 consecutive years?).

Below a code variation which should work for any number of consecutive years:

proc delete data=work.want;run;

data have;
  input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
4 2004 1
;
run;

proc sql;
  create table DistObs as
  select distinct panelist,year
  from have
  order by panelist,year
  ;
quit;

%let MinConsecutiveYears=3;
data YearSelection (keep=Panelist Year);

  do while(1=1);

    _pointerStart=sum(_pointerStart,1);
    _pointerEnd  =sum(_pointerStart,&MinConsecutiveYears,-1);

    if _pointerEnd>_nobs then stop;
 
    set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStart Year=_YearStart)) point=_pointerStart nobs=_nobs;
    set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStop Year=_YearStop)) point=_pointerEnd;

    if _PanelistStart=_PanelistStop and sum(_YearStart,&MinConsecutiveYears,-1)=_YearStop then
    do;
      _LagPanelist=_PanelistStart;
      _LagYear=_YearStart-1;
      do _pointerStart=_pointerStart to _nobs;
        set DistObs point=_pointerStart;
        if _LagPanelist ne Panelist or _LagYear ne Year-1 then leave;
        _LagPanelist=Panelist;
        _LagYear    =Year;
        output;
      end;
      _pointerStart=sum(_pointerStart,-1);
    end;

  end;

run;

proc sql;
/*  create table want as*/
  select l.*
    from have l, YearSelection r
    where l.panelist=r.panelist and l.year=r.year
    order by l.panelist, l.year
  ;
quit;

Super User
Super User
Posts: 6,498

Re: consecutive observations only

Art -

   I think that it is not working at the end of the file.  Sort by descending Panelist.

  Here is a similar method with more checking for end of file condition.  But it does require that SAS is able to count incoming observations such when using a dataset as input instead of a view.

data have;

  input Panelist Year Othervars @@ ;

  cards;

1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1

2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1

3 2003 1 3 2004 1

;

data want ;

  set have nobs=nobs ;

  if (_n_ < nobs) then set

    have (firstobs=2 keep=panelist year rename=(panelist=_np year=_ny))

  ;

  if (_n_ < nobs - 1) then set

    have (firstobs=3 keep=panelist year rename=(panelist=_np2 year=_ny2))

  ;

  _pp = lag(panelist);

  _py = lag(year);

  _pp2= lag2(panelist);

  _py2= lag2(year);

  keep =

   (_pp2 = _pp = panelist) and (sum(_py2,2) = sum(_py,1)=  year)

  or (_pp = panelist = _np) and (sum(_py,1)=  year = sum(_ny,-1) )

  or (panelist = _np = _np2) and ( year = sum(_ny,-1) = sum(_ny2,-2) )

  ;

run;

proc print;

by panelist ;

id panelist;

var keep year othervars;

run;

Panelist    keep    Year    Othervars

    1         1     2003        1

              1     2004        1

              1     2005        1

              1     2006        1

              0     2008        1

              0     2009        1

    2         1     2003        1

              1     2004        1

              1     2005        1

              1     2007        1

              1     2008        1

              1     2009        1

    3         0     2003        1

              0     2004        1

PROC Star
Posts: 7,357

Re: consecutive observations only

Tom,

The method I proposed did work, I think.  Your code doesn't account for the specified criteria, namely the a record should ONLY be output if it is part of an adjacent three year sequence.

Art

Super User
Super User
Posts: 6,498

Re: consecutive observations only

If I run it just on panelist 2 I only get 5 records. The last is lost.

I did not see the extra version of have getting pulled in with no variables.

On the second one you need change OBS=1 to OBS=2.

PROC Star
Posts: 7,357

Re: consecutive observations only

Tom,

You were absolutely correct!  I changed the code (i.e., making obs=2) in my original post and, hopefully, the OP has followed our latest exchange.

Art

Super Contributor
Posts: 1,636

Re: consecutive observations only

Hi Art,

I think "obs=1" shoud be "obs=2" in your code.  

sethave ( firstobs = 3 keep = year rename =(year = _year3) )

      have (      obs = 1 drop= _all_                      );

you will see the difference if you add another obsevation

"Panelist=3 Year=2005" to the dataset.

Thank you!


PROC Star
Posts: 7,357

Re: consecutive observations only

Linlin, I think (hope) I made the correct change at the EXACT moment that you posted your response.

Contributor
Posts: 44

Re: consecutive observations only

thank you all. i got it revised. Smiley Happy

PROC Star
Posts: 7,357

Re: consecutive observations only

This would make an excellent SGF presentation .. too bad it is too late to submit for 2012.  Anyone planning on going to San Francisco in 2013?

Regular Contributor
Posts: 184

Re: consecutive observations only

Use a Double DoW after pre-processing to create an appropriate BY variable.

data grouped ;

set have ;

by panelist ;

groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

data want ;

do many = 1 by 1 until (last.groupnum) ;

   set grouped ;

   by groupnum ;

   end ;

do nn = 1 to many ;

   set grouped ;

   if many GE 3 then output ;

   end ;

run ;

PROC Star
Posts: 7,357

Re: consecutive observations only

Howard,

Nice solution!  I especially liked the preprocessing part.  Of course, following that, a fairly simple proc sql segment also would have sufficed. I.e.:

data grouped ;

  set have ;

  by panelist ;

  groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

proc sql;

  create table want (drop=groupnum) as

    select *

      from grouped

        group by groupnum

          having count(*) > 2

            order by panelist,year

  ;

quit;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 21 replies
  • 3816 views
  • 0 likes
  • 9 in conversation