BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkf91
Calcite | Level 5

Hi all,

I need help withe following procedure...

Suppose I have the following panel data:

PanelistYearOther Variables
12003
12004
12005
12006
12008
12009
22003
22004
22005
22007
22008
22009
32003
32004

I want to remove the years that are not consecutive for at least three years such that:

PanelistYearOther Variables
12003
12004
12005
12006
22003
22004
22005
22007
22008
22009

Note that panelist 3 was entirely removed since it only has 2 (<3) consecutive years.

What is an efficient way to do this? I'm a SAS novice and any comments would be greatly appreciated.

Thanks,

Chris

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

  input Panelist Year othervars;

  cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

  by panelist year;

run;

data want (drop=_:);

  set have;

  by Panelist;

  set have ( firstobs = 2 keep = year rename = (year = _year2) )

      have (      obs = 1 drop = _all_                        );

  _year2 = ifn(  last.Panelist, (.), _year2 );

  set have ( firstobs = 3 keep = year rename = (year = _year3) )

      have (      obs = 2 drop = _all_                        );

  _year3 = ifn(  last.Panelist, (.), _year3 );

  _prev1 = ifn( first.Panelist, (.), lag(year) );

  _prev2 = ifn( first.Panelist, (.), lag2(year) );

  if (year-2 eq _prev1-1 eq _prev2) or

     (year+2 eq _year2+1 eq _year3) or

     (year eq _year2-1 eq _prev1+1) then output;

run;

View solution in original post

21 REPLIES 21
art297
Opal | Level 21

There is probably an easier way to do what you want, but the following does accomplish the task.  It is based on the look ahead look back method that is described at: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back

data have;

  input Panelist Year othervars;

  cards;

1          2003 1

1          2004 1

1          2005 1

1          2006 1

1          2008 1

1          2009 1

2          2003 1

2          2004 1

2          2005 1

2          2007 1

2          2008 1

2          2009 1

3          2003 1

3          2004 1

;

proc sort data=have;

  by panelist year;

run;

data want (drop=_:);

  set have;

  by Panelist;

  set have ( firstobs = 2 keep = year rename = (year = _year2) )

      have (      obs = 1 drop = _all_                        );

  _year2 = ifn(  last.Panelist, (.), _year2 );

  set have ( firstobs = 3 keep = year rename = (year = _year3) )

      have (      obs = 2 drop = _all_                        );

  _year3 = ifn(  last.Panelist, (.), _year3 );

  _prev1 = ifn( first.Panelist, (.), lag(year) );

  _prev2 = ifn( first.Panelist, (.), lag2(year) );

  if (year-2 eq _prev1-1 eq _prev2) or

     (year+2 eq _year2+1 eq _year3) or

     (year eq _year2-1 eq _prev1+1) then output;

run;

jkf91
Calcite | Level 5

Thank you! It worked! Smiley Happy

art297
Opal | Level 21

I will still be interested to see if anyone provides an easier solution, but I have found the look ahead look back method to be extremely handy and versitile for addressing such needs.

Patrick
Opal | Level 21

I can't really think of an easier solution than some kind of a "look-ahead". One could of course transpose stuff and then use array processing.

The one thing I would try to change is the need for a pass through the data and additional logic for every single consecutive year (eg: what if the OP would have asked for 6 consecutive years?).

Below a code variation which should work for any number of consecutive years:

proc delete data=work.want;run;

data have;
  input Panelist Year othervars;
cards;
1 2003 1
1 2004 1
1 2005 1
1 2006 1
1 2008 1
1 2009 1
2 2003 1
2 2004 1
2 2005 1
2 2007 1
2 2008 1
2 2009 1
3 2003 1
3 2004 1
4 2004 1
;
run;

proc sql;
  create table DistObs as
  select distinct panelist,year
  from have
  order by panelist,year
  ;
quit;

%let MinConsecutiveYears=3;
data YearSelection (keep=Panelist Year);

  do while(1=1);

    _pointerStart=sum(_pointerStart,1);
    _pointerEnd  =sum(_pointerStart,&MinConsecutiveYears,-1);

    if _pointerEnd>_nobs then stop;
 
    set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStart Year=_YearStart)) point=_pointerStart nobs=_nobs;
    set DistObs (keep=Panelist Year rename=(Panelist=_PanelistStop Year=_YearStop)) point=_pointerEnd;

    if _PanelistStart=_PanelistStop and sum(_YearStart,&MinConsecutiveYears,-1)=_YearStop then
    do;
      _LagPanelist=_PanelistStart;
      _LagYear=_YearStart-1;
      do _pointerStart=_pointerStart to _nobs;
        set DistObs point=_pointerStart;
        if _LagPanelist ne Panelist or _LagYear ne Year-1 then leave;
        _LagPanelist=Panelist;
        _LagYear    =Year;
        output;
      end;
      _pointerStart=sum(_pointerStart,-1);
    end;

  end;

run;

proc sql;
/*  create table want as*/
  select l.*
    from have l, YearSelection r
    where l.panelist=r.panelist and l.year=r.year
    order by l.panelist, l.year
  ;
quit;

Tom
Super User Tom
Super User

Art -

   I think that it is not working at the end of the file.  Sort by descending Panelist.

  Here is a similar method with more checking for end of file condition.  But it does require that SAS is able to count incoming observations such when using a dataset as input instead of a view.

data have;

  input Panelist Year Othervars @@ ;

  cards;

1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1

2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1

3 2003 1 3 2004 1

;

data want ;

  set have nobs=nobs ;

  if (_n_ < nobs) then set

    have (firstobs=2 keep=panelist year rename=(panelist=_np year=_ny))

  ;

  if (_n_ < nobs - 1) then set

    have (firstobs=3 keep=panelist year rename=(panelist=_np2 year=_ny2))

  ;

  _pp = lag(panelist);

  _py = lag(year);

  _pp2= lag2(panelist);

  _py2= lag2(year);

  keep =

   (_pp2 = _pp = panelist) and (sum(_py2,2) = sum(_py,1)=  year)

  or (_pp = panelist = _np) and (sum(_py,1)=  year = sum(_ny,-1) )

  or (panelist = _np = _np2) and ( year = sum(_ny,-1) = sum(_ny2,-2) )

  ;

run;

proc print;

by panelist ;

id panelist;

var keep year othervars;

run;

Panelist    keep    Year    Othervars

    1         1     2003        1

              1     2004        1

              1     2005        1

              1     2006        1

              0     2008        1

              0     2009        1

    2         1     2003        1

              1     2004        1

              1     2005        1

              1     2007        1

              1     2008        1

              1     2009        1

    3         0     2003        1

              0     2004        1

art297
Opal | Level 21

Tom,

The method I proposed did work, I think.  Your code doesn't account for the specified criteria, namely the a record should ONLY be output if it is part of an adjacent three year sequence.

Art

Tom
Super User Tom
Super User

If I run it just on panelist 2 I only get 5 records. The last is lost.

I did not see the extra version of have getting pulled in with no variables.

On the second one you need change OBS=1 to OBS=2.

art297
Opal | Level 21

Tom,

You were absolutely correct!  I changed the code (i.e., making obs=2) in my original post and, hopefully, the OP has followed our latest exchange.

Art

Linlin
Lapis Lazuli | Level 10

Hi Art,

I think "obs=1" shoud be "obs=2" in your code.  

sethave ( firstobs = 3 keep = year rename =(year = _year3) )

      have (      obs = 1 drop= _all_                      );

you will see the difference if you add another obsevation

"Panelist=3 Year=2005" to the dataset.

Thank you!


art297
Opal | Level 21

Linlin, I think (hope) I made the correct change at the EXACT moment that you posted your response.

jkf91
Calcite | Level 5

thank you all. i got it revised. Smiley Happy

art297
Opal | Level 21

This would make an excellent SGF presentation .. too bad it is too late to submit for 2012.  Anyone planning on going to San Francisco in 2013?

Howles
Quartz | Level 8

Use a Double DoW after pre-processing to create an appropriate BY variable.

data grouped ;

set have ;

by panelist ;

groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

data want ;

do many = 1 by 1 until (last.groupnum) ;

   set grouped ;

   by groupnum ;

   end ;

do nn = 1 to many ;

   set grouped ;

   if many GE 3 then output ;

   end ;

run ;

art297
Opal | Level 21

Howard,

Nice solution!  I especially liked the preprocessing part.  Of course, following that, a fairly simple proc sql segment also would have sufficed. I.e.:

data grouped ;

  set have ;

  by panelist ;

  groupnum + ( first.panelist OR ( dif(year) GT 1 ) ) ;

run ;

proc sql;

  create table want (drop=groupnum) as

    select *

      from grouped

        group by groupnum

          having count(*) > 2

            order by panelist,year

  ;

quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 8890 views
  • 1 like
  • 9 in conversation