Solved: consecutive observations only - Page 2

MikeZdeb · Posted 01-15-2012 03:02 PM

hi ... as Art said "nice" ... it reminded me of another one of your ideas, the self-interleave (once GROUPNUM is added to the data set) ...

data want (drop=groupnum);

do _n_ = 0 by 0 until (last.groupnum) ;

set grouped (in=one) grouped;

by groupnum;

_n_ + one;

if ^one and _n_ ge 3 then output;

end;

run;

Ksharp · Posted 01-15-2012 10:46 PM

If I understood what you mean.

data have;
  input Panelist Year Othervars @@ ;
  cards;
1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1
2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1
3 2003 1 3 2004 1
;
run;
data have;
 set have;
 if dif(year) ne 1 or panelist ne lag(panelist) then group+1;
run;
proc sql ;
create table want as
 select * 
  from have
   group by group
    having count(*) ge 3
     order by panelist,year;
quit;

Ksharp

art297 · Posted 01-15-2012 11:27 PM

KSharp, two minor but additional simplifications that will reduce processing time:

data grouped;

set have;

groupnum+(dif(year) ne 1 or dif(panelist) ne 0);

run;

proc sql ;

create table want (drop=groupnum) as

select *

from grouped

group by groupnum

having count(*) > 2

order by panelist,year

;

quit;

Ksharp · Posted 01-16-2012 01:03 AM

Art.

Actually. Yours and mine are the same .

Ksharp

art297 · Posted 01-16-2012 08:48 AM

KSharp, Actually they aren't. Your post got me to question whether your approach would run any faster than the one that Howard had suggesed. To my surprise, it didn't.

I changed your line:

 if dif(year) ne 1 or panelist ne lag(panelist) then group+1;

to: groupnum+(dif(year) ne 1 or dif(panelist) ne 0);

plus, I changed your line: having count(*) ge 3

to: having count(*) > 2

The combination of those two changes resulted in a 16% reduction in average needed cpu time.

Ksharp · Posted 01-17-2012 01:23 AM

Art .

The best way to reduce time is not to use ORDER clause.

Or use order by group,year . which both can save some time.

Ksharp

spraynardz90 · Posted 04-24-2012 05:51 AM

Hi I am a SAS layman (started using about 3 weeks ago)

I would like to perform a similar function to the problem addressed above.

I have 213000 observations (give or take) on firms with any annual fundamentals from 1970-2010.

Basically, I want to cut down this data. I'm hoping to obtain only the firms with consecutive fundamental data for the full 40 years. I thought I could use the perm number and the fiscal year of the dataset to do it. But I found the thread applied to a smaller dataset, can it still work for me? There are few contributions here, what are the best commands to use?

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Re: consecutive observations only

Registration is open

SAS Training: Just a Click Away