hi ... as Art said "nice" ... it reminded me of another one of your ideas, the self-interleave (once GROUPNUM is added to the data set) ...
data want (drop=groupnum);
do _n_ = 0 by 0 until (last.groupnum) ;
set grouped (in=one) grouped;
by groupnum;
_n_ + one;
if ^one and _n_ ge 3 then output;
end;
run;
If I understood what you mean.
data have; input Panelist Year Othervars @@ ; cards; 1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1 2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1 3 2003 1 3 2004 1 ; run; data have; set have; if dif(year) ne 1 or panelist ne lag(panelist) then group+1; run; proc sql ; create table want as select * from have group by group having count(*) ge 3 order by panelist,year; quit;
Ksharp
KSharp, two minor but additional simplifications that will reduce processing time:
data grouped;
set have;
groupnum+(dif(year) ne 1 or dif(panelist) ne 0);
run;
proc sql ;
create table want (drop=groupnum) as
select *
from grouped
group by groupnum
having count(*) > 2
order by panelist,year
;
quit;
Art.
Actually. Yours and mine are the same .
Ksharp
KSharp, Actually they aren't. Your post got me to question whether your approach would run any faster than the one that Howard had suggesed. To my surprise, it didn't.
I changed your line:
if dif(year) ne 1 or panelist ne lag(panelist) then group+1;
to: groupnum+(dif(year) ne 1 or dif(panelist) ne 0);
plus, I changed your line: having count(*) ge 3
to: having count(*) > 2
The combination of those two changes resulted in a 16% reduction in average needed cpu time.
Art .
The best way to reduce time is not to use ORDER clause.
Or use order by group,year . which both can save some time.
Ksharp
Hi I am a SAS layman (started using about 3 weeks ago)
I would like to perform a similar function to the problem addressed above.
I have 213000 observations (give or take) on firms with any annual fundamentals from 1970-2010.
Basically, I want to cut down this data. I'm hoping to obtain only the firms with consecutive fundamental data for the full 40 years. I thought I could use the perm number and the fiscal year of the dataset to do it. But I found the thread applied to a smaller dataset, can it still work for me? There are few contributions here, what are the best commands to use?
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.