hi ... as Art said "nice" ... it reminded me of another one of your ideas, the self-interleave (once GROUPNUM is added to the data set) ...
data want (drop=groupnum);
do _n_ = 0 by 0 until (last.groupnum) ;
set grouped (in=one) grouped;
by groupnum;
_n_ + one;
if ^one and _n_ ge 3 then output;
end;
run;
If I understood what you mean.
data have; input Panelist Year Othervars @@ ; cards; 1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1 2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1 3 2003 1 3 2004 1 ; run; data have; set have; if dif(year) ne 1 or panelist ne lag(panelist) then group+1; run; proc sql ; create table want as select * from have group by group having count(*) ge 3 order by panelist,year; quit;
Ksharp
KSharp, two minor but additional simplifications that will reduce processing time:
data grouped;
set have;
groupnum+(dif(year) ne 1 or dif(panelist) ne 0);
run;
proc sql ;
create table want (drop=groupnum) as
select *
from grouped
group by groupnum
having count(*) > 2
order by panelist,year
;
quit;
Art.
Actually. Yours and mine are the same .
Ksharp
KSharp, Actually they aren't. Your post got me to question whether your approach would run any faster than the one that Howard had suggesed. To my surprise, it didn't.
I changed your line:
if dif(year) ne 1 or panelist ne lag(panelist) then group+1;
to: groupnum+(dif(year) ne 1 or dif(panelist) ne 0);
plus, I changed your line: having count(*) ge 3
to: having count(*) > 2
The combination of those two changes resulted in a 16% reduction in average needed cpu time.
Art .
The best way to reduce time is not to use ORDER clause.
Or use order by group,year . which both can save some time.
Ksharp
Hi I am a SAS layman (started using about 3 weeks ago)
I would like to perform a similar function to the problem addressed above.
I have 213000 observations (give or take) on firms with any annual fundamentals from 1970-2010.
Basically, I want to cut down this data. I'm hoping to obtain only the firms with consecutive fundamental data for the full 40 years. I thought I could use the perm number and the fiscal year of the dataset to do it. But I found the thread applied to a smaller dataset, can it still work for me? There are few contributions here, what are the best commands to use?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.