BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MikeZdeb
Rhodochrosite | Level 12

hi ... as Art said "nice"  ... it reminded me of another one of your ideas, the self-interleave (once GROUPNUM is added to the data set) ...

data want (drop=groupnum);

do _n_ = 0 by 0 until (last.groupnum) ;

  set grouped (in=one) grouped;

  by groupnum;

  _n_ + one;

  if ^one and _n_ ge 3 then output;

end;

run;

Ksharp
Super User

If I understood what you mean.

data have;
  input Panelist Year Othervars @@ ;
  cards;
1 2003 1 1 2004 1 1 2005 1 1 2006 1 1 2008 1 1 2009 1
2 2003 1 2 2004 1 2 2005 1 2 2007 1 2 2008 1 2 2009 1
3 2003 1 3 2004 1
;
run;
data have;
 set have;
 if dif(year) ne 1 or panelist ne lag(panelist) then group+1;
run;
proc sql ;
create table want as
 select * 
  from have
   group by group
    having count(*) ge 3
     order by panelist,year;
quit;

Ksharp

art297
Opal | Level 21

KSharp, two minor but additional simplifications that will reduce processing time:

data grouped;

  set have;

  groupnum+(dif(year) ne 1 or dif(panelist) ne 0);

run;

proc sql ;

  create table want (drop=groupnum) as

    select *

      from grouped

        group by groupnum

          having count(*) > 2

            order by panelist,year

  ;

quit;

Ksharp
Super User

Art.

Actually. Yours and mine are the same .

Ksharp

art297
Opal | Level 21

KSharp, Actually they aren't.  Your post got me to question whether your approach would run any faster than the one that Howard had suggesed.  To my surprise, it didn't.

I changed your line:

 if dif(year) ne 1 or panelist ne lag(panelist) then group+1;

to:    groupnum+(dif(year) ne 1 or dif(panelist) ne 0);

plus, I changed your line: having count(*) ge 3

to: having count(*) > 2

The combination of those two changes resulted in a 16% reduction in average needed cpu time.

Ksharp
Super User

Art .

The best way to reduce time is not to use ORDER clause.

Or use order by group,year  . which both can save some time.

Ksharp

spraynardz90
Calcite | Level 5

Hi I am a SAS layman (started using about 3 weeks ago)

I would like to perform a similar function to the problem addressed above.

I have 213000 observations (give or take) on firms with any annual fundamentals from 1970-2010.

Basically, I want to cut down this data. I'm hoping to obtain only the firms with consecutive fundamental data for the full 40 years.  I thought I could use the perm number and the fiscal year of the dataset to do it. But I found the thread applied to a smaller dataset, can it still work for me? There are few contributions here, what are the best commands to use?  

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 9034 views
  • 1 like
  • 9 in conversation