BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

@FreelanceReinh  Thank you for making this discussion  very interactive and interesting. One more favor as I'm afraid we(students at Depaul) are not allowed to play with large datasets at our college lab. May i request a test on the performance on this, similar to DOW, a self imposed interleave

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;
data want;
do until(last);
set file(in=a) file(in=b) end=last;
by pat_id ;
if first.pat_id then call missing(count);
if a then count+1;
if b then output;
end;
run;
FreelanceReinh
Jade | Level 19

@novinosrin,

 

Very nice! I don't think I've seen this trick before. (Will add it to my SAS notes.)

 

Run time was 13 s (+/- 0.5 s) in a few repeated runs, i.e. rank 2 in the "competition".

 

 

novinosrin
Tourmaline | Level 20

@FreelanceReinh Thank you sir as always the "sir" goes to the special few 🙂 mentioned before lol 🙂

Cruise
Ammonite | Level 13
I keep staring at this solution. 3 dimensional it is.
novinosrin
Tourmaline | Level 20

@Cruise  Fun to find different ways to solve 🙂 Take care!

Cruise
Ammonite | Level 13

Nice. It worked out fine for my actual dataset with 8 million observations. I forgot to sort my data first on ID variable.

 

NOTE: There were 8827728 observations read from the data set L.P02_16.

NOTE: There were 8827728 observations read from the data set L.P02_16.

NOTE: The data set L.P02_16A has 8827728 observations and 8 variables.

NOTE: DATA statement used (Total process time):

real time 45.35 seconds

cpu time 2.37 seconds

FreelanceReinh
Jade | Level 19

Great. Just to get back to your initial post: The "PROC FREQ + MERGE" technique you mentioned took 35 s on my machine (using the "competition" dataset, of course). Replacing PROC FREQ with PROC SUMMARY (I had thought the BY statement might be beneficial) was even worse: 45 s.

Kurt_Bremser
Super User

@novinosrin wrote:

@FreelanceReinh  Thank you for making this discussion  very interactive and interesting. One more favor as I'm afraid we(students at Depaul) are not allowed to play with large datasets at our college lab. May i request a test on the performance on this, similar to DOW, a self imposed interleave

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;
data want;
do until(last);
set file(in=a) file(in=b) end=last;
by pat_id ;
if first.pat_id then call missing(count);
if a then count+1;
if b then output;
end;
run;

I'm actually surprised that it took slightly longer than the dow loop. I would have expected almost identical times, as the mechanism is the same (read block from first handle, then read block from second handle and output). What happens if call missing is replaced with count = 0; (simple assignment may be faster than subroutine call)?

FreelanceReinh
Jade | Level 19

@Kurt_Bremser,

Good point. I've just run the variant with "count=0" three times, but the run times were always slightly longer (!) than the maximum run time of three runs of the "call missing(count)" version. But it's very close.

 

Another six runs of the double DOW loop (three with your "newcount+1", three with "count=sum(count,1)", thus saving the reset to zero at the beginning of the data step) resulted in five shorter run times (no clear preference for one of the variants). On average it seems that the double DOW loop leads by approx. 0.5 seconds or approx. 4 percent. (@novinosrin's approach ran a bit faster than it did yesterday.)

 

BTW, congrats on your 10,000th post! Remarkable achievement!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 2226 views
  • 8 likes
  • 6 in conversation