mariopellegrini
Pyrite | Level 9

Good morning everyone. I'm trying to optimize some time-consuming code and am asking for suggestions. The example code below shows a typical situation: a sort followed by a data step with a BY statement, used to keep the last record of each group. I was wondering whether hash techniques could improve the processing time.

 

data ds_1;
input cod1 cod2;
datalines;
1 4
1 12
1 7
2 7
2 6
2 9
3 12
3 4
;
proc sort data=ds_1;
by cod1 cod2;
run;

data ds_2;
set ds_1;
by cod1 cod2;
if last.cod1;
run;
3 REPLIES
mariopellegrini
Pyrite | Level 9

35,800,340 observations and 11 variables in total (8 variables in the "by")
The original data step takes:
real time 5:38.72
cpu time 5:10.59

Kurt_Bremser
Super User

If your initial dataset is already sorted by cod1, you could avoid the PROC SORT by using a DOW loop:

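data want;
/* first pass over the group: find the largest cod2 */
do until (last.cod1);
  set have;
  by cod1;
  _cod2 = max(_cod2,cod2);
end;
/* second pass over the same group: output the record(s) holding that maximum */
do until (last.cod1);
  set have;
  by cod1;
  if cod2 = _cod2 then output;
end;
drop _cod2;
run;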
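
Since the question asks about hash techniques, here is a minimal sketch of a hash-object version that needs no sort at all. It is an illustration under assumptions, not a tested replacement for the real job: it uses the two sample variables only, treats cod1 as the single key, and keeps just cod1 and the largest cod2 (the real data has 8 BY variables and 11 columns, so the key and data lists would have to be extended). The names have, want and _new are placeholders. The hash holds one entry per distinct key in memory, so whether this is faster depends on how many distinct keys there are.

```sas
/* Sample input taken from the question; deliberately left unsorted. */
data have;
  input cod1 cod2;
  datalines;
1 4
1 12
1 7
2 7
2 6
2 9
3 12
3 4
;
run;

data _null_;
  length cod1 cod2 8;                    /* host variables for the hash object */
  declare hash h(ordered: 'a');          /* write results in ascending key order */
  h.defineKey('cod1');
  h.defineData('cod1', 'cod2');
  h.defineDone();
  do until (eof);
    /* read the incoming value as _new so it does not overwrite the stored cod2 */
    set have (rename=(cod2=_new)) end=eof;
    if h.find() ne 0 then cod2 = _new;   /* key not seen yet: start with this value */
    else cod2 = max(cod2, _new);         /* key seen: keep the larger value */
    h.replace();                         /* add or update the entry for this key */
  end;
  h.output(dataset: 'want');             /* one row per cod1 with its maximum cod2 */
  stop;
run;
```

For the sample data this should leave three rows in want: (1, 12), (2, 9) and (3, 12), the same records the sort plus last.cod1 step keeps.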

