mariopellegrini
Pyrite | Level 9

Good morning everyone. I'm trying to optimize some time-consuming code and am asking for suggestions. The example code below shows a typical situation: the records of interest are identified with a sort followed by a data step that uses BY-group processing (last.cod1). I was wondering whether hash techniques could improve the processing time.

 

data ds_1;
input cod1 cod2;
datalines;
1 4
1 12
1 7
2 7
2 6
2 9
3 12
3 4
;
proc sort data=ds_1;
by cod1 cod2;
run;

data ds_2;
set ds_1;
by cod1 cod2;
if last.cod1;
run;
3 REPLIES
mariopellegrini
Pyrite | Level 9

35,800,340 observations and 11 variables in total (8 variables in the "by")
The original data step takes:
real time 5:38.72
cpu time 5:10.59

Kurt_Bremser
Super User

If your initial dataset is already sorted by cod1, you could avoid the PROC SORT by using a DOW loop:

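data want;
/* first pass over the group: find the largest cod2 */
do until (last.cod1);
  set have;
  by cod1;
  _cod2 = max(_cod2,cod2);
end;
/* second pass over the same group: output the row(s) holding that maximum */
do until (last.cod1);
  set have;
  by cod1;
  if cod2 = _cod2 then output;
end;
drop _cod2;
run;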
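
On the hash question itself, here is a minimal sketch of how a hash object could pick the row with the largest cod2 per cod1 without any sorting. It is only an illustration against the small ds_1 example from the question, not a tested solution for the 35-million-row table. The names ds_2_hash and _new are made up for this sketch. It assumes that the number of distinct key values is small enough for the hash table to fit in memory, and that any extra variables you need in the result are added to defineData.

```sas
data _null_;
  /* one hash entry per cod1, written out in ascending key order */
  declare hash h(ordered: 'a');
  h.defineKey('cod1');
  h.defineData('cod1', 'cod2');
  h.defineDone();

  do until (eof);
    set ds_1 end=eof;
    _new = cod2;                /* save the incoming value: find() overwrites cod2 */
    /* store the row if the key is new or the incoming cod2 is larger */
    if h.find() ne 0 or _new > cod2 then do;
      cod2 = _new;
      h.replace();
    end;
  end;

  /* hypothetical output name; holds one row per cod1 with its largest cod2 */
  h.output(dataset: 'ds_2_hash');
  stop;
run;
```

Like the last.cod1 logic, this keeps exactly one row per key, so ties on cod2 are collapsed into a single row. Whether it is actually faster than sort + data step depends on the number of groups and the available memory, so it is worth timing on a sample first.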
