BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I have a data set with 25 million rows and 15 columns.

I am using proc sql to calculate aggregation statistics .

The query is running very slowly (It takes long time to complete run the query).

My question- May anyone suggest another code that run quickly?


proc sql;
create table want as
select  Z1,Z2,YYYYMM,
       count(distinct customer_ID) as nr_customers,
count(*) as nr_rows, sum(amount)/1000000 as amount_in_Millions from have group by Z1,Z2,YYYYMM ; quit;

 

3 REPLIES 3
LinusH
Tourmaline | Level 20

When dealing with optimization, a string point is to add

options msglevel=i stimer;

And provide the log.

To your code, to better understand what the potential bottlenecks.

When ti comes to you code, it's not much you can about it.

You could try PROC SUMMARY instead, it's quicker in some instances.

Assuming your data is in Base, state if this is not the case.

Make sure you can use as much RAM as possible, and potentially increasing MEMSIZE snd SORTSIZE.

 

Data never sleeps
PaigeMiller
Diamond | Level 26

Try PROC SUMMARY. I would strongly advise you abandon the idea of doing lots of statistics across lots of groups using PROC SQL, especially for HUGE data sets. PROC SUMMARY ought to run faster than PROC SQL in this task. Just because you can do it in PROC SQL does not mean you should do it in PROC SQL. 

 

proc summary data=have nway;
    class z1 z2 yyyymm customer_id;
    var amount;
    output out=sums sum= n=n;
run;

proc summary data=sums nway;
    class z1 z2 yyyymm;
    var amount n;
    output out=want sum(amount n)=amount nr_rows;
run;

 

In the output data set WANT, the value of _FREQ_ is the number of unique customers for each grouping of Z1 Z2 YYYYMM.

--
Paige Miller
Kurt_Bremser
Super User

Try a DATA step method:

proc sort
  data=have (keep=z1 z2 yyyymm customer_id amount)
  out=int
;
by z1 z2 yyyymm customer_id;
run;

data want;
set int;
by z1 z2 yyyymm customer_id;
retain nr_rows amount_in_millions;
if first.yyyymm
then do;
  nr_rows = 0;
  amount_in_millions = 0;
end;
if first.customer_id then nr_rows + 1;
amount_in_millions + amount;
if last.yyyymm
then do;
  amount_in_millions = amount_in_millions / 1000000;
  output;
end;
drop amount customer_id;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 567 views
  • 3 likes
  • 4 in conversation