Hi,
I have a table with huge volume of data (approx 500 million records) and I am using SQL Join transformation ( and doing aggregate functions, group by and having in clause). I have also created Index(on the columns that are used in the having clause) on the output table( by right clicking on the temporary output table).
But it is taking ages to exeute. Can someone help me here.
Thanks
First, how many columns are we talking about here? If you've got columns that are not necessary to calculations or the final results, eliminate them from the SELECT.
Second, can you post the code? It's pretty hard to guess at what you're doing here.
Jim
Hi @jimbarbour
I have only 13 columns after deleting the unwanted columns.
Sample Code:
Block1:
Proc sql;
Create table T1 as select * from Original
where date>2010 and itemcategory not in ('A','B','C');
quit;
Block2:
Proc sql;
Create table T2 as select name, area, designation, sum(salary) as Totalsum
from T1;
Quit;
Block3:
Proc Sql;
Create table T3 as select * from T2
where Totalsum>1000;
Quit;
The above one is the sample code.
Please suggest.
Thanks
@VVDR wrote:
Hi @jimbarbour
I have only 13 columns after deleting the unwanted columns.
Sample Code:
Block1:
Proc sql;
Create table T1 as select * from Original
where date>2010 and itemcategory not in ('A','B','C');
quit;
Block2:
Proc sql;
Create table T2 as select name, area, designation, sum(salary) as Totalsum
from T1;
Quit;
Block3:
Proc Sql;
Create table T3 as select * from T2
where Totalsum>1000;
Quit;
The above one is the sample code.
Please suggest.
Thanks
In the code you have posted only datasets in work are used. Please don't post a simplified version of the code you want to be optimized.
This code does not make sense. In block 2, you create the sum of salary over the whole dataset and attach it to every single observation of the dataset. So if that sum is larger than 1000, you will still get all observations in block 3; if it's smaller, you get none.
Please describe what you actually want to achieve.
Without seeing the code and the log it is hardly possible to suggest something useful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.