How to create index before loading the data into the Output table

VVDR · Posted 09-22-2020 09:39 PM

Hi,

I have a table with a huge volume of data (approx. 500 million records), and I am using the SQL Join transformation (with aggregate functions, GROUP BY, and a HAVING clause). I have also created an index on the output table, on the columns used in the HAVING clause, by right-clicking the temporary output table.

But it is taking ages to execute. Can someone help me here?

Thanks

jimbarbour · Posted 09-22-2020 10:43 PM

First, how many columns are we talking about here? If you've got columns that are not necessary to calculations or the final results, eliminate them from the SELECT.

Second, can you post the code? It's pretty hard to guess at what you're doing here.

Jim

VVDR · Posted 09-23-2020 12:42 AM

Hi @jimbarbour

I have only 13 columns after deleting the unwanted columns.

Sample Code:

Block1:

Proc sql;
Create table T1 as select * from Original
where date>2010 and itemcategory not in ('A','B','C');
quit;

Block2:

Proc sql;
Create table T2 as select name, area, designation, sum(salary) as Totalsum
from T1;
Quit;

Block3:

Proc Sql;
Create table T3 as select * from T2
where Totalsum>1000;
Quit;

The above is the sample code.

Please suggest.

Thanks

andreas_lds · Posted 09-23-2020 12:58 AM

In the code you have posted, only datasets in WORK are used. Please don't post a simplified version of the code you want optimized.

Kurt_Bremser · Posted 09-23-2020 01:52 AM

This code does not make sense. In block 2, you create the sum of salary over the whole dataset and attach it to every single observation of the dataset. So if that sum is larger than 1000, you will still get all observations in block 3; if it's smaller, you get none.

Please describe what you actually want to achieve.
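
To illustrate the point about the missing grouping, here is a minimal sketch. It assumes the goal is a salary total per name/area/designation combination, which the original post does not confirm. The WORK.ORIGINAL rows below are hypothetical sample data, made up only so the step can run. With a GROUP BY, the filter on the total can go into a HAVING clause, and the three blocks collapse into a single pass over the data:

```sas
/* Hypothetical sample data standing in for the ORIGINAL table */
data work.original;
  input name :$10. area :$10. designation :$10. salary date itemcategory :$1.;
  datalines;
Ann North Clerk 600 2015 D
Ann North Clerk 700 2016 D
Bob South Clerk 400 2012 E
Bob South Clerk 300 2009 D
Cid East Manager 2000 2018 A
;
run;

/* Assumption: one total per name/area/designation group is wanted. */
proc sql;
  create table work.t3 as
  select name, area, designation, sum(salary) as Totalsum
  from work.original
  where date > 2010 and itemcategory not in ('A','B','C')  /* row filter, applied before grouping */
  group by name, area, designation                         /* one output row per group */
  having sum(salary) > 1000;                               /* group filter, applied to the totals */
quit;
```

Because WHERE is evaluated before the aggregation and HAVING after it, the intermediate tables T1 and T2 are not needed in this form. Whether this matches the real job depends on what the actual code does.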

andreas_lds · Posted 09-23-2020 12:55 AM

Without seeing the code and the log it is hardly possible to suggest something useful.
