BookmarkSubscribeRSS Feed
Learn_SAS1
Calcite | Level 5

Hi,

I am currently using SAS Data step. 

Data test1; /* loading approx 17 lakhs records */

   Set source (keep=var1 var2 var3); /* source data set has 322596778 obs */

where DATEPART(START_from_dtm)="31JAN2020"D <=datepart(START_to_dtm);

run;

The above data step is running for 25 minutes. Is there any solution to execute the query faster.

START_from_dtm has index on it. So tried by keeping only START_from_dtm column but same time.

I am using the same query multiple times and the execution is increasing. 

Thanks in advance. 

10 REPLIES 10
ChrisNZ
Tourmaline | Level 20

Do your datetimes have a time, or is the time just 00:00:00?

Is your source table indexed or sorted?

How many observations do you have, and how many different dates?

Learn_SAS1
Calcite | Level 5

effective_from_dtm has values like DDMONYYYY:00:00:00.000.

As I have observed most of the values has  00 as time.

Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes. 

I can't create Index on that table as it is owned by other teams. 

That data set is not sorted.

Thanks 

SASKiwi
PROC Star

This might be a little more efficient:

where START_from_dtm = "31JAN2020:00:00:00"DT <= START_to_dtm
ChrisNZ
Tourmaline | Level 20

@SASKiwi 's syntax will make all he difference in the world:

- No function to slow down the filtering

- The index will be used

Edit: No index. I confused the variable names, sorry.

Still, using no function will make a significant difference.

Learn_SAS1
Calcite | Level 5

Hi Saskiwi,

I have used that one too. but it hasn't changed much in the performance time.

 

Thanks

ChrisNZ
Tourmaline | Level 20

I have used that one too. but it hasn't changed much in the performance time.

Well you need to help your self here. We still haven't seen the actual code (the code posted can't work), or the log or know the row length.

Review the questions asked and answer them.

SASKiwi
PROC Star

@Learn_SAS1  - As already suggested by others please post your complete SAS log including the notes about run times.

ScottBass
Rhodochrosite | Level 12

@Learn_SAS1 wrote:

effective_from_dtm has values like DDMONYYYY:00:00:00.000.

As I have observed most of the values has  00 as time.

Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes. 

I can't create Index on that table as it is owned by other teams. 

That data set is not sorted.

Thanks 


You've got a source dataset with 322,596,778 records, and need to filter on start_from_dtm <= some_static_date <= start_to_dtm.  Typical SCD2 type filtering.

 

Yet your source dataset doesn't have an index on that column.

 

Have you have explained the performance issue to your "other teams", requested an index be created on that column, and they refused?

 

That's going to be your best approach to improving the performance of this query.  Google how indexes work if you need further details.

 

Do you have the disk space to (as a proof-of-concept) create a copy of that dataset, create your own index on start_from_dtm start_to_dtm, then test the performance results?

 

If you do, you may also want to test performance if the data is stored using the SPDE engine.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
andreas_lds
Jade | Level 19

@Learn_SAS1 wrote:

Hi,

I am currently using SAS Data step. 

Data test1; /* loading approx 17 lakhs records */

   Set source (keep=var1 var2 var3); /* source data set has 322.596.778 obs */

where DATEPART(START_from_dtm)="31JAN2020"D <=datepart(START_to_dtm);

run;

The above data step is running for 25 minutes. Is there any solution to execute the query faster.

START_from_dtm has index on it. So tried by keeping only START_from_dtm column but same time.

I am using the same query multiple times and the execution is increasing. 

Thanks in advance. 


In your original-code the variable used in the where-statement are also in the keep-list, right?

Reading a dataset with 332 million obs needs some time. My before-coffee-calculation showed, that about 200000 obs are processed per second, that is not that bad at all.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2810 views
  • 4 likes
  • 6 in conversation