BookmarkSubscribeRSS Feed
Learn_SAS1
Calcite | Level 5

Hi,

I am currently using SAS Data step. 

Data test1; /* loading approx 17 lakhs records */

   Set source (keep=var1 var2 var3); /* source data set has 322596778 obs */

where DATEPART(START_from_dtm)="31JAN2020"D <=datepart(START_to_dtm);

run;

The above data step is running for 25 minutes. Is there any solution to execute the query faster.

START_from_dtm has index on it. So tried by keeping only START_from_dtm column but same time.

I am using the same query multiple times and the execution is increasing. 

Thanks in advance. 

10 REPLIES 10
ChrisNZ
Tourmaline | Level 20

Do your datetimes have a time, or is the time just 00:00:00?

Is your source table indexed or sorted?

How many observations do you have, and how many different dates?

Learn_SAS1
Calcite | Level 5

effective_from_dtm has values like DDMONYYYY:00:00:00.000.

As I have observed most of the values has  00 as time.

Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes. 

I can't create Index on that table as it is owned by other teams. 

That data set is not sorted.

Thanks 

SASKiwi
PROC Star

This might be a little more efficient:

where START_from_dtm = "31JAN2020:00:00:00"DT <= START_to_dtm
ChrisNZ
Tourmaline | Level 20

@SASKiwi 's syntax will make all he difference in the world:

- No function to slow down the filtering

- The index will be used

Edit: No index. I confused the variable names, sorry.

Still, using no function will make a significant difference.

Learn_SAS1
Calcite | Level 5

Hi Saskiwi,

I have used that one too. but it hasn't changed much in the performance time.

 

Thanks

ChrisNZ
Tourmaline | Level 20

I have used that one too. but it hasn't changed much in the performance time.

Well you need to help your self here. We still haven't seen the actual code (the code posted can't work), or the log or know the row length.

Review the questions asked and answer them.

SASKiwi
PROC Star

@Learn_SAS1  - As already suggested by others please post your complete SAS log including the notes about run times.

ScottBass
Rhodochrosite | Level 12

@Learn_SAS1 wrote:

effective_from_dtm has values like DDMONYYYY:00:00:00.000.

As I have observed most of the values has  00 as time.

Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes. 

I can't create Index on that table as it is owned by other teams. 

That data set is not sorted.

Thanks 


You've got a source dataset with 322,596,778 records, and need to filter on start_from_dtm <= some_static_date <= start_to_dtm.  Typical SCD2 type filtering.

 

Yet your source dataset doesn't have an index on that column.

 

Have you have explained the performance issue to your "other teams", requested an index be created on that column, and they refused?

 

That's going to be your best approach to improving the performance of this query.  Google how indexes work if you need further details.

 

Do you have the disk space to (as a proof-of-concept) create a copy of that dataset, create your own index on start_from_dtm start_to_dtm, then test the performance results?

 

If you do, you may also want to test performance if the data is stored using the SPDE engine.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
andreas_lds
Jade | Level 19

@Learn_SAS1 wrote:

Hi,

I am currently using SAS Data step. 

Data test1; /* loading approx 17 lakhs records */

   Set source (keep=var1 var2 var3); /* source data set has 322.596.778 obs */

where DATEPART(START_from_dtm)="31JAN2020"D <=datepart(START_to_dtm);

run;

The above data step is running for 25 minutes. Is there any solution to execute the query faster.

START_from_dtm has index on it. So tried by keeping only START_from_dtm column but same time.

I am using the same query multiple times and the execution is increasing. 

Thanks in advance. 


In your original-code the variable used in the where-statement are also in the keep-list, right?

Reading a dataset with 332 million obs needs some time. My before-coffee-calculation showed, that about 200000 obs are processed per second, that is not that bad at all.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1534 views
  • 4 likes
  • 6 in conversation