BookmarkSubscribeRSS Feed
DarrylLawrence
Obsidian | Level 7

I have two very large SAS datasets that have indexes on the most commonly used columns used in filters.

 

Dataset1

99,209,789 records
850 columns

 

Dataset2

758,618,653 records
375 columns

 

When I select records based on a text formatted indexed column, the results are returned in under 1 second.

 

When I select records based on a date formatted indexed column, the results are returned in about 1 minute 22 seconds.

 

When I select records based on a text formatted and the date formatted indexed column, the results are returned in under 1 second.

 

Does anyone know why the selection on the date column, that has been indexed, takes much longer (on its own) than when combined with a character indexed column or just a character indexed column.

 

Could it be the format of the date in the "where" clause?

Data credit_sasma;
set SASMAHIS.SASMA_ACCOUNT_HISTORY (where=(Obs_Date = '12JAN2023'd));
run;

 

Date format in the SAS Datasets:

3 OBS_DATE Num 8 DATE9. DATE9. OBS_DATE

 

See attached document for the code and the log.

 

6 REPLIES 6
s_lassen
Meteorite | Level 14

That is not so strange. The queries using the text column returns very few rows. The query with the date column alone returns 2.7 million records. That has to take more time.

yabwon
Onyx | Level 15

Could you run your code with the following options on:

options fullstimer msglevel=I;

so we could see details of processing times and information which indexes were used by SAS to process data.

 

To force SAS to use particular index you could try the IDXNAME= data set option. 

 

[EDIT:] One more thing, indexes are executed against raw data, so format won't affect processing.

You probably saw this article by Billy Clifford about SAS Indexes: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/008-30.pdf

 

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



DarrylLawrence
Obsidian | Level 7

I added the index option and the options - options fullstimer msglevel=I; 

 

Log: No difference in the runtime  

28 Data credit_sasma;
29 set SASMAHIS.SASMA_ACCOUNT_HISTORY (idxname=obs_date where=(Obs_Date = '12JAN2023'd));
INFO: Index OBS_DATE selected for WHERE clause optimization.
30 run;

NOTE: There were 2789161 observations read from the data set SASMAHIS.SASMA_ACCOUNT_HISTORY.
WHERE Obs_Date='12JAN2023'D;
NOTE: The data set WORK.CREDIT_SASMA has 2789161 observations and 850 variables.
NOTE: Compressing data set WORK.CREDIT_SASMA decreased size by 67.65 percent.
Compressed is 60148 pages; un-compressed would require 185945 pages.
NOTE: DATA statement used (Total process time):
real time 1:20.43
user cpu time 1:12.15
system cpu time 7.75 seconds
memory 5980.75k
OS Memory 31400.00k
Timestamp 03/02/2023 12:16:33 PM
Step Count 17 Switch Count 133
Page Faults 2
Page Reclaims 1127
Page Swaps 0
Voluntary Context Switches 412
Involuntary Context Switches 535
Block Input Operations 12532969
Block Output Operations 15409176

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @DarrylLawrence 

 

It would be interesting to se the result of rerunning with a data _null_ step instead, so no time is used for compressing and writing output, just to see if the where clause is the real problem here.

ballardw
Super User

@ErikLund_Jensen wrote:

Hi @DarrylLawrence 

 

It would be interesting to se the result of rerunning with a data _null_ step instead, so no time is used for compressing and writing output, just to see if the where clause is the real problem here.


Or maybe just KEEPing 2 or 3 variables instead of 850.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 482 views
  • 3 likes
  • 6 in conversation