BookmarkSubscribeRSS Feed
CuriousCanDo
Calcite | Level 5

Hi, I have a SAS dataset with limited number of columns (~15). I usually and frequently have the need to search the relevant records from this dataset using Index() function. However due to size of dataset (>7gb, >25million records) it is taking very long. I created SAS Index on individual variable. Of course Index reduces time to retrieve records but it only works with an exact match in where clause, which is not what I need 99% of the time. 99% of the time I have to do keyword search (search not by full string value but part of string value, like _VAR Contains "searchtext") using Index function and in this case, as seen below, SAS doesn't use the available Index on the same variable. Can someone please help, point to relevant threads, material: I need to search (Index() or Find()) very fast and it seems SAS Index so far not helping.

 

26 data j__;
27 set adhc.codetrak (where=(program_name='t_hier_testing_diab_cv1'));
INFO: Index program_name selected for WHERE clause optimization.
28 run;

NOTE: There were 6404 observations read from the data set ADHC.CODETRAK.
WHERE program_name='t_hier_testing_diab_cv1';
NOTE: The data set WORK.J__ has 6404 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.16 seconds
cpu time 0.03 seconds

 

 

27 data j__;
28 set adhc.codetrak (where=(index(lowcase(strip(program_name)), 't_hier_testing_diab_cv1')));
29 run;

NOTE: There were 6404 observations read from the data set ADHC.CODETRAK.
WHERE INDEX(LOWCASE(STRIP(program_name)), 't_hier_testing_diab_cv1');
NOTE: The data set WORK.J__ has 6404 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 2:13.92
cpu time 43.62 seconds

4 REPLIES 4
ballardw
Super User

If you are searching for the same few terms then perhaps you should had indicator variables when those frequently searched for terms are present.

Or perhaps values have been consolidated when additional variables might be appropriate.
If you have multiple "program_name" values with pieces like: hier , testing, diab or Cv1 those might be candidates for the additional variables. Then search in one or more of those variables for an equality.

 

 

s_lassen
Meteorite | Level 14

If you are actually looking for records where the variable STARTS with 't_hier_testing_diab_cv1', you can probably get SAS to use the index for searching by using LIKE instead of the INDEX() function:

data j__;
  set adhc.codetrak (where=(program_name like 't_hier_testing_diab_cv1%'));
run;

But as you are also using the LOWCASE() function, that does not seem to be a possible solution.

yabwon
Onyx | Level 15

The use of the index() function "turns off"  use of index.

 

Imagine:

data have;
input text $;
cards;
ABC
BAC
BCA
BBC
BCC
CAB
;
run;


data want;
  set have;
  where index(text, "AB");
run;
proc print;
run;
proc print;
run;

data want;
  set have;
  where text =: "AB";
run;
proc print;
run;

The index() function search is like "see if a given value contains 'AB' text string I'm looking for" - so you have to look through ALL values you have in data.

 

The search by the index is like "phone book" search:  Mys string starts with "A" so lets go to the page for letter A, then it has "B" so lets move few pages to the place where there are words starting with "AB".

 

So if you want to use the index() function on a variable SAS will never use index...

 

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



AhmedAl_Attar
Ammonite | Level 13

Hi @CuriousCanDo 

It's faster and cheaper compute if you use either LOWCASE() or UPCASE() function to convert the case of the static string you searching for in order to match the stored text case in your data set. Single conversion instead of 25 Millions as it is in your case!

Using the msglevel=I option would indicate if an index being used for the WHERE clause evaluation.  I'll borrow the sample code @yabwon had in his reply for illustration

options msglevel=i fullstimer;

data have(index=(text));
	input text $;
	cards;
ABC
BAC
BCA
BBC
BCC
CAB
;
run;

%let srch_str = ab;

data want;
	set have;
	where index(lowcase(strip(text)), "&srch_str");
run;

proc print;
run;

/* Convert search string to upcase to match the stored data */
%let srch_str = %upcase(ab);

proc sql;
	create table want as
	select * from have
	where text like ("&srch_str%");
quit;

data want;
	set have;
	where text=: "&srch_str";
run;

proc print;
run;

Hope this helps

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 724 views
  • 2 likes
  • 5 in conversation