BookmarkSubscribeRSS Feed
Peter_C
Rhodochrosite | Level 12
please can someone point me at some guidance on syntax that persuades PROC SQL to use an index when I only want a few rows and I can provide the keys in an in-list.
option msglevel=i ;
proc sql _method ; create table bits as
select * from &sup_hier where se_no in('9429512452', '9528002678', '9429504095' )
;
quit;
(did not use the index on SE_NO)

proc sql _method ; create table bit2 as
select * from &sup_hier where se_no in('9429512452')
union
select * from &sup_hier where se_no in('9528002678')
union
select * from &sup_hier where se_no in('9429504095' )
;
quit;
(did use the index)
Using the UNION construct must surely be "syntax overkill" here!

all offers welcome

evidence : snip from SASlog running SAS9.2 on aix-64, appears under the sig

peterC
[pre]19 option msglevel=i ;
20 proc sql _method ;
20 ! create table bits as
21 select * from &sup_hier where se_no in('9429512452', '9528002678', '9429504095' )
22 ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
INFO: Index index1 not used. Sorting into index order may help.
NOTE: Table WORK.BITS created, with 7 rows and 13 columns.

23 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:04.23
user cpu time 4.87 seconds
system cpu time 12.24 seconds


24 proc sql _method ;
24 ! create table bit2 as
25 select * from &sup_hier where se_no in('9429512452')
26 union
27 select * from &sup_hier where se_no in('9528002678')
28 union
29 select * from &sup_hier where se_no in('9429504095' )
30 ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxunqs
sqxuall
sqxuall
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
INFO: Index index1 selected for WHERE clause optimization.
INFO: Index index1 selected for WHERE clause optimization.
INFO: Index index1 selected for WHERE clause optimization.
NOTE: Table WORK.BIT2 created, with 7 rows and 13 columns.

31 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.10 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
16 REPLIES 16
art297
Opal | Level 21
Peter,

You're findings appear to be consistent with Paul's explanation at:
http://support.sas.com/techsup/technote/ts320.html

Couldn't you get around the optimizer, though, by including idxname in your from clause? e.g., from whatever (idxname=nameofindex) ?

Art
Ksharp
Super User
Hi.
Peter.It is very interesting , when I use the following code ,Index is used totally.
How do you create index? Do you use OR operation to instead it?
[pre]
data class;
set sashelp.class;
run;
proc datasets library=work;
modify class;
index create name /unique;
quit;
options msglevel=i;
proc sql _method;
create table peter as
select *
from class
where name in ('Alice' 'Barbara' 'Judy')
;
quit;




42 modify class;
43 index create name /unique;
NOTE: Simple index Name has been defined.
44 quit;

NOTE: MODIFY was successful for WORK.CLASS.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds


45 options msglevel=i;
46 proc sql _method;
47 create table peter as
48 select *
49 from class
50 where name in ('Alice' 'Barbara' 'Judy')
51 ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxsrc( WORK.CLASS )
INFO: Index Name selected for WHERE clause optimization.
NOTE: Table WORK.PETER created, with 3 rows and 5 columns.

52 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
cpu time 0.02 seconds
[/pre]

I also notice your index index1 ,is it simple index or composite index? If it were simple index ,it must has the same name with the variable.


Ksharp Message was edited by: Ksharp
Peter_C
Rhodochrosite | Level 12
will check (on site) whether my index is unique.
Could be because the table filesize is 4.5GB not counting the size of the index.
Could be because my data is not in index order.

peter
LinusH
Tourmaline | Level 20
I don't think none of this is the problem.
Since your table is quite big, and your result set is small, the index would be used, without regard to data sort order, or uniqueness.

The index1 is a composite index, SAS engines does nor allow single indexes with other names than the contributing column.
One guess could be that the se_no is not the primary column in the index, which is has to be to being chosen.
Looking at the no of distinct values of the contributing columns might also help to understand the situation.

/Linus
Data never sleeps
Peter_C
Rhodochrosite | Level 12
Linus
thank you for looking into this.

here are the index1 attributes[pre]Alphabetic List of Indexes and Attributes
# of
Unique
# Index Values Variables
1 index1 40932026 SE_NO SOURCE_SYS_ID [/pre]
the column of the in() in the where clause is the first column of the composite index[pre]where se_no in('9429512452', '9528002678', '9429504095' )[/pre]
There are 40M unique values
and the data are not sorted
INFO: Index index1 not used. Sorting into index order may help.

Oddly, that is no handicap for an equality filter, like
select * from &sup_hier where se_no in('9429512452')
as demo-ed earlier.

appreciate your input
Peter
LinusH
Tourmaline | Level 20
Well, it happens that I run into Proc SQL issues that makes me frustrated, i.e. as choosing index for joins or not. And many times you just have to accept that Proc SQL have some flaws.
Regarding your problem, I have no easy fix tip. I guess that you already have played with IDXWHERE and IDXNAME options?

Just for curiosity, what happens if you try an inner join approach? This could trigger a hash join (which is also quite unpredictable...).

/Linus
Data never sleeps
Peter_C
Rhodochrosite | Level 12
see later lack of checking before posting!!! better version later
Message was edited by: Peter.C
Peter_C
Rhodochrosite | Level 12
"curiosity killed the ..."[pre]20 data wanted ; * build in-list as inner join;
21 do want= '9429512452', '9528002678', '9429504095' ;
22 output ;
23 end;
24 run ;

NOTE: The data set WORK.WANTED has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


25 proc sql _method ;
25 ! create table bitsJ as
26 select * from &sup_hier join wanted on se_no= want ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxjhsh ========= NOTICE HASH JOIN
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
sqxsrc( WORK.WANTED )
INFO: Index index1 not used. Sorting into index order may help.

NOTE: Table WORK.BITSJ created, with 7 rows and 14 columns.

27 *where se_no in('9429512452', '9528002678', '9429504095' )
28 ;
29 create table bitsJ as select * from &sup_hier(idxwhere=yes) join wanted on se_no= want ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxjhsh ========= STILL A HASH JOIN
sqxsrc( EDWPARK.SE_HR_VOL_ROLL_UP )
sqxsrc( WORK.WANTED )
INFO: Data set option (IDXWHERE=YES)forced an index to be used rather than a sequential pass for where-clause processing.
INFO: Index index1 selected for WHERE clause optimization.

NOTE: Table WORK.BITSJ created, with 7 rows and 14 columns. [/pre]
Although here, I can force the use of the index with the IDXWHERE=YES, it would probably cause an ERROR: if the index was not present.
The idea of SQL is to optimise for us, what-ever the conditions, so I cannot always use the IDXWHERE=YES option.

I remain very surprised that an "in-list" excludes the use of an index, even at data sizes like this!

peter
art297
Opal | Level 21
Peter,

You never mentioned whether you tried the idxname option. I can't test it at the moment but, according to the documentation: "Because the index SAS selects might not always provide the best optimization, you can direct SAS to use one of the candidate indexes by specifying the IDXNAME= data set option. If you specify an index that SAS does not identify as a candidate index, then IDXNAME= does not process the request. That is, IDXNAME= does not allow you to specify an index that would produce incorrect results."

I don't know if that results in an error or simply overrides the option if the index doesn't exist.

Art
Ksharp
Super User
Agree with Linus.Index is not cared with data' s order.Even if your data has not sorted,index also will promote the processing speed,Index is only logic sorted information not physical. May be you need create a simple index for that special variable.
But for your situation - 4.5G file, I do not think it a good idea to use proc sql.
You can use Hash Table or proc formate which will use binary search technology to highly speed your proc and save lots of time. You see?

Ksharp
LinusH
Tourmaline | Level 20
Creating a simple index might help, but I think a composite index should work fine given the information in this thread.
And I can't see any reason this is not a SQL type of query, unless there is a special flaw in SAS implementation of SQL in this particular scenario. SQL works fine for must "simple" queries.
If I'm not mistaken, the WHERE-clause optimizer shares it's logic with the data step and other Procs.

What happens if you use a corresponding data step?

/Linus
Data never sleeps
Ksharp
Super User
Yes.Linus.
What you said is all right. However,SQL is not really suitable for very large table like this.
SQL will very slowly or collapse your OS when table is very large table.So It is a advantage of SAS to process large table,SAS offer lots of tools to query data Such as Hash Table ,proc format ..... these are very efficient, when you have a large table need to process.That is the reason why I love SAS .

I have to leave now.
Have a good Day! Linus.
:-)

Ksharp
Peter_C
Rhodochrosite | Level 12
Ksharp

thank you for taking a look

I adapted your example until it produces similar behaviour to my experience with 4.5GB[pre]19 data class;
20 set sashelp.class;
21 do until( rap > 99999 );
22 rap = 100000* ranuni(1) ;
23 nam1= put( rap,z1.)!!name ;[/pre]* I want a large table but almost unique rows ;[pre]24 output ;
25 end;
26 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 1260317 observations and 7 variables.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: DATA statement used (Total process time):
real time 3.77 seconds
cpu time 0.67 seconds


27 proc sort out= classI( index=( key=( nam1 rap ))) ;[/pre] * my shortcut for building a composite index on data ordered by something else;[pre]28 by weight ;
29 run ;

NOTE: There were 1260317 observations read from the data set WORK.CLASS.
NOTE: SAS threaded sort was used.
NOTE: The data set WORK.CLASSI has 1260317 observations and 7 variables.
INFO: Multiple concurrent threads will be used to create the index.
NOTE: Composite index key has been defined.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.55 seconds
cpu time 3.11 seconds


30 proc contents varnum;
31 run;

NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds


32 options msglevel=i;
33 proc sql _method stimer ;
NOTE: SQL Statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

34 create table Kpeter as select * from classI(idxwhere=yes) where nam1 in ('1Alice' '1Barbara' '1Judy') ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxsrc( WORK.CLASSI )
INFO: Data set option (IDXWHERE=YES)forced an index to be used rather than a sequential pass for where-clause processing.
INFO: Index key selected for WHERE clause optimization.
NOTE: Table WORK.KPETER created, with 5 rows and 7 columns.

NOTE: SQL Statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

35 create table Kpeter as select * from class
I where nam1 in ('1Alice' '1Barbara' '1Judy') ;

NOTE: SQL execution methods chosen are:

sqxcrta
sqxsrc( WORK.CLASSI )
INFO: Index key not used. Sorting into index order may help.
NOTE: Table WORK.KPETER created, with 5 rows and 7 columns.

NOTE: SQL Statement used (Total process time):
real time 0.58 seconds
cpu time 0.36 seconds

36 quit;
NOTE: PROCEDURE SQL used (Total process time):[/pre]Without forcing use of the index, the query takes half | one second. With IDXWHERE=YES, the query time was 1 | 2 (cpu | elapse) hundredths of a second.

If you wish to adapt the test, the clean code follows

regards
peterC

data class;
set sashelp.class ;
do until( rap > 99999 ) ;
rap = 100000* ranuni(1) ;
nam1= put( rap,z1.)!!name ;
output ;
end ;
run ;
proc sort out= classI( index=( key=( nam1 rap ))) ;
by weight ;
run ;
proc contents varnum ;
run;
options msglevel=i;
proc sql _method stimer ;
create table Kpeter as select * from classI(idxwhere=yes) where nam1 in ('1Alice' '1Barbara' '1Judy') ;
create table Kpeter as select * from classI where nam1 in ('1Alice' '1Barbara' '1Judy') ;
quit ;
Ksharp
Super User
Peter.
You are welcome.
It looks like composite index does not work for your situation,Only simple index can do. I also learned it one more time.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 4409 views
  • 0 likes
  • 4 in conversation