BookmarkSubscribeRSS Feed
harlananelson
Fluorite | Level 6

I looked around for an answer to this question.  If I have one index called ENROLID for a very large data set or 10 indexes, one of which is ENROLID, does the query where I select a list of ENROLID values from the data set with one index differ in speed from the same query on the data set with 9 other indexed fields?

 

Do I need to pick and choose my indexes carefully, or should I throw in every possible index I might need?

 

 

11 REPLIES 11
PGStats
Opal | Level 21

I would worry more about the cost of creating and maintaining the indexes. Each operation updating, inserting, or deleting a record must also adjust the indexes.  

PG
harlananelson
Fluorite | Level 6

These get updated once per month.

harlananelson
Fluorite | Level 6

I am worried about the query because it is taking hours.  I only need one index, so if it would be faster with only one, that is what I want to have.

PGStats
Opal | Level 21

In my opinion, if your query is handled by SAS/SQL, the number of indexes is unlikely to be the problem. If this is about a Pass-Through query, then optimization is up to your DBMS.

 

PG
TomKari
Onyx | Level 15

1. SAS dataset or DBMS table?

 

2. How many rows?

 

3. How many fields and bytes per record?

 

4. How complex is the query?

 

I very much doubt the existence of the indexes is causing your problems.

harlananelson
Fluorite | Level 6
Re: Does the speed of a query decrease with the number of indexes? 
 
 
 

1. SAS dataset or DBMS table?

I am using the SAS Scalable Performance Data Engine.

 

2. How many rows?

Some examples for the files I am accessing.

1,794,533,270

5,080,553,136

1,602,072,953

 

Largest 10,125,182,001

 

3. How many fields and bytes per record?

 largest Data

Observations 10125182001

Data Partsize 53477375K

There are three parts, so multiply by three

That looks like 15 bytes per record.

 

4. How complex is the query?

 

I use DS2.  Create a hash out of my lookup table which has one field: the index field, then load the larger table and output if index value is found.

 

f = h.find();
if (f = 0) then output;

 

proc ds2;
data cpt_enrolid (overwrite=yes);
dcl double rc f enrolid;
declare char(7) proc1;
declare char(8) cpt;
dcl package hash h(8, '{select distinct proc1 from work.cpt}');
method init();
rc = h.keys([proc1]);
rc = h.data([proc1]);
rc = h.defineDone();
end;
method run();
set {select enrolid, proc1, proctype from DWSC1116.CEMR_O_2012_2016};
f = h.find();
if (f = 0) then output;
end;
enddata;
run;
quit;
PGStats
Opal | Level 21

There is no database index used in your procedure. My guess is that most of the time is spent reading the whole data table, which might be wasteful, depending on the size of work.cpt.

 

If the large table is indexed on proc1, the simple query

 

create table cpt_enrolid as

select enrolid, proc1, proctype from DWSC1116.CEMR_O_2012_2016
where proc1 in (select proc1 from work.cpt);

 

might execute faster.

 

hth.

PG
harlananelson
Fluorite | Level 6

I appreciate your suggestion and used it.  The speed did increase.  However I noticed something odd.  When I use a sub query in the where statement as you suggest, the speed is not as fast as when I take your idea one step further and use a simple string instead of the sub query.  This is not what I had expected.  But I will start using the SAS macro facility to replace the sub query with a text string on my queries.  Three minutes vs three hours to get the same 243570 observations. 

 

Is this a SAS bug or am I missing something?

 

243570 rows and 3 columns.
1584
1585  proc sql _method;
1586  create table test as
1587  select enrolid, proc1, proctyp from DWSC1116.CEMR_O_2012_2016 where
1588  PROC1 in ('64615')
1589  ;

NOTE: SQL execution methods chosen are:

      sqxcrta
          sqxsrc( DWSC1116.CEMR_O_2012_2016 )

whinit: WHERE (PROC1='64615')
whinit: INDEX PROC1 uses 13% of segs (WITHIN maxsegratio 75%)
whinit returns: ALL EVAL1(w/SEGLIST)
NOTE: Table WORK.TEST created, with 243570 rows and 3 columns.

1590
1591  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           3:31.14
      user cpu time       1.51 seconds
      system cpu time     10.01 seconds
      memory              161854.62k
      OS Memory           191904.00k
      Timestamp           07/25/2018 02:45:25 PM
      Step Count                        79  Switch Count  1


1592
1593  proc sql _method;
1594  create table test as
1595  select enrolid, proc1, proctyp from DWSC1116.CEMR_O_2012_2016 where
1596  PROC1 in (select distinct PROC1 from search_results)
1597  ;

NOTE: SQL execution methods chosen are:

      sqxcrta
          sqxfil
              sqxsrc( DWSC1116.CEMR_O_2012_2016 )

NOTE: SQL subquery execution methods chosen are:

          sqxsubq
              sqxunqs
                  sqxsrc( WORK.SEARCH_RESULTS )
NOTE: Table WORK.TEST created, with 243570 rows and 3 columns.

1598
1599  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           1:11:16.42
      user cpu time       13:11.53
      system cpu time     5:04.06
      memory              6516.34k
      OS Memory           36260.00k
      Timestamp           07/25/2018 03:58:42 PM
      Step Count                        80  Switch Count  0


 

PGStats
Opal | Level 21

When your query says

 

select enrolid, proc1, proctyp from DWSC1116.CEMR_O_2012_2016 where PROC1 in ('64615')

 

and your table is indexed on PROC1, SAS knows where those records are in your large table and only needs to read those. The time required becomes proportional to the size of the result table and not the size of the database table.

PG
harlananelson
Fluorite | Level 6

SAS should be able to use the index in the other query that uses the subquery. 

TomKari
Onyx | Level 15

I agree with @PGStats. Also, indexes tend to be more useful on very diverse fields. An index on "Customer Number", where you frequently have to look up individual numbers, can be fantastically useful. An index on "Gender" or "State", where there are a large number of repeated values, tends to be pretty much useless.

 

Tom

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1054 views
  • 0 likes
  • 3 in conversation