DATA Step, Macro, Functions and more

Question about using indexes in SAS data steps

Reply
Frequent Contributor
Posts: 93

Question about using indexes in SAS data steps

I am trying to work with SAS index in order to get better performance getting data from SAS datasets.

proc sql;

create index campo3 on prueba (campo3);

quit;

I didn't get better performance in a query which where clause has only one field and this is the field of the index (campo3).

I would need some advices about index use.

I have read these:

- Use index in fileds with no many different values

- Use index when the result of the sql is less than 20% oh whole table

- ¿¿¿????

Any advice about using SAS index to get better perfomance will be greatly appreciatted.

  Thanks in advance,

Super User
Posts: 6,963

Re: Question about using indexes in SAS data steps

An index can help you greatly if it is able to reduce the overall I/O needed.

It is not a replacement for sorting a dataset if you plan on using it completely (in this case you read the whole index AND the whole dataset, and the whole dataset randomly, so your performance is actually decreased)

Another point: if it is to be expected that each page in the dataset (or most of the pages, anyway) holds at least one record that matches your where condition, then the index cannot help, as you still need to read all pages and therefore the whole dataset.

Which query did you want to do, and have you already tried using sort and a data step?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 93

Re: Question about using indexes in SAS data steps

Thanks Kurt,

Above the code I am using trying to learn how to use indexes.

I don' want to do sorts because my input datasets is large.

How can I see and manage 'pages'?, I don't know how to....

/* This is my dataset */

data prueba(drop = i);

do i = 1 to 50000000;

  campo1='abc';

  campo2 = i;

  campo3 = round(ranuni(1)*10);

  campo4 = round(ranuni(100), 0.01);

  campo5 = date();

  format campo5 yymmddn8.;

  output;

end;

run;

/* This is the filter I am doing, usin data/set or a proc sql. I need that this filter goes very quick */

data prueba1;

set prueba;

where campo3 = 5;

run;

proc sql;

create table prueba1 as

(select * from prueba

where campo3 = 5);

quit;

Frequent Contributor
Posts: 93

Re: Question about using indexes in SAS data steps

Thanks Kurt,

Above the code I am using trying to learn how to use indexes.

I don' want to do sorts because my input datasets is large.

How can I see and manage 'pages'?, I don't know how to....

/* This is my dataset */

data prueba(drop = i);

do i = 1 to 50000000;

  campo1='abc';

  campo2 = i;

  campo3 = round(ranuni(1)*10);

  campo4 = round(ranuni(100), 0.01);

  campo5 = date();

  format campo5 yymmddn8.;

  output;

end;

run;

/* This is the filter I am doing, usin data/set or a proc sql. I need that this filter goes very quick */

data prueba1;

set prueba;

where campo3 = 5;

run;

proc sql;

create table prueba1 as

(select * from prueba

where campo3 = 5);

quit;

Super User
Posts: 6,963

Re: Question about using indexes in SAS data steps

See, a SAS dataset is organized in "pages" that hold several records(observations); the number of observations per page can be determined with proc contents and depends on the observation size. The page size is dependent on operating system and file system parameters, and the observation size.

To get a specific observation, SAS always needs to read the whole page physically.

In your example, you create a number between 0 and 10 (where 1 to 9 will each occur in 10%, and 0 and 10 in 5% of the observations).

Given that each page holds, say, 100 observations, the probability that each discrete value of campo3 is present in ALL pages is, for all purposes, 1. Which means that SAS will have to physically read all pages of the dataset anyway, so it is best to not use an index, but read the dataset sequentially.

I just created a sample of your dataset an my server (SAS 9.2, AIX - 64 bit), with 10000 obs.

dataset page size: 8192

observation size: 40

max obs per page: 203

As you see, you WILL have all values of campo3 present in all pages.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 93

Re: Question about using indexes in SAS data steps

Thanks Kurt

Its really helpful,

Can you please give an example in which the creation and use of indexes take advantage??

Thsnks un avance,

Super User
Posts: 5,260

Re: Question about using indexes in SAS data steps

There are literary tons of papers on the issue. And if you wish to look at some more official, see the index section in the SAS Language Concepts at support.sas.com.

Data never sleeps
Super User
Posts: 6,963

Re: Question about using indexes in SAS data steps

Basically, everytime the index can be used to extract a small fraction of the whole dataset, with a high probabilty that I/O can be reduced. ie one single date out of one or more years.

Or to find/update single observations, although this will rarely happen in SAS.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 93

Re: Question about using indexes in SAS data steps

Hi Kurt,

I give you an example with a composite index that it's noy using the index in the data setp and I don't undetstannd why..

Thanks in advance.

----

%let fecha_ini = %sysevalf(%sysfunc(date()) - 900);

%let fecha2 = %sysfunc(date());
%let fecha1 = %sysevalf(%sysfunc(date()) - 10);
%let producto1 = P100;
%let producto2 = P1000;

%put fecha_ini = &fecha_ini;

data ventas(drop=i);
do i=1 to 10000000;
fecha = &fecha_ini + round(900*ranuni(1));
cod_producto = compress('P'||round(ranuni(1)*10000));
hc_importe = round(ranuni(1)*1000);
format fecha ddmmyyn8.;
output;
end;
run;


options msglevel=i;


proc contents data=ventas;

run;

/* index */

proc sql;
create index i_ventas on ventas (fecha, cod_producto);
quit;


data salida;

set ventas;

where fecha = &fecha1 and cod_producto = "&producto1";
run;

Super User
Posts: 5,260

Re: Question about using indexes in SAS data steps

A log would help to prove your case... ;-)

30   data salida;

31       set ventas;

32       where fecha = &fecha1 and cod_producto = "&producto1";

INFO: Index i_ventas selected for WHERE clause optimization.

33   run;

Data never sleeps
Ask a Question
Discussion stats
  • 9 replies
  • 351 views
  • 3 likes
  • 3 in conversation