BookmarkSubscribeRSS Feed
juanvg1972
Pyrite | Level 9

I am trying to work with SAS index in order to get better performance getting data from SAS datasets.

proc sql;

create index campo3 on prueba (campo3);

quit;

I didn't get better performance in a query which where clause has only one field and this is the field of the index (campo3).

I would need some advices about index use.

I have read these:

- Use index in fileds with no many different values

- Use index when the result of the sql is less than 20% oh whole table

- ¿¿¿????

Any advice about using SAS index to get better perfomance will be greatly appreciatted.

  Thanks in advance,

9 REPLIES 9
Kurt_Bremser
Super User

An index can help you greatly if it is able to reduce the overall I/O needed.

It is not a replacement for sorting a dataset if you plan on using it completely (in this case you read the whole index AND the whole dataset, and the whole dataset randomly, so your performance is actually decreased)

Another point: if it is to be expected that each page in the dataset (or most of the pages, anyway) holds at least one record that matches your where condition, then the index cannot help, as you still need to read all pages and therefore the whole dataset.

Which query did you want to do, and have you already tried using sort and a data step?

juanvg1972
Pyrite | Level 9

Thanks Kurt,

Above the code I am using trying to learn how to use indexes.

I don' want to do sorts because my input datasets is large.

How can I see and manage 'pages'?, I don't know how to....

/* This is my dataset */

data prueba(drop = i);

do i = 1 to 50000000;

  campo1='abc';

  campo2 = i;

  campo3 = round(ranuni(1)*10);

  campo4 = round(ranuni(100), 0.01);

  campo5 = date();

  format campo5 yymmddn8.;

  output;

end;

run;

/* This is the filter I am doing, usin data/set or a proc sql. I need that this filter goes very quick */

data prueba1;

set prueba;

where campo3 = 5;

run;

proc sql;

create table prueba1 as

(select * from prueba

where campo3 = 5);

quit;

juanvg1972
Pyrite | Level 9

Thanks Kurt,

Above the code I am using trying to learn how to use indexes.

I don' want to do sorts because my input datasets is large.

How can I see and manage 'pages'?, I don't know how to....

/* This is my dataset */

data prueba(drop = i);

do i = 1 to 50000000;

  campo1='abc';

  campo2 = i;

  campo3 = round(ranuni(1)*10);

  campo4 = round(ranuni(100), 0.01);

  campo5 = date();

  format campo5 yymmddn8.;

  output;

end;

run;

/* This is the filter I am doing, usin data/set or a proc sql. I need that this filter goes very quick */

data prueba1;

set prueba;

where campo3 = 5;

run;

proc sql;

create table prueba1 as

(select * from prueba

where campo3 = 5);

quit;

Kurt_Bremser
Super User

See, a SAS dataset is organized in "pages" that hold several records(observations); the number of observations per page can be determined with proc contents and depends on the observation size. The page size is dependent on operating system and file system parameters, and the observation size.

To get a specific observation, SAS always needs to read the whole page physically.

In your example, you create a number between 0 and 10 (where 1 to 9 will each occur in 10%, and 0 and 10 in 5% of the observations).

Given that each page holds, say, 100 observations, the probability that each discrete value of campo3 is present in ALL pages is, for all purposes, 1. Which means that SAS will have to physically read all pages of the dataset anyway, so it is best to not use an index, but read the dataset sequentially.

I just created a sample of your dataset an my server (SAS 9.2, AIX - 64 bit), with 10000 obs.

dataset page size: 8192

observation size: 40

max obs per page: 203

As you see, you WILL have all values of campo3 present in all pages.

juanvg1972
Pyrite | Level 9

Thanks Kurt

Its really helpful,

Can you please give an example in which the creation and use of indexes take advantage??

Thsnks un avance,

LinusH
Tourmaline | Level 20

There are literary tons of papers on the issue. And if you wish to look at some more official, see the index section in the SAS Language Concepts at support.sas.com.

Data never sleeps
Kurt_Bremser
Super User

Basically, everytime the index can be used to extract a small fraction of the whole dataset, with a high probabilty that I/O can be reduced. ie one single date out of one or more years.

Or to find/update single observations, although this will rarely happen in SAS.

juanvg1972
Pyrite | Level 9

Hi Kurt,

I give you an example with a composite index that it's noy using the index in the data setp and I don't undetstannd why..

Thanks in advance.

----

%let fecha_ini = %sysevalf(%sysfunc(date()) - 900);

%let fecha2 = %sysfunc(date());
%let fecha1 = %sysevalf(%sysfunc(date()) - 10);
%let producto1 = P100;
%let producto2 = P1000;

%put fecha_ini = &fecha_ini;

data ventas(drop=i);
do i=1 to 10000000;
fecha = &fecha_ini + round(900*ranuni(1));
cod_producto = compress('P'||round(ranuni(1)*10000));
hc_importe = round(ranuni(1)*1000);
format fecha ddmmyyn8.;
output;
end;
run;


options msglevel=i;


proc contents data=ventas;

run;

/* index */

proc sql;
create index i_ventas on ventas (fecha, cod_producto);
quit;


data salida;

set ventas;

where fecha = &fecha1 and cod_producto = "&producto1";
run;

LinusH
Tourmaline | Level 20

A log would help to prove your case... 😉

30   data salida;

31       set ventas;

32       where fecha = &fecha1 and cod_producto = "&producto1";

INFO: Index i_ventas selected for WHERE clause optimization.

33   run;

Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1216 views
  • 3 likes
  • 3 in conversation