we are using WRS 4.4 and I have a table with 11.400.000 observations and 568 columns.
I hoped to increase report speed when adding an index. So I created one in meta data with for example column 3, column 5 and column 7. The index physically exists.
The report shows round about 10 - 15 columns in a list, sums up a few columns and has a where clause with 5 filter conditions (off course they contain the 3 columns of the index).
At the end the report does not load faster. I already tried to preorder the data before writing the table but that did not help. In map studio it is not visible if he uses the index or not.
Any Ideas how to speed up with an index?
An index will only benefit you if you extract a sufficiently small (think 10% or less) subset of your data. Otherwise, the overhead of reading the index file actually increases time.
Keep in mind that a SAS dataset is organized in pages, and that a whole page (128K) needs to be read anyway if a single observation from it is needed.
There are some performance suggestions in the documentation:
Four tips to remember when you contact SAS Technical Support
Creating Beautiful Reports
SAS Visual Analytics Learning Center
It's been awhile since I worked with WRS and Information maps.
The key is to somehow capture the queries that the information map generates towards the Pooled WS server.
Then take those queries and run it outside WRS, with
As you might know it's important how the index is defined vs the query is written, e.g., the first column in the index must be part of the where clause.
If you wish to speed up the queries, you might want to consider moving the data to a SPDE libref for faster table scans, and slightly more advanced where clause optimizer.
Those 568 columns look suspicious to me. In all my professional work, I had datasets with such numbers only for regression analysis, but never for reporting.
Consider stripping down or restructuring your dataset (if there's data in structure, see Maxim 19).
Thank you. Sorting Data and 3 (from over 500) columns in index should be much smaller that reading the data source.
Moreover you are right. Having 568 columns is rubbish and I'm already working on that.
Thank you. I will read it.
renamed the data file to get an exception with the sql. Seems although I have 5 filters the sql does not contain a where clause. So this would be a reason why index is not working. Does it makes sense that he first load all data in a temp data set to filter afterwards??
Again, I'm a litle rusty.
Not sure what you mean by creating a temporary table? You don't refer to the query cache...?
In case yo can't get an indexed search, you want to restructure your data, as @Kurt_Bremser points out. If you have several reports with with not much overlapping information, you could consider splitting your data set in to multiple.
For where clause, I'm not sure how the information map acts if the filters is only defined in WRS (the documentation leaves a lot to wish for when comes to query tuning). If it works for your use case you could try to define filters in the information map, and have them prompted in the WRS report?
Yes, the index file will be much smaller than the dataset file. But if the index doesnt't keep you from still reading most, if not all, of the dataset file itself, it only adds additional I/O. And if your "want" observations are scattered randomly across the dataset pages, this will happen even with a small subset.
If you also have the MDDB server at hand, consider creating and using a cube.
You're right, it's OLAP. At my age, long-term memory tends to override short-term 😉
Thank you for your feedback.
We also use the olap feature but in this case it is prefered to have a relational table.
Splitting Data is the idea for the future. I search for a fast solution while we are working on that.
Based on your idea I created a chapter filter (hope it is the correct description, I dont have a English WRS) and now I can see that the SQL has an where clause. Unfortunately the where clause contains a strip command that prevents using the index.
WHERE (STRIP( ( table0.myColumn ) )) IN ( '1D' )
Any ideas how to avoid that strip command? I guess then I reached by goal and he uses the index
And I guess this is SQL created by the information map, over which you have no control. I also suggest to get in contact with SAS TS.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.