Solved: Re: SAS VA performance question? (...and SAS data set indexes)

kjohnsonm · Posted 02-13-2018 11:24 AM

Hello all,

I am very green to SAS VA, but have what I hope is a simple question, If I want to auto load data via a SAS Dataset, does it help, hurt or make a bit of difference to index the data set in the first place before even placing it in my Auto load folder? TIA PS [if] it matters at this time we have a dataset that is 4.5 Million obs with 184 fields and growing by 1.1+ mil / year average. I have done many of the indexes on the tables fields to see if they even meet the recommendations of placing an index on a field. So do some don't, but have not yet looked critically if it even makes sense per field.

FredrikE · Posted 02-14-2018 07:15 AM

I'd say it doesn't help!

When loading a table with indexes you will get the message that indexes are not handled 🙂

//Fredrik

View solution in original post

FredrikE · Posted 02-14-2018 07:15 AM

I'd say it doesn't help!

When loading a table with indexes you will get the message that indexes are not handled 🙂

//Fredrik

Madelyn_SAS · Posted 02-14-2018 09:48 AM

According to the user's guide, you cannot import indexed SAS data sets. I assume that is a general rule, not just for imports. You can compress data, but that conserves memory at the expense of performance.

http://support.sas.com/documentation/cdl/en/vaag/69958/HTML/default/viewer.htm#p0ehlxuj60aw6qn161pr8...

-------------------------------------------------------------------------
Four tips to remember when you contact SAS Technical Support
Tricks for SAS Visual Analytics Report Builders
SAS Visual Analytics Learning Center

FredrikE · Posted 02-14-2018 10:23 AM

I do a proc copy and get this message, table is loaded 🙂

100 proc copy in=tolib out=zzz
101 select zzzz;
103 quit;

NOTE: Copying zzzz to zzzz (memtype=DATA).
WARNING: Indexes for zzzz.DATA cannot be created.
WARNING: Engine SASIOLA does not support index create operations.
NOTE: There were 192005 observations read from the data set aaaa.
NOTE: The data set aaaa has 192005 observations and 27 variables.
NOTE: PROCEDURE COPY used (Total process time):

//Fredrik

kjohnsonm · Posted 02-14-2018 11:33 AM

Thank you, I had the admin guide. I did not see it in there, the user guide sure does.

Page23. of http://support.sas.com/documentation/cdl/en/vaug/69957/PDF/default/vaug.pdf

"Usage Notes
Review the following notes if you have trouble importing data: n Before you click OK to import the data, click Preview. Preview shows an accurate representation of the column names and data values that will be available after the import. n If SAS is configured as a Unicode server at your site, then you have the most flexibility for importing data. Specifically, SAS as a Unicode server helps with using column names or filenames (that are used as table names) that have double-byte characters. n When you import a delimited text file (CSV file), you must specify the encoding of the text file. In some cases, the import reports success, even though the data might be corrupted. It is important to verify the imported data. n If you import a SAS data set that uses user-defined formats, then you must ensure that the custom format catalog is available to the SAS Application Server. For more information, see “Working with User-Defined Formats” on page 39. n If importing large data files at the same time is common for your deployment, then you should be aware that large data files are written to temporary disk space on the server. In extreme cases, this can cause temporary disk space to become full. Systems that run out of disk space can become unresponsive and difficult to troubleshoot. n If you import data from text files and plan to append the data, then you must verify that the column data types and lengths match the table that you want to append to. n When you import data, a SAS LASR Analytic Server does not maintain preexisting sort orders. You must re-sort the data after you import it. n Importing indexed SAS data sets is not supported."

kjohnsonm · Posted 02-14-2018 11:37 AM

I guess this means I need to hunt down a more complete list of SAS provided VA documents 8] ...another saga.

JuanS_OCS · Posted 02-14-2018 11:56 AM

Hello @kjohnsonm,

regarding indexes, I have nothing left to say, @Madelyn_SAS, @FredrikE and yourself found the answers.

Now, I understand you want to improve performance on loading your datasets.

First, yes, extend knowledge through documentation will help, definetely:

http://support.sas.com/resources/papers/proceedings17/1010-2017.pdf a great paper provided by Zencos.
http://support.sas.com/documentation/cdl/en/vaag/69958/HTML/default/viewer.htm#p0ehlxuj60aw6qn161pr8... some info about data compression and ways to upload data. Have you tried AutoLoad?
http://support.sas.com/documentation/cdl/en/vaug/68648/HTML/default/viewer.htm#p13ri5amhwvhkzn0zmipx...

Finally, if you have a distributed LASR/VA, there are many ways to improve the performance of the data load: managing the blocksize on your hadoop cluster, the co-location configuration (and disks), with some extra layers such as Hive, and more

http://support.sas.com/resources/papers/proceedings16/11206-2016.pdf

https://support.sas.com/content/dam/SAS/support/en/documentation/third-party-reference/sas-hadoop-pe...

https://communities.sas.com/t5/SAS-Communities-Library/SAS-with-Hadoop-performances-Best-practices-a...

kjohnsonm · Posted 02-14-2018 12:08 PM

Hello Juan, Thanks for the post. No we are not "distributed LASR/VA"
"Have you tried AutoLoad?" yes we have. We saw links in the communities about user’s auto loading DB's directly. We were hoping to follow in those footsteps. We understand now that is also not supported so we are switching to the SAS data set rout. Our DB is changed only a few times a year, It’s not a real-time ERP DB. changing randomly any hour of the day via web input but a snapshot based data warehouse that is updated on our Schools census day, midterm, end of term, X3 for each term from data in the ERP. Thus since the DB is so stable we were hoping to avoid having to have a middle step and connect directly. We are way to small to need Hadoop. and likely cannot afford it.

JuanS_OCS · Posted 02-15-2018 08:38 AM

Hello @kjohnsonm,

so your data is coming from a Database, I see. If you look for performance, you better get the data you specifically need from the DB into filtered SAS tables, and then you can load or autoload them with VA. Probably you would like to have a SAS DI or EG code to make the filtered transformation. Preferably, not from the VA server but another SAS server, data management related.

The main reason is because VA will always get the full tables from your DB, create the full table from the DBs into SAS tables in Work, and then query them for the VA reports. Hence, the only way to avoid this overkill in disk, network and time, is to create the required SAS tables or datamart yourself.

I hope it helps,

Kind regards,

Juan

kjohnsonm · Posted 02-23-2018 02:24 PM

"Probably you would like to have a SAS DI or EG code to make the filtered transformation. Preferably, not from the VA server but another SAS server, data management related." -- thanks for the pointer, I will read up on this.

SASKiwi · Posted 02-15-2018 07:07 PM

Regarding VA performance. In my experience I have found:

Dataset compression tends to slow performance and in general I haven't found it useful
If your tables contain long standard text descriptions with low cardinality, then converting these into codes linked to user-defined formats can really save space and significantly improve performance
If your VA reporting requires only summary-level data, then you can gain big performance improvements by using multiple summary tables customised for your reporting rather than one big detail-level table

Registration is open