Exploring, predicting and reporting with SAS Visual Analytics and SAS Visual Statistics

SAS VA performance question? (...and SAS data set indexes)

Accepted Solution Solved
Reply
Highlighted
Frequent Contributor
Posts: 129
Accepted Solution

SAS VA performance question? (...and SAS data set indexes)

[ Edited ]

Hello all,

I am very green to SAS VA, but have what I hope is a simple question, If I want to auto load data via a SAS Dataset, does it help, hurt or make a bit of difference to index the data set in the first place before even placing it in my Auto load folder?  TIA   PS [if] it matters at this time we have a dataset that is 4.5 Million obs with 184 fields and growing by 1.1+ mil / year average. I have done many of the indexes on the tables fields to see if they even meet the recommendations of placing an index on a field. So do some don't, but have not yet looked critically if it even makes sense per field.

 


Accepted Solutions
Solution
‎02-14-2018 11:34 AM
Super Contributor
Posts: 368

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm

I'd say it doesn't help!

When loading a table with indexes you will get the message that indexes are not handled Smiley Happy

//Fredrik

View solution in original post


All Replies
Solution
‎02-14-2018 11:34 AM
Super Contributor
Posts: 368

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm

I'd say it doesn't help!

When loading a table with indexes you will get the message that indexes are not handled Smiley Happy

//Fredrik

SAS Super FREQ
Posts: 557

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm

According to the user's guide, you cannot import indexed SAS data sets. I assume that is a general rule, not just for imports. You can compress data, but that conserves memory at the expense of performance.

 

http://support.sas.com/documentation/cdl/en/vaag/69958/HTML/default/viewer.htm#p0ehlxuj60aw6qn161pr8...

 

 

Super Contributor
Posts: 368

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to Madelyn_SAS

I do a proc copy and get this message, table is loaded Smiley Happy

 

100 proc copy in=tolib out=zzz
101 select zzzz;
103 quit;

NOTE: Copying zzzz to zzzz (memtype=DATA).
WARNING: Indexes for zzzz.DATA cannot be created.
WARNING: Engine SASIOLA does not support index create operations.
NOTE: There were 192005 observations read from the data set aaaa.
NOTE: The data set aaaa has 192005 observations and 27 variables.
NOTE: PROCEDURE COPY used (Total process time):

 

//Fredrik

Frequent Contributor
Posts: 129

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to Madelyn_SAS

Thank you, I had the admin guide. I did not see it in there, the user guide sure does.

 

Page23. of http://support.sas.com/documentation/cdl/en/vaug/69957/PDF/default/vaug.pdf

 

"Usage Notes
Review the following notes if you have trouble importing data: n Before you click OK to import the data, click Preview. Preview shows an accurate representation of the column names and data values that will be available after the import. n If SAS is configured as a Unicode server at your site, then you have the most flexibility for importing data. Specifically, SAS as a Unicode server helps with using column names or filenames (that are used as table names) that have double-byte characters. n When you import a delimited text file (CSV file), you must specify the encoding of the text file. In some cases, the import reports success, even though the data might be corrupted. It is important to verify the imported data. n If you import a SAS data set that uses user-defined formats, then you must ensure that the custom format catalog is available to the SAS Application Server. For more information, see “Working with User-Defined Formats” on page 39. n If importing large data files at the same time is common for your deployment, then you should be aware that large data files are written to temporary disk space on the server. In extreme cases, this can cause temporary disk space to become full. Systems that run out of disk space can become unresponsive and difficult to troubleshoot. n If you import data from text files and plan to append the data, then you must verify that the column data types and lengths match the table that you want to append to. n When you import data, a SAS LASR Analytic Server does not maintain preexisting sort orders. You must re-sort the data after you import it. n Importing indexed SAS data sets is not supported."

Frequent Contributor
Posts: 129

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm
I guess this means I need to hunt down a more complete list of SAS provided VA documents 8] ...another saga.
Trusted Advisor
Posts: 1,737

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm

Hello @kjohnsonm,

 

regarding indexes, I have nothing left to say, @Madelyn_SAS@FredrikE and yourself found the answers.

 

Now, I understand you want to improve performance on loading your datasets.

 

First, yes, extend knowledge through documentation will help, definetely:

Finally, if you have a distributed LASR/VA, there are many ways to improve the performance of the data load: managing the blocksize on your hadoop cluster, the co-location configuration (and disks), with some extra layers such as Hive, and more

 

http://support.sas.com/resources/papers/proceedings16/11206-2016.pdf

https://support.sas.com/content/dam/SAS/support/en/documentation/third-party-reference/sas-hadoop-pe...

https://communities.sas.com/t5/SAS-Communities-Library/SAS-with-Hadoop-performances-Best-practices-a...

Frequent Contributor
Posts: 129

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to JuanS_OCS
Hello Juan, Thanks for the post. No we are not "distributed LASR/VA"
"Have you tried AutoLoad?" yes we have. We saw links in the communities about user’s auto loading DB's directly. We were hoping to follow in those footsteps. We understand now that is also not supported so we are switching to the SAS data set rout. Our DB is changed only a few times a year, It’s not a real-time ERP DB. changing randomly any hour of the day via web input but a snapshot based data warehouse that is updated on our Schools census day, midterm, end of term, X3 for each term from data in the ERP. Thus since the DB is so stable we were hoping to avoid having to have a middle step and connect directly. We are way to small to need Hadoop. and likely cannot afford it.
Trusted Advisor
Posts: 1,737

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to kjohnsonm

Hello @kjohnsonm,

 

so your data is coming from a Database, I see. If you look for performance, you better get the data you specifically need from the DB into filtered SAS tables, and then you can load or autoload them with VA. Probably you would like to have a SAS DI or EG code to make the filtered transformation. Preferably, not from the VA server but another SAS server, data management related.

 

The main reason is because VA will always get the full tables from your DB, create the full table from the DBs into SAS tables in Work, and then query them for the VA reports. Hence, the only way to avoid this overkill in disk, network and time, is to create the required SAS tables or datamart yourself.

 

I hope it helps,


Kind regards,

Juan

Frequent Contributor
Posts: 129

Re: SAS VA performance question? (...and SAS data set indexes)

Posted in reply to JuanS_OCS
"Probably you would like to have a SAS DI or EG code to make the filtered transformation. Preferably, not from the VA server but another SAS server, data management related." -- thanks for the pointer, I will read up on this.
Super User
Posts: 3,853

Re: SAS VA performance question? (...and SAS data set indexes)

[ Edited ]
Posted in reply to kjohnsonm

Regarding VA performance. In my experience I have found:

 

  • Dataset compression tends to slow performance and in general I haven't found it useful
  • If your tables contain long standard text descriptions with low cardinality, then converting these into codes linked to user-defined formats can really save space and significantly improve performance
  • If your VA reporting requires only summary-level data, then you can gain big performance improvements by using multiple summary tables customised for your reporting rather than one big detail-level table
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 487 views
  • 9 likes
  • 5 in conversation