02-13-2018 11:24 AM - edited 02-13-2018 03:15 PM
I am very new to SAS VA and have what I hope is a simple question. If I want to autoload data from a SAS data set, does it help, hurt, or make no difference to index the data set before placing it in my autoload folder? TIA. PS: in case it matters, we currently have a data set with 4.5 million observations and 184 fields, growing by an average of 1.1+ million per year. I have checked many of the table's fields to see whether they even meet the recommendations for placing an index on a field. Some do, some don't, but I have not yet looked critically at whether an index makes sense for each field.
02-14-2018 09:48 AM
According to the user's guide, you cannot import indexed SAS data sets. I assume that is a general rule, not just for imports. You can compress data, but that conserves memory at the expense of performance.
02-14-2018 10:23 AM
I ran a PROC COPY and got the messages below; the table is loaded anyway:
100 proc copy in=tolib out=zzz
101 select zzzz;
NOTE: Copying zzzz to zzzz (memtype=DATA).
WARNING: Indexes for zzzz.DATA cannot be created.
WARNING: Engine SASIOLA does not support index create operations.
NOTE: There were 192005 observations read from the data set aaaa.
NOTE: The data set aaaa has 192005 observations and 27 variables.
NOTE: PROCEDURE COPY used (Total process time):
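As a minimal sketch of one way to avoid the index warnings: PROC COPY accepts an INDEX=NO option, which copies the data without trying to re-create indexes in the output library. The library and table names are the placeholders from the log above, and whether the warnings disappear entirely on a given release is something to confirm in your own log.

```sas
/* Sketch only: tolib, zzz, and zzzz are the placeholder names from the log above. */
/* INDEX=NO asks PROC COPY not to re-create indexes in the output library.         */
proc copy in=tolib out=zzz index=no;
   select zzzz;
run;
```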
02-14-2018 11:33 AM
Thank you. I had the admin guide and did not see it in there, but the user guide sure does say it:
Review the following notes if you have trouble importing data:
- Before you click OK to import the data, click Preview. Preview shows an accurate representation of the column names and data values that will be available after the import.
- If SAS is configured as a Unicode server at your site, then you have the most flexibility for importing data. Specifically, SAS as a Unicode server helps with using column names or filenames (that are used as table names) that have double-byte characters.
- When you import a delimited text file (CSV file), you must specify the encoding of the text file. In some cases, the import reports success, even though the data might be corrupted. It is important to verify the imported data.
- If you import a SAS data set that uses user-defined formats, then you must ensure that the custom format catalog is available to the SAS Application Server. For more information, see “Working with User-Defined Formats” on page 39.
- If importing large data files at the same time is common for your deployment, then you should be aware that large data files are written to temporary disk space on the server. In extreme cases, this can cause temporary disk space to become full. Systems that run out of disk space can become unresponsive and difficult to troubleshoot.
- If you import data from text files and plan to append the data, then you must verify that the column data types and lengths match the table that you want to append to.
- When you import data, a SAS LASR Analytic Server does not maintain preexisting sort orders. You must re-sort the data after you import it.
- Importing indexed SAS data sets is not supported.
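Given the last note, it may be worth checking a data set for indexes before it goes into the autoload folder. The following is a rough sketch, assuming a hypothetical staging library `stage` and a hypothetical table `mytable`; it lists existing indexes from `dictionary.indexes` and then drops them with PROC DATASETS.

```sas
/* Hypothetical staging library and table name; adjust to your site. */
libname stage "/path/to/staging";

/* List any indexes defined on tables in the staging library. */
proc sql;
   select memname, name, indxname, unique
      from dictionary.indexes
      where libname = 'STAGE';
quit;

/* Drop all indexes on one table before moving it to the autoload folder. */
proc datasets library=stage nolist;
   modify mytable;
   index delete _all_;
quit;
```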
02-14-2018 11:56 AM
Now, I understand you want to improve performance when loading your data sets.
First, yes, extending your knowledge through the documentation will definitely help:
Finally, if you have a distributed LASR/VA deployment, there are many ways to improve data load performance: managing the block size on your Hadoop cluster, the co-location configuration (and disks), extra layers such as Hive, and more.
02-14-2018 12:08 PM
02-15-2018 08:38 AM
So your data is coming from a database, I see. If you are looking for performance, you are better off extracting only the data you specifically need from the DB into filtered SAS tables, which you can then load or autoload into VA. You would probably want SAS DI or EG code to do the filtering transformation, preferably run not on the VA server but on another SAS server dedicated to data management.
The main reason is that VA will always pull the full tables from your DB, materialize them as SAS tables in Work, and then query those for the VA reports. Hence, the only way to avoid this overkill in disk, network, and time is to create the required SAS tables or data mart yourself.
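A filtered extract along these lines might look like the sketch below. Every name in it is an assumption made for illustration: the ODBC DSN, the `sales` table and its columns, the date cutoff, and both directory paths. It also assumes SAS/ACCESS Interface to ODBC is licensed and that `order_date` arrives as a SAS date value.

```sas
/* All names below are hypothetical: the DSN, table, columns, and paths. */
libname srcdb odbc dsn="my_dsn";        /* source database via SAS/ACCESS     */
libname mart  "/path/to/datamart";      /* staging area off the VA server     */

/* Keep only the columns and rows the reports need. */
proc sql;
   create table mart.sales_va as
   select order_id, order_date, region, amount
      from srcdb.sales
      where order_date >= '01JAN2017'd;  /* assumes order_date is a SAS date  */
quit;

/* Copy the filtered table to the autoload drop folder (site-specific path). */
libname autoload "/path/to/AutoLoad";
proc copy in=mart out=autoload index=no;
   select sales_va;
run;
```

Running this on a separate data management server, as suggested above, keeps the extraction workload off the VA server.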
I hope it helps,
02-23-2018 02:24 PM
02-15-2018 07:07 PM - edited 02-15-2018 07:08 PM
Regarding VA performance. In my experience I have found: