kjohnsonm
Lapis Lazuli | Level 10

Hello all,

I am very green to SAS VA, but have what I hope is a simple question: if I want to autoload data via a SAS data set, does it help, hurt, or make any difference to index the data set before placing it in my autoload folder? TIA. PS: If it matters, at this time we have a data set of 4.5 million observations with 184 fields, growing by an average of 1.1+ million per year. I have tested indexes on many of the table's fields to see whether they even meet the recommendations for placing an index on a field; some do, some don't, but I have not yet looked critically at whether an index makes sense for each field.
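(For reference, a common way to sanity-check index candidates is to look at each field's cardinality, for example with PROC FREQ's NLEVELS option; the library, table, and column names below are only placeholders.)

proc freq data=mylib.bigtable nlevels;
   /* NLEVELS reports the number of distinct values per variable;   */
   /* fields with very few distinct values are usually poor index   */
   /* candidates because each value selects a large share of rows.  */
   tables campus_cd program_cd term_cd / noprint;
run;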

 

1 ACCEPTED SOLUTION

Accepted Solutions
FredrikE
Rhodochrosite | Level 12

I'd say it doesn't help!

When loading a table with indexes, you will get a message that the indexes are not handled 🙂

//Fredrik


10 REPLIES
FredrikE
Rhodochrosite | Level 12

I'd say it doesn't help!

When loading a table with indexes, you will get a message that the indexes are not handled 🙂

//Fredrik

Madelyn_SAS
SAS Super FREQ

According to the user's guide, you cannot import indexed SAS data sets. I assume that is a general rule, not just for imports. You can compress data, but that conserves memory at the expense of performance.

 

http://support.sas.com/documentation/cdl/en/vaag/69958/HTML/default/viewer.htm#p0ehlxuj60aw6qn161pr8...

 

 

FredrikE
Rhodochrosite | Level 12

I do a PROC COPY and get this message; the table is loaded 🙂

 

100 proc copy in=tolib out=zzz;
101    select zzzz;
103 quit;

NOTE: Copying zzzz to zzzz (memtype=DATA).
WARNING: Indexes for zzzz.DATA cannot be created.
WARNING: Engine SASIOLA does not support index create operations.
NOTE: There were 192005 observations read from the data set aaaa.
NOTE: The data set aaaa has 192005 observations and 27 variables.
NOTE: PROCEDURE COPY used (Total process time):
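(For context, the kind of library setup that produces a log like this might look roughly as follows; the host, port, tag, and path are hypothetical, and zzz/zzzz stand in for real names just as in the log above.)

libname tolib "/data/source";                                    /* source SAS library       */
libname zzz sasiola host="lasr-host" port=10010 tag="vapublic";  /* LASR library via SASIOLA */

proc copy in=tolib out=zzz;
   select zzzz;   /* loads the table; any indexes are skipped with the warning shown above */
run;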

 

//Fredrik

kjohnsonm
Lapis Lazuli | Level 10

Thank you. I had the admin guide and did not see it in there; the user guide sure does cover it.

 

Page 23 of http://support.sas.com/documentation/cdl/en/vaug/69957/PDF/default/vaug.pdf

 

"Usage Notes
Review the following notes if you have trouble importing data:
  • Before you click OK to import the data, click Preview. Preview shows an accurate representation of the column names and data values that will be available after the import.
  • If SAS is configured as a Unicode server at your site, then you have the most flexibility for importing data. Specifically, SAS as a Unicode server helps with using column names or filenames (that are used as table names) that have double-byte characters.
  • When you import a delimited text file (CSV file), you must specify the encoding of the text file. In some cases, the import reports success, even though the data might be corrupted. It is important to verify the imported data.
  • If you import a SAS data set that uses user-defined formats, then you must ensure that the custom format catalog is available to the SAS Application Server. For more information, see “Working with User-Defined Formats” on page 39.
  • If importing large data files at the same time is common for your deployment, then you should be aware that large data files are written to temporary disk space on the server. In extreme cases, this can cause temporary disk space to become full. Systems that run out of disk space can become unresponsive and difficult to troubleshoot.
  • If you import data from text files and plan to append the data, then you must verify that the column data types and lengths match the table that you want to append to.
  • When you import data, a SAS LASR Analytic Server does not maintain preexisting sort orders. You must re-sort the data after you import it.
  • Importing indexed SAS data sets is not supported."
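(Given that last note, if you want the copy you place in the autoload folder to be index-free, one option is to list the indexes with PROC CONTENTS and drop them with PROC DATASETS; a minimal sketch with a placeholder library and table name, applied to the autoload copy rather than the original source table.)

/* Show any indexes defined on the table (see the           */
/* "Alphabetic List of Indexes and Attributes" output).     */
proc contents data=dropzone.bigtable;
run;

/* Remove all indexes from the autoload copy.               */
proc datasets library=dropzone nolist;
   modify bigtable;
   index delete _all_;
quit;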

kjohnsonm
Lapis Lazuli | Level 10
I guess this means I need to hunt down a more complete list of SAS-provided VA documents 8] ...another saga.
JuanS_OCS
Amethyst | Level 16

Hello @kjohnsonm,

 

Regarding indexes, I have nothing left to add; @Madelyn_SAS, @FredrikE, and you found the answers.

 

Now, I understand that you want to improve the performance of loading your data sets.

 

First, yes, extending your knowledge through the documentation will definitely help.

Second, have you tried AutoLoad?

Finally, if you have a distributed LASR/VA deployment, there are many ways to improve data load performance: managing the block size on your Hadoop cluster, the co-location configuration (and disks), adding extra layers such as Hive, and more:

 

http://support.sas.com/resources/papers/proceedings16/11206-2016.pdf

https://support.sas.com/content/dam/SAS/support/en/documentation/third-party-reference/sas-hadoop-pe...

https://communities.sas.com/t5/SAS-Communities-Library/SAS-with-Hadoop-performances-Best-practices-a...

kjohnsonm
Lapis Lazuli | Level 10
Hello Juan, thanks for the post. No, we are not running a distributed LASR/VA.
"Have you tried AutoLoad?" Yes, we have. We saw links in the communities about users autoloading databases directly and were hoping to follow in those footsteps. We understand now that this is not supported either, so we are switching to the SAS data set route. Our database changes only a few times a year. It is not a real-time ERP database changing randomly at any hour of the day via web input, but a snapshot-based data warehouse that is updated from the ERP on our school's census day, at midterm, and at end of term, three times per term. Since the database is so stable, we were hoping to avoid a middle step and connect directly. We are far too small to need Hadoop, and likely could not afford it.
JuanS_OCS
Amethyst | Level 16

Hello @kjohnsonm,

 

So your data is coming from a database, I see. If you are looking for performance, you are better off extracting just the data you specifically need from the DB into filtered SAS tables, and then loading or autoloading those with VA. You would probably want SAS DI or EG code to perform the filtered transformation, preferably not on the VA server but on another, data-management-related SAS server.

 

The main reason is that VA will always pull the full tables from your DB, materialize them as SAS tables in Work, and then query them for the VA reports. Hence, the only way to avoid this overhead in disk, network, and time is to create the required SAS tables or data mart yourself.
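(As a rough sketch of that kind of filtered extract, assuming an ODBC connection to the source warehouse and a drop-zone path for autoload; the DSN, schema, path, and all table and column names are hypothetical.)

libname erp odbc dsn="erp_dw" schema="dw";   /* source database, hypothetical DSN */
libname dropzone "/sas/va/autoload";         /* VA autoload drop-zone folder      */

/* Pull only the columns and rows the reports actually need, so the */
/* table placed in the autoload folder stays as small as possible.  */
proc sql;
   create table dropzone.enrollment_snapshot as
   select student_id, term, campus, credit_hours, status
   from erp.enrollment_fact
   where snapshot_type in ('CENSUS','MIDTERM','END_OF_TERM');
quit;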

 

I hope it helps,


Kind regards,

Juan

kjohnsonm
Lapis Lazuli | Level 10
"Probably you would like to have a SAS DI or EG code to make the filtered transformation. Preferably, not from the VA server but another SAS server, data management related." -- thanks for the pointer, I will read up on this.
SASKiwi
PROC Star

Regarding VA performance, in my experience I have found:

 

  • Dataset compression tends to slow performance and in general I haven't found it useful
  • If your tables contain long standard text descriptions with low cardinality, then converting these into codes linked to user-defined formats can really save space and significantly improve performance (see the sketch after this list)
  • If your VA reporting requires only summary-level data, then you can gain big performance improvements by using multiple summary tables customised for your reporting rather than one big detail-level table
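(A minimal sketch of the user-defined format idea, with hypothetical column, code, and format names: store a short code in the table and attach a format so reports still display the full description. As the usage notes quoted earlier point out, the custom format catalog must then be available to the SAS Application Server.)

/* Map short codes to the long descriptions that reports should show. */
proc format;
   value $deptcd
      'ENG' = 'College of Engineering and Applied Sciences'
      'BUS' = 'School of Business and Economics'
      'EDU' = 'College of Education and Human Development';
run;

/* Replace the long text column with a 3-byte code and attach the format. */
data work.detail_coded;
   set work.detail;
   length dept_cd $3;
   select (dept_desc);
      when ('College of Engineering and Applied Sciences') dept_cd = 'ENG';
      when ('School of Business and Economics')            dept_cd = 'BUS';
      when ('College of Education and Human Development')  dept_cd = 'EDU';
      otherwise                                            dept_cd = ' ';
   end;
   format dept_cd $deptcd.;
   drop dept_desc;
run;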
