Hi,
To give an overall description: I have a SQL Server table with 22,535,417 rows that takes up 4.9 GB, which I would like to import into SAS VA.
The import procedure times out on me, so I tried to create a SAS dataset on the same server first, but the SAS dataset is 71 GB, compared to only 4.9 GB on the server.
Why does the size blow up that much and how can I prevent it?
Thank you for your help!
If you have long character fields in the table, the compress=yes dataset option will be helpful.
SAS does not have an equivalent to the varchar datatype found in RDBMSs, so all character variables are fixed-length (and therefore often padded with lots of blanks, which the compress option takes care of).
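For example, compression can be turned on per dataset with a dataset option; a minimal sketch, assuming the source table has been assigned to a libref called dblib (the library and table names here are placeholders):

data work.mytable (compress=yes);
    /* copy the RDBMS table into a compressed Base SAS dataset */
    set dblib.source_table;
run;

It can also be set session-wide with `options compress=yes;`, which then applies to every dataset created afterwards.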
The table has 28 columns: 4 varchar, 6 timestamp, and the rest bigint. I was told that you shouldn't compress tables when you plan to use them in SAS VA; is that advice outdated?
If the 4 varchar are defined with a large size (think 4K or similar), and mostly empty, you should do something about that. Use compress=yes in your Base SAS dataset, and decide if you need all that length in VA before uploading.
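One way to cut the length down is to declare a shorter variable before the SET statement and assign into it, which truncates the value to the new length; a sketch with hypothetical variable names (pick a length that actually fits your data first):

data work.shortened;
    length short_text $200;   /* new, shorter length; values beyond 200 chars are truncated */
    set work.mytable;
    short_text = strip(long_text);
    drop long_text;
run;
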
IMO, large text fields are not something you would use in VA, where numbers, dates, and categories are mostly the basis for analysis.
It's quite hard to tell, as we can't see any of the process. Is it really all in one table on the server, or does it follow RDBMS practice and split things out? That could be one main factor for size growth if you're combining all the data into one table.
SQL Server tells me the table is 5,006,360 KB:
USE {database_name};
GO
EXEC sp_spaceused N'{dbo}.{table_name}';
GO
It seems strange that it would grow that much.
That only tells you the amount of disk space as defined in the database. Databases have numerous tricks to shrink data, or speed up access.
Have you tried exporting the data to a plain text file, just to see how big it would be? I suspect it will be quite large as well. It may be worth checking whether the text file looks the same as the dataset.
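If the data is already in a SAS dataset, one way to get such a file is PROC EXPORT; a minimal sketch, with placeholder dataset and file names:

proc export data=work.mytable
    outfile='/tmp/mytable_export.csv'
    dbms=csv
    replace;
run;
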
I just did an export to csv and the resulting csv file is 5.5gb big.
@dakes wrote:
I just did an export to csv and the resulting csv file is 5.5gb big.
That's because the export to CSV removes all the trailing blanks from the varchars (which is essentially what the compress=yes option does with the dataset).
Run a proc contents on the 71 GB SAS dataset, and post the output here.
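That is, with your dataset name substituted in:

proc contents data=work.mytable;
run;

The output will show each variable's type and defined length, which should make it obvious whether a few oversized character columns account for the growth.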
I removed 2 of the varchar columns and used the compress=yes option. The resulting dataset is 3.9 GB, which is fine for me. I will now try it and see what the performance in SAS VA is like. Thank you very much!
If you need to shrink it more, here's an EG task (and link to a program) that can help. It analyzes your char data to learn what lengths you really need.
And another trick. If you have character columns that are long, but categorical (not so many distinct values), you can replace those with a lookup value and a SAS format. That trick works in Visual Analytics too and can reduce the memory footprint of your tables.
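As a sketch of that trick, assuming a hypothetical long character column named region with a handful of distinct values: replace it with a small numeric code and attach a format that displays the original label.

/* define a format mapping codes back to the original labels (values are made up) */
proc format;
    value regionfmt
        1 = 'North America'
        2 = 'Europe'
        3 = 'Asia Pacific';
run;

data work.coded;
    set work.mytable;
    /* store a numeric code instead of the long character value */
    select (region);
        when ('North America') region_code = 1;
        when ('Europe')        region_code = 2;
        when ('Asia Pacific')  region_code = 3;
        otherwise              region_code = .;
    end;
    format region_code regionfmt.;
    drop region;
run;

Reports and VA then show the formatted label, while the table only stores an 8-byte number per row instead of the full string.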