01-13-2017 08:34 AM
I have a table in sql in which i have i have 30 million rows and 100 columns. So the problem is the required size to store the entire table in sas server is ~1TB which is quite large.
Can anybody explain why it is taking such huge space to store the data. This data is not compressed.
Processes that i have used to bring data is by writing program in EG. and setting the store location in server.
01-17-2017 07:56 AM
01-13-2017 08:48 AM
30 million rows is quite small, to get to 999gb the width of each row would be 33,000 bytes. I think you should normalize or look at a set of tables. Note that unless you need all 33,000 bytes separate tables would be much faster than processing 33,000 bytes records.
Even the census full sf1 detail is not this wide.
01-13-2017 09:00 AM
1000 GB / 30M rows = ~33000 bytes per row
33000 / 100 = 330 average variable length, so you have mostly quite long character variables.
Absolutely needs to be stored with the compress=yes option.
Inspect the structure in the SQL database, and inspect the data to look if the defined lengths are needed.
01-13-2017 08:29 PM - edited 01-13-2017 08:30 PM
Additionally to what others already wrote: If you look directly at the database definitions of a character variable and then look at the definition of the corresponding SAS variable, do you see any multiplication of lengths?
Example: Do you see something like SQL_Var with Varchar(100) and then SAS_Var with a length of $400?
You can investigate this for any character variable as if this happens, it's going to apply for all of them.
01-13-2017 09:59 PM
As well as trying compress=yes, try compress=binary - this can work well with very long rows (lots of columns).
Additionally, if you've got date or time fields (not datetime), they can always be stored in four bytes instead of the default of eight. If you're using the wizard to import the data, you may not have the control to change these lengths, and you might have to write a specific data step (easier to control than SQL, quite possibly) to modify the metadata.
If your import can't change metadata but can create a view, then you can do what you like in the following step (dropping variables, turning compression on, modifying variable lengths and so on).
(Sorry, I don't have EG, nor do I know what DBMS your source data is in, so I can't help further - yet!)
01-14-2017 05:11 AM
All good answers to the problem. But it boggles my mind, that entier data on on my sql server only takes up 700 GB of hard didk space, and then there is this one table from the same group is bloating upto 1TB in SAS.
I will definately investigate the variables in my dataset.
But if i would like to boil down the question further as to why it needs to be so humongous ?
If it is for performance, i don't see it working, as operations on datasets containing 2 million rows are taking ~2-3 mins. which would be quite less in SQL.
01-14-2017 09:52 AM
Varchar is used in databases to conserve unused space when strings don't occupy the whole defined length. Since SAS does not have variable-length strings, other mechanism need to be used (compressing the dataset). Also keep in mind that date and time values that only need 4 bytes in a DBMS are stored with 8 bytes per default in SAS, so you should use a lengt statement to reduce them in size.
Is the 1 TB that you get the compressed size?
01-17-2017 07:56 AM
01-14-2017 07:51 AM
01-16-2017 02:08 AM
Need further help from the community? Please ask a new question.