BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ankit___gupta
Quartz | Level 8

I have a table in sql in which i have i have 30 million rows and 100 columns. So the problem is the required size to store the entire table in sas server is ~1TB which is quite large.

Can anybody explain why it is taking such huge space to store the data. This data is not compressed.

 

Processes that i have used to bring data is by writing program in EG. and setting the store location in server.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaalNavestad
Pyrite | Level 9
It sounds as if you have some Character variables that is set to a very long length. Even one or two of this that gets to 32000 and no compression can give a humongous data set.

View solution in original post

12 REPLIES 12
rogerjdeangelis
Barite | Level 11

30 million rows is quite small, to get to 999gb the width of each row would be 33,000 bytes. I think you should normalize or look at a set of tables. Note that unless you need all 33,000 bytes separate tables would be much faster than processing 33,000 bytes records.

 

Even the census full sf1 detail is not this wide.

Kurt_Bremser
Super User

1000 GB / 30M rows = ~33000 bytes per row

33000 / 100 = 330 average variable length, so you have mostly quite long character variables.

Absolutely needs to be stored with the compress=yes option.

Inspect the structure in the SQL database, and inspect the data to look if the defined lengths are needed.

 

AhmedAl_Attar
Rhodochrosite | Level 12

Hi Ankit,

 

Have a look at this macro http://support.sas.com/kb/24/804.html

It should help you with optimizing the storage requirements for your table.

 

Good luck,

Ahmed

Patrick
Opal | Level 21

Additionally to what others already wrote: If you look directly at the database definitions of a character variable and then look at the definition of the corresponding SAS variable, do you see any multiplication of lengths?

 

Example: Do you see something like SQL_Var with Varchar(100) and then SAS_Var with a length of $400?

 

You can investigate this for any character variable as if this happens, it's going to apply for all of them.

LaurieF
Barite | Level 11

As well as trying compress=yes, try compress=binary - this can work well with very long rows (lots of columns).

 

Additionally, if you've got date or time fields (not datetime), they can always be stored in four bytes instead of the default of eight. If you're using the wizard to import the data, you may not have the control to change these lengths, and you might have to write a specific data step (easier to control than SQL, quite possibly) to modify the metadata.

 

If your import can't change metadata but can create a view, then you can do what you like in the following step (dropping variables, turning compression on, modifying variable lengths and so on).

 

(Sorry, I don't have EG, nor do I know what DBMS your source data is in, so I can't help further - yet!)

ankit___gupta
Quartz | Level 8

All good answers to the problem. But it boggles my mind, that entier data on on my sql server only takes up 700 GB of hard didk space, and then there is this one table from the same group is bloating upto 1TB in SAS.

I will definately investigate the variables in my dataset. 

 

But if i would like to boil down the question further as to why it needs to be so humongous ? 

If it is for performance, i don't see it working, as operations on datasets containing 2 million rows are taking ~2-3 mins. which would be quite less in SQL.

 

 

Kurt_Bremser
Super User

Varchar is used in databases to conserve unused space when strings don't occupy the whole defined length. Since SAS does not have variable-length strings, other mechanism need to be used (compressing the dataset). Also keep in mind that date and time values that only need 4 bytes in a DBMS are stored with 8 bytes per default in SAS, so you should use a lengt statement to reduce them in size.

Is the 1 TB that you get the compressed size?

PaalNavestad
Pyrite | Level 9
It sounds as if you have some Character variables that is set to a very long length. Even one or two of this that gets to 32000 and no compression can give a humongous data set.
LinusH
Tourmaline | Level 20
So, there's an architectural difference between SAS and SQL Server, where I can agree upon that the VAR CHAR is a bit more flexible and has a less cost than the SAS counterpart COMPRESS. You have already been been given a few gid how to's to manage data storage efficiently in SAS.

But what is your actual problem? A storage difference around 30% isn't much to be concerned about, IMO definitely not homongous. If you have an adequate performance comparison feel free to share. Until this day I have never seen an SQL Server query outrun SAS on similar/same HW. If performance is an issue consider store in a SAS SPDE library.
Data never sleeps
ankit___gupta
Quartz | Level 8
Thanks LinusH, will look into the SAS SPDE library. Please if you have any other suggestion on how to judge your SAS performance let me know.

Thanks for your reply.
LinusH
Tourmaline | Level 20
Judge performance:
Chose a couple of typical/critical queries and run them with the same set of data and indexing in both environments preferably with comparable hw.
Data never sleeps
LinusH
Tourmaline | Level 20
And, don't forget loading if that is a regular process.
Data never sleeps

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 7174 views
  • 11 likes
  • 8 in conversation