SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Importing large data set issue ?

Contributor
Posts: 46

Importing large data set issue ?

I have a table in SQL Server with 30 million rows and 100 columns. The problem is that storing the entire table on the SAS server requires ~1 TB, which is quite large.

Can anybody explain why it takes such huge space to store the data? This data is not compressed.

The process I used to bring the data in was a program written in EG, with the storage location set to the server.

 


Accepted Solutions
Solution
‎01-17-2017 08:38 AM
Contributor
Posts: 32

Re: Importing large data set issue ?

It sounds as if you have some character variables that are set to a very long length. Even one or two of these at a length of 32,000 with no compression can give a humongous data set.
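The defined lengths are easy to check from SAS. A minimal sketch, assuming a placeholder library and table name (SASLIB.BIG_TABLE):

```
/* List the character variables of the imported table, longest
   first, so overlong lengths stand out. Library and member
   names are stored uppercase in DICTIONARY.COLUMNS. */
proc sql;
    select name, length
    from dictionary.columns
    where libname = 'SASLIB'
      and memname = 'BIG_TABLE'
      and type    = 'char'
    order by length desc;
quit;
```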



All Replies
Valued Guide
Posts: 505

Re: Importing large data set issue ?

30 million rows is quite small; to get to 999 GB, the width of each row would be ~33,000 bytes. I think you should normalize, i.e. look at a set of tables. Note that unless you need all 33,000 bytes, separate tables would be much faster than processing 33,000-byte records.

 

Even the full census SF1 detail file is not this wide.

Super User
Posts: 6,932

Re: Importing large data set issue ?

1000 GB / 30M rows = ~33,000 bytes per row

33,000 bytes / 100 variables = 330 bytes average variable length, so you mostly have quite long character variables.

This absolutely needs to be stored with the compress=yes option.

Inspect the structure in the SQL database, and inspect the data to see whether the defined lengths are really needed.
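As a sketch (saslib and mydb are placeholder librefs, not from the thread), compression can be set on the output data set when copying from the database:

```
/* Copy the database table into SAS with run-length (CHAR)
   compression; the log then reports the percentage reduction
   achieved. */
data saslib.big_table (compress=yes);
    set mydb.big_table;
run;
```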

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 213

Re: Importing large data set issue ?

Hi Ankit,

 

Have a look at this macro: http://support.sas.com/kb/24/804.html

It should help you optimize the storage requirements for your table.

 

Good luck,

Ahmed

Respected Advisor
Posts: 3,887

Re: Importing large data set issue ?

[ Edited ]

In addition to what others have already written: if you look directly at the database definition of a character variable and then at the definition of the corresponding SAS variable, do you see any multiplication of lengths?

Example: do you see something like SQL_Var defined as VARCHAR(100) and then SAS_Var with a length of $400?

You only need to investigate one character variable; if this happens, it will apply to all of them.
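One way to check, sketched with hypothetical names (SAS_Var, saslib.big_table): compare the defined length with the longest value actually stored.

```
proc sql;
    /* Longest actual value for one suspect character variable;
       a big gap between this and the defined length means the
       column is mostly padding. */
    select max(length(SAS_Var)) as longest_actual
    from saslib.big_table;
quit;
```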

Super Contributor
Posts: 251

Re: Importing large data set issue ?

As well as trying compress=yes, try compress=binary; this can work well with very long rows (lots of columns).

Additionally, if you have date or time fields (not datetime), they can always be stored in four bytes instead of the default eight. If you're using the wizard to import the data, you may not have the control to change these lengths, and you might have to write a specific DATA step (easier to control than SQL, quite possibly) to modify the metadata.

If your import can't change the metadata but can create a view, then you can do what you like in the following step (dropping variables, turning compression on, modifying variable lengths, and so on).
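A minimal sketch of such a follow-up DATA step, with hypothetical variable and data set names; note the LENGTH statement has to come before SET to override the inherited lengths:

```
data saslib.big_table_slim (compress=binary);
    length start_date end_date 4;   /* 4 bytes is safe for SAS date values */
    set saslib.big_table (drop=unused_var1 unused_var2);
run;
```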

 

(Sorry, I don't have EG, nor do I know what DBMS your source data is in, so I can't help further - yet!)

Contributor
Posts: 46

Re: Importing large data set issue ?

All good answers to the problem. But it boggles my mind that the entire data on my SQL server only takes up 700 GB of hard disk space, and then this one table from the same group bloats up to 1 TB in SAS.

I will definitely investigate the variables in my data set.

But to boil the question down further: why does it need to be so humongous?

If it is for performance, I don't see it working, as operations on data sets containing 2 million rows are taking ~2-3 minutes, which would take considerably less in SQL.

 

 

Super User
Posts: 6,932

Re: Importing large data set issue ?

VARCHAR is used in databases to conserve unused space when strings don't occupy the whole defined length. Since SAS does not have variable-length strings, other mechanisms need to be used (compressing the data set). Also keep in mind that date and time values that only need 4 bytes in a DBMS are stored in 8 bytes per default in SAS, so you should use a LENGTH statement to reduce their size.

Is the 1 TB that you get the compressed size?
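Whether a data set is compressed, and its observation length, can be read from PROC CONTENTS (saslib.big_table is a placeholder name):

```
/* The output header shows "Compressed" (NO/YES/BINARY) and the
   observation length in bytes. */
proc contents data=saslib.big_table;
run;
```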

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,256

Re: Importing large data set issue ?

So, there's an architectural difference between SAS and SQL Server, and I can agree that VARCHAR is a bit more flexible and has a lower cost than the SAS counterpart, COMPRESS. You have already been given a few good how-tos for managing data storage efficiently in SAS.

But what is your actual problem? A storage difference of around 30% isn't much to be concerned about, and IMO definitely not humongous. If you have an adequate performance comparison, feel free to share. To this day I have never seen a SQL Server query outrun SAS on similar/same HW. If performance is an issue, consider storing the table in a SAS SPDE library.
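A minimal SPDE sketch, with hypothetical paths and librefs:

```
/* An SPDE library partitions a table and can spread its data and
   index files across several file systems for parallel I/O. */
libname speedy spde '/sas/meta'
    datapath=('/sas/data1' '/sas/data2')
    indexpath=('/sas/index')
    compress=binary;

data speedy.big_table;
    set saslib.big_table;
run;
```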
Data never sleeps
Contributor
Posts: 46

Re: Importing large data set issue ?

Thanks LinusH, I will look into the SAS SPDE library. Please let me know if you have any other suggestions on how to judge SAS performance.

Thanks for your reply.
Super User
Posts: 5,256

Re: Importing large data set issue ?

Judge performance:
Choose a couple of typical/critical queries and run them with the same set of data and indexing in both environments, preferably with comparable HW.
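On the SAS side, the FULLSTIMER option makes the log report detailed timings for each step; the query below is only a placeholder:

```
options fullstimer;   /* log real time, user/system CPU, and memory per step */

proc sql;
    select count(*) from saslib.big_table;
quit;
```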
Data never sleeps
Super User
Posts: 5,256

Re: Importing large data set issue ?

And don't forget to benchmark loading, if that is a regular process.
Data never sleeps
☑ This topic is SOLVED.


Discussion stats
  • 12 replies
  • 2660 views
  • 11 likes
  • 8 in conversation