JJ_83
Obsidian | Level 7

I have a dataset that is 948,036 obs by 221 var. I need each obs to have a day value for every date from 1/1/2008 to 12/31/2017. The output dataset would be about 3.46 billion obs by 221 var.

 

data want (compress=yes);
    set have;
    /* write one copy of the current obs for every date in the range */
    do day = '01JAN2008'd to '31DEC2017'd;
        output;
    end;
    format day mmddyy10.;
run;

When I run this code I get the error below.

 

[Screenshot: sas_out_of_resources.PNG]

 

I have 16 GB of RAM and about 1 TB of storage. Is the limitation my device or SAS, and are there any ways to work with datasets this large in SAS?

6 REPLIES
ballardw
Super User

What SAS environment are you using?

If you are connecting to a SAS server your SAS admin is likely to have a limit on how much space you are allowed to use.

If you are using SAS OnDemand, I believe there are limits on work space.

 

Why are you duplicating every value in your existing data roughly 3,650 times, with only the date changing? Sounds like a problem that has not been thought out well.

Kurt_Bremser
Super User

Take the sizes of the variables (a numeric variable needs 8 bytes by default), add them up, and multiply by the expected number of observations, so you have a rough idea of the disk space needed.

A quick calculation, assuming all 221 variables to be numbers, is this:

221 * 8 * 948036 * 3650

and results in about 6.1 × 10^12 bytes, roughly 6.1 TB (about 5.6 TiB). Add a little overhead, and you'll need well over 6 TB to store that. But to WORK with it, you'll need at least three times that size, and a really hefty computer to process this much in tolerable time.
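The estimate can be reproduced with a short sketch. It assumes the input is WORK.HAVE, as in the original post, and it ignores compression and page overhead. The first step sums the actual stored variable lengths from DICTIONARY.COLUMNS; the second compares that with the all-numeric simplification (221 variables at 8 bytes each). The macro variable name row_bytes is made up for this illustration.

```sas
/* Sum the stored lengths of all variables in WORK.HAVE (assumed location). */
proc sql noprint;
    select sum(length) into :row_bytes trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE';
quit;

data _null_;
    n_obs  = 948036;
    n_days = '31DEC2017'd - '01JAN2008'd + 1;      /* inclusive day count */
    est_all_numeric = 221 * 8 * n_obs * n_days;    /* all-numeric simplification */
    est_actual      = &row_bytes * n_obs * n_days; /* based on real lengths */
    put n_days= est_all_numeric= comma24. est_actual= comma24.;
run;
```

Both figures are uncompressed byte counts written to the log; with COMPRESS=YES the stored size could be smaller, depending on the data.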

 

What good is this mass of redundancy, anyway?

yabwon
Onyx | Level 15

1) "I need each obs to have a day value from 1/1/2008 to 12/31/2017" - Why? Is it necessary?

2) Even if each variable were no bigger than 2 bytes, 1 TB still might not be enough...


mkeintz
PROC Star

I agree with @yabwon.   

 

Your problem may very well be more about how you want to organize data to accomplish your task than about disk space.

 

The revealing statement that you want 3,653 days for each incoming obs, multiplying the size of your data set by 3,653, does not have an obvious justification to me. After all, you are repeating the other 220 variables as constants, which is a lot of needless duplication.
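One way to avoid storing the duplication is a DATA step view, sketched here on the assumption that later steps only need to read the expanded rows sequentially. The view name WANT_V and the PROC FREQ consumer are hypothetical. The expanded rows are generated on the fly when a step reads the view; they are not written to disk as a data set.

```sas
/* Define the expansion as a view; no expanded rows are stored. */
data want_v / view=want_v;
    set have;
    do day = '01JAN2008'd to '31DEC2017'd;
        output;
    end;
    format day mmddyy10.;
run;

/* Example consumer: count rows per year without materializing the table. */
proc freq data=want_v;
    tables day / nocum;
    format day year4.;
run;
```

Reading the view still means processing billions of rows, so it saves disk space, not run time.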

 

What is the intended use of this gigantic data set?

 

 

Reeza
Super User
This may be a case where you need to loop and deal with each observation individually. More details are needed. Usually the recommendation is to not be loopy, but if you are under-resourced for your problem, there is not much choice.
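A rough sketch of that looping idea, processing HAVE in slices so that only one slice is ever expanded on disk. The macro name by_chunk, its parameters, and the data set name CHUNK are made up for this illustration, and the analysis step is left as a placeholder because the intended use of the data has not been described. Under the all-numeric assumption, a 10,000-obs slice expands to roughly 65 GB uncompressed.

```sas
%macro by_chunk(chunk_size=10000, total_obs=948036);
    %local start stop;
    %do start = 1 %to &total_obs %by &chunk_size;
        %let stop = %sysfunc(min(%eval(&start + &chunk_size - 1), &total_obs));

        /* Expand only this slice of HAVE. */
        data chunk;
            set have (firstobs=&start obs=&stop);
            do day = '01JAN2008'd to '31DEC2017'd;
                output;
            end;
            format day mmddyy10.;
        run;

        /* Placeholder: run the real analysis on CHUNK here and keep only
           its (small) result, for example by appending it with PROC APPEND. */

        /* Drop the expanded slice before moving on to the next one. */
        proc datasets library=work nolist;
            delete chunk;
        quit;
    %end;
%mend by_chunk;

%by_chunk()
```

This only helps if each slice's result is much smaller than the expanded slice itself.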
ballardw
Super User

Here's an idea:

Create a very small data set, or use one of the SAS-supplied data sets like SASHELP.CLASS, CARS, or STOCKS, and add maybe 30 dates to the data.

Then show us what you intended to do with your original data using this much reduced data set. We might have some other suggestions.
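A minimal sketch of such a test data set, assuming SASHELP.CLASS and an arbitrary 30-day window (the data set name SMALL is hypothetical):

```sas
data small;
    set sashelp.class;
    do day = '01JAN2008'd to '30JAN2008'd;   /* 30 dates per student */
        output;
    end;
    format day mmddyy10.;
run;
```

SASHELP.CLASS has 19 observations, so this yields 570 rows, which is small enough to post along with the code for the intended analysis.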
