Hi Folks, I have a workspace issue.
I ran SAS overnight to merge Z.ALL_SITES (76.4 GB) and Z.DATA_LARGE (560 MB). SAS crashed with "out of memory" this morning. I cleaned the H and Z folders except for the datasets needed. Folder H has 1.96 TB of free space and folder Z has 490 GB of free space.
I'd appreciate any suggestions.
Thanks a zillion in advance. This has become a huge roadblock; I can't move forward on my project because of this memory issue.
DATA H.MERGED;
   MERGE
      Z.ALL_SITES (IN=A)
      Z.DATA_LARGE (IN=B);
   BY ID;
   IF A AND B;
RUN;
Maxim 3: Know Your Data. Inspect your input datasets for large columns, and check whether those datasets are stored with compress=yes.
If you are unsure, post the output of proc contents for both datasets here.
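For reference, that's just (a minimal sketch, assuming the Z library is already assigned):

proc contents data=z.all_sites;
run;

proc contents data=z.data_large;
run;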
Thanks a lot. Below is the proc contents output for both datasets. I only need ID, VAR1, and VAR2 from DATA_LARGE, and I'm tempted to keep only the variables I need. However, since I cannot remove the original DATA_LARGE from the folder, this would just reduce the free space in folder R.
Correct?
What do you think?
DATA R.DATA_LARGEA;
   SET R.DATA_LARGE(KEEP=ID VAR1 VAR2);
   WHERE VAR1 NE .;
RUN;
You can do all of the work in one step:
data h.merged (compress=yes);
   merge
      z.all_sites (in=a)
      z.data_large (
         in=b
         keep=id var1 var2
         where=(var1 ne .)
      )
   ;
   by id;
   if a and b;
run;
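One assumption worth making explicit: MERGE with a BY statement requires both inputs to be sorted (or indexed) by ID. If either input is unsorted, sort steps along these lines would be needed first (a sketch using the dataset names above; note that sorting the 76 GB table will need substantial utility space):

proc sort data=z.all_sites;
   by id;
run;

proc sort data=z.data_large;
   by id;
run;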
Please post the log from that step if it fails. Use the {i} button for posting logs.
PS: if you enable listing output in Enterprise Guide, you can copy/paste the text into a {i} window as well, so you don't need to make screenshots.
It's my favorite method of posting results.
@Cruise wrote:
Hi Folks, I have a workspace issue.
I ran SAS overnight to merge Z.ALL_SITES (76.4 GB) and Z.DATA_LARGE (560 MB). SAS crashed with "out of memory" this morning. I cleaned the H and Z folders except for the datasets needed. Folder H has 1.96 TB of free space and folder Z has 490 GB of free space.
A minor point here: running out of memory is not the same thing as not having enough free disk space. I think @Kurt_Bremser is on the right track to solving the problem.
The "out of memory" issue is kind of weird given the code you've posted.
The merge step only ever loads a single row at a time into memory (the PDV) and therefore does not require much memory.
The only reason I can think of that the code you've posted could throw an "out of memory" error is if the "tables" in the MERGE statement are actually views, and it's these views that consume a lot of memory while they execute (which happens as part of the merge).
...so: if my theory applies, the first step is to materialize the views as tables (with option compress=yes), and then, if necessary, rework the code behind those views to be more memory efficient.
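For illustration, materializing a view as a compressed table could look like this (a sketch only; z.all_sites_tbl is a hypothetical name, and z.all_sites stands in for whichever input is actually a view):

data z.all_sites_tbl (compress=yes);
   /* if z.all_sites is a view, this step executes it once
      and stores the result as a physical, compressed table */
   set z.all_sites;
run;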
Kurt, I'm running the code you suggested. Last time it crashed after a few hours when I ran it overnight, but this time I don't know how many hours it will run. I will post the log as soon as it's finished.
I'm concerned about Astounding's pointer that I might need more space on the H drive. The H drive is kept empty on purpose to hold the output MERGED dataset. I can't easily change the size of this folder because it's not my private computer.
It appears @Ksharp actually looked at your Proc Contents report and calculated the required memory for the variables you're after. If that's only around 1/2 GB, then you likely have sufficient memory and the data step hash lookup could work.
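If you want to sanity-check such an estimate yourself, here's a rough sizing sketch (assumptions on my part: ID, VAR1, and VAR2 are numeric at 8 bytes each, and hash per-item overhead is ignored, so treat the result as a lower bound):

proc sql;
   /* row count from dictionary.tables; 3 numeric variables at 8 bytes each */
   select nobs,
          nobs * (3 * 8) / 1024**3 as est_hash_gb format=8.2
   from dictionary.tables
   where libname = 'Z' and memname = 'DATA_LARGE';
quit;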
The code below is based on what you've posted, but of course it's not tested, as you haven't provided sample data. Give it a go and let us know if it works.
data h.merged;
   if _n_=1 then
      do;
         /* set up PDV host variables for the hash data items */
         if 0 then set z.data_large(keep=id var1 var2);
         declare hash h1 (dataset:'z.data_large(keep=id var1 var2)');
         h1.defineKey('id');
         h1.defineData('var1','var2');
         h1.defineDone();
      end;
   set z.all_sites;
   /* output only rows whose id is found in the hash (inner-join logic) */
   if h1.find()=0 then output;
run;
Thanks for the "like". Please use the latest (current) cut of my code for testing as I had to fix it since the initial post.
And just for clarity: the code I've posted assumes that the relationship between z.all_sites and z.data_large is N:1 or 1:1.
If the relationship is 1:N and you're after left-join logic, then the code I've posted would require some additional looping (using the hash DO_OVER method).
Hi Patrick,
This is a 1:N relationship. The 1 side has unique IDs with no duplicates. The N side is every claim a patient has ever made in their entire medical history, so 1:N can actually range from 1:1000 all the way to 1:5000, because the claims data captures every drug, diagnosis, or procedure across multiple chronic diseases.
What can be done in this context? The looping you mentioned? Can you extend the code for looping, if possible?
O.K., I'll post some code. Without sample data to test, though, I can't promise that it will be syntactically correct. It's getting late where I live, so others might need to take over after that.
I'll provide sample data ASAP.
Here you go:
data h.merged;
   if _n_=1 then
      do;
         /* set up PDV host variables for the hash data items */
         if 0 then set z.data_large(keep=id var1 var2);
         /* multidata:'y' allows multiple data items per key (the 1:N case) */
         declare hash h1 (dataset:'z.data_large(keep=id var1 var2)', multidata:'y');
         h1.defineKey('id');
         h1.defineData('var1','var2');
         h1.defineDone();
      end;
   set z.all_sites;
   /* do_over iterates over every hash entry for the current id,
      outputting one joined row per entry */
   do while(h1.do_over() eq 0);
      output;
   end;
run;