Cruise
Ammonite | Level 13

Hi Folks, I have a workspace issue.

I ran SAS overnight to merge Z.ALL_SITES (76.4 GB) and Z.DATA_LARGE (560 MB). SAS crashed with "out of memory" this morning. I cleaned the H and Z folders except for the datasets needed. Folder H has 1.96 TB of free space; folder Z has 490 GB of free space.

I'd appreciate any suggestions.

Thanks a zillion in advance. This is a huge roadblock; I can't move forward on my project because of this memory issue.

 

DATA H.MERGED;
MERGE
     Z.ALL_SITES  (IN=A)
     Z.DATA_LARGE (IN=B);
BY ID;
IF A AND B;
RUN; 
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Cruise 

Here you go:

data h.merged;
  if _n_=1 then 
    do;
      if 0 then set z.data_large(keep=id var1 var2);
      /* multidata:'y' allows duplicate keys in the hash, needed for the 1:N lookup */
      dcl hash h1 (dataset:'z.data_large(keep=id var1 var2)', multidata:'y');
      h1.defineKey('id');
      h1.defineData('var1','var2');
      h1.defineDone();
    end;
  set z.all_sites;
  do while(h1.do_over() eq 0);
    output;
  end;
run;

http://support.sas.com/documentation/cdl/en/lecompobjref/69740/HTML/default/viewer.htm#p07odyzmifa4y... 


60 REPLIES
Kurt_Bremser
Super User

Maxim 3: Know Your Data. Inspect your input datasets for large columns, and check whether those datasets are stored with compress=yes.

If you are unsure, post the output of proc contents for both datasets here.
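A minimal sketch of that check, using the librefs from the original post (the "Compressed" attribute in the output shows whether compress=yes was in effect when each dataset was created):

```sas
/* Inspect both input datasets; look at the "Compressed"
   attribute and the variable lengths in the output. */
proc contents data=z.all_sites;
run;

proc contents data=z.data_large;
run;
```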

Cruise
Ammonite | Level 13

@Kurt_Bremser @PaigeMiller 

 

Thanks a lot. Below is the PROC CONTENTS output for both datasets. I only need ID, VAR1 and VAR2 from DATA_LARGE and am tempted to keep only the variables I need. However, since I cannot remove the original DATA_LARGE from the folder, this would just reduce the free space in folder R.

Correct?

What do you think?

 

DATA R.DATA_LARGEA; SET R.DATA_LARGE(KEEP=ID VAR1 VAR2);
WHERE VAR1 NE .; 
RUN; 

 

[Attachment: LARGE DATA.png — PROC CONTENTS output]

Kurt_Bremser
Super User

You can do all of the work in one step:

data h.merged (compress=yes);
merge
  z.all_sites (in=a)
  z.data_large (
    in=b
    keep=id var1 var2
    where=(var1 ne .)
  )
;
by id;
if a and b;
run;

Please post the log from that step if it fails. Use the {i} button for posting logs.

PS: if you enable listing output in Enterprise Guide, you can copy/paste the text into a {i} window as well, so you don't need to take screenshots.

It's my favorite method of posting results.

PaigeMiller
Diamond | Level 26

@Cruise wrote:

Hi Folks, I have a work space issue

 

I ran SAS overnight to merge Z.ALL_SITES (76.4 GB) and Z.DATA_LARGE (560 MB). SAS crashed with "out of memory" this morning. I cleaned the H and Z folders except for the datasets needed. Folder H has 1.96 TB of free space; folder Z has 490 GB of free space.

A minor point here: running out of memory is not the same thing as not having enough free disk space. I think @Kurt_Bremser is on the right track to solving the problem.

--
Paige Miller
Patrick
Opal | Level 21

@Cruise 

The "out of memory" issue is kind of weird given the code you've posted.

The merge step only ever loads a single row at a time into memory (the PDV) and therefore shouldn't require much memory.

The only reason I can think of that the code you've posted could throw an "out of memory" error is if the "tables" in the MERGE statement are actually views, and it's these views that consume a lot of memory during view execution (which happens as part of the MERGE statement).

...so: if my theory applies, then the first step is to convert the views to tables (with option compress=yes), and eventually implement code for the views (now tables) that is more memory-efficient.
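A minimal sketch of that conversion, assuming a hypothetical view named z.all_sites_v (the actual view name isn't given in the thread):

```sas
/* Materialize the (hypothetical) view z.all_sites_v as a physical,
   compressed table, so the later merge reads rows from disk instead
   of re-executing the view's logic. */
data z.all_sites_tbl(compress=yes);
  set z.all_sites_v;
run;
```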

Astounding
PROC Star
Regarding data set compression, the issue is not whether the incoming data sets are compressed or not. They are on the Z drive, and you need more space on the H drive.

If the data contains a lot of character variables, add this statement before the merge:

options compress=yes;

If it contains more numeric variables, add this instead:

options compress=binary;
Cruise
Ammonite | Level 13

@Kurt_Bremser @Astounding 

 

Kurt, I'm running the code you suggested. Last time it crashed after a few hours when I ran it overnight. This time I don't know how many hours it will run. I will post the log as soon as it's finished.

 

I'm concerned about Astounding's pointer that I might need more space on the H drive. The H drive was emptied on purpose to hold the output MERGED dataset. I can't easily change the size of this folder because it's not my private computer.

Patrick
Opal | Level 21

@Cruise 

It appears @Ksharp actually looked at your PROC CONTENTS report and calculated the memory required for the variables you're after. If that's only around 1/2 GB, then you likely have sufficient memory and the data step hash lookup could work.

The code below is based on the code you've posted, but of course not tested, as you haven't provided sample data. Give it a go and let us know if it works.

data h.merged;
  if _n_=1 then 
    do;
      if 0 then set z.data_large(keep=id var1 var2);
      dcl hash h1 (dataset:'z.data_large(keep=id var1 var2)');
      h1.defineKey('id');
      h1.defineData('var1','var2');
      h1.defineDone();
    end;
  set z.all_sites;
  if h1.find()=0 then output;
run;

 

Patrick
Opal | Level 21

@Cruise 

Thanks for the "like". Please use the latest (current) cut of my code for testing, as I had to fix it after the initial post.

Patrick
Opal | Level 21

@Cruise 

And just for clarity: the code I've posted assumes that the relationship between z.all_sites and z.data_large is N:1 or 1:1.

If the relationship is 1:N and you're after left-join logic, then the code I've posted would require some additional looping (using the hash DO_OVER method).

Cruise
Ammonite | Level 13

@Patrick 

Hi Patrick,

This is a 1:N relationship. The 1 side has unique IDs with no duplicates. The N side is every claim the patient has ever made in his/her entire medical history, so 1:N can actually range from 1:1000 up to 1:5000, because the claims data captures every drug, diagnosis, or procedure across multiple chronic diseases.

What can be done in this context? The looping you mentioned? Can you extend the code for looping, if possible?

Patrick
Opal | Level 21

@Cruise 

O.K., I'll post some code. Without sample data to test, though, I can't promise it will be syntactically correct. It's getting late where I live, so others might need to take over after that.

Cruise
Ammonite | Level 13

@Patrick 

I'll provide you with sample data ASAP.

Patrick
Opal | Level 21

@Cruise 

Here you go:

data h.merged;
  if _n_=1 then 
    do;
      if 0 then set z.data_large(keep=id var1 var2);
      /* multidata:'y' allows duplicate keys in the hash, needed for the 1:N lookup */
      dcl hash h1 (dataset:'z.data_large(keep=id var1 var2)', multidata:'y');
      h1.defineKey('id');
      h1.defineData('var1','var2');
      h1.defineDone();
    end;
  set z.all_sites;
  do while(h1.do_over() eq 0);
    output;
  end;
run;

http://support.sas.com/documentation/cdl/en/lecompobjref/69740/HTML/default/viewer.htm#p07odyzmifa4y... 


Discussion stats
  • 60 replies
  • 3260 views
  • 35 likes
  • 7 in conversation