lichee
Quartz | Level 8

Hi all,

 

I used the hash object approach suggested in my previous query to create over 1,000 Clinical Classifications Software Refined (CCSR) diagnosis and procedure group flags for the inpatient stay data. There are over 5 million observations (stays) in the stay file (stayfile) and over 30 million observations (claims) in the claim file (clmfile).

 

I tried the "SET ... POINT=" statement as suggested there, but still ran into insufficient memory. Below is my code.

data stayfile_id;
  set stayfile;
  stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
  keep stay_id;
run;

data rid / view=rid;
  set stayfile_id (keep=stay_id);
  rid = _n_;
run;

data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);

  if _n_ = 1 then do;
    dcl hash h (dataset: 'clmfile', multidata: 'Y');
    h.definekey('Person_ID');
    h.definedata(all: 'Y');
    h.definedone();
  end;

  set stayfile point=rid;

  if 0 then set clmfile;
  call missing(clm_beg_dt, clm_end_dt, &ccsr_dx_proc_var_comma.);
  new_DXCCSR_BLD001 = 0;
  new_DXCCSR_BLD002 = 0;
  /* over 1,000 similar assignments here */

  do while (h.do_over() = 0);
    if clm_beg_dt >= stay_from_dt and clm_end_dt <= stay_thru_dt then do;
      new_DXCCSR_BLD001 = DXCCSR_BLD001;
      new_DXCCSR_BLD002 = DXCCSR_BLD002;
      /* over 1,000 similar assignments here */
    end;
  end;

run;

 

Can anyone suggest how to revise the code to get around the memory problem?

 

Thank you so much!

L.

Accepted Solution
AhmedAl_Attar
Ammonite | Level 13

@lichee 

So your SAS session has access to a maximum of 8 GB, as your MEMSIZE value indicates.

Why are you loading the 30 million records into the hash along with every variable in the data set?

dcl hash h(dataset : 'clmfile', multidata : 'Y');
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();

Try loading the smaller data set into the hash instead, and loop through the records of your large data set (clmfile).
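As an illustration of that direction, here is a minimal sketch (untested, with variable names taken from your posted code) that loads the smaller stay file into the hash and makes a single sequential pass over the large claim file. It simply outputs claims that fall inside a stay; rolling the CCSR flags back up to one record per stay would need a further step:

```sas
/* Sketch: small table in memory, large table read sequentially */
data stay_claims;
  if _n_ = 1 then do;
    dcl hash s (dataset: 'stayfile', multidata: 'y');
    s.definekey('Person_ID');
    s.definedata('stay_from_dt', 'stay_thru_dt');  /* keep only what you need in memory */
    s.definedone();
  end;

  set clmfile;                                     /* one pass over the 30M claims */

  if 0 then set stayfile (keep=stay_from_dt stay_thru_dt);  /* define host variables */

  /* Walk every stay of this person; keep claims inside the stay window */
  do while (s.do_over() = 0);
    if stay_from_dt <= clm_beg_dt and clm_end_dt <= stay_thru_dt then output;
  end;
run;
```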

If you do want to load the large data set into the hash, then use the technique described on page 4 of this paper: https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf

 

"Now imagine that a real-world file LOOKUP is so large that memory shortage would prevent the hash table from being loaded with the SAT variables alongside KEY, yet we still want to use the hash object for KEY look-up! The workaround, as noted above, is to leave the SAT variables in their original place on disk and instead, load a file record identifier variable RID into the data portion of the hash table H: "
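A minimal sketch of the paper's workaround, adapted to the names in this thread (an untested illustration, not a drop-in replacement): only Person_ID and a record number go into the hash, and the full claim record is fetched from disk with POINT= when it is actually needed.

```sas
/* Build a view that pairs each claim's key with its record number */
data clm_rid / view=clm_rid;
  set clmfile (keep=Person_ID);
  rid = _n_;
run;

data want;
  if _n_ = 1 then do;
    /* The hash holds only key + record id, so its footprint stays small */
    dcl hash h (dataset: 'clm_rid', multidata: 'y');
    h.definekey('Person_ID');
    h.definedata('rid');
    h.definedone();
  end;

  set stayfile;                      /* one pass over the 5M stays */

  if 0 then set clm_rid;             /* define the host variable rid */

  do while (h.do_over() = 0);        /* for each claim of this person...   */
    set clmfile point=rid;           /* ...fetch its full record from disk */
    if clm_beg_dt >= stay_from_dt and clm_end_dt <= stay_thru_dt then output;
  end;
run;
```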

 


4 REPLIES
AhmedAl_Attar
Ammonite | Level 13

Hi @lichee 

A couple of changes could help with the memory issue:

  1. Use an explicit -memsize xG (x: number) SAS invocation option to specify how much memory the SAS process can use. On Linux/Windows the default is 2G. You can check your current session's setting by running:
    proc options option=memsize; run;
  2. Explicitly specify the HASHEXP value in your hash object declaration (default: 8, maximum: 20).
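For example (a fragment for illustration only; the invocation path and sizes depend on your site):

```sas
/* 1. Start SAS with a larger memory ceiling, e.g.              */
/*      sas -memsize 16G ...                                    */
/*    then verify what the session actually got:                */
proc options option=memsize; run;

/* 2. Pre-size the hash's internal table (2**hashexp buckets)   */
/*    inside your DATA step declaration:                        */
/* dcl hash h (dataset: 'clmfile', multidata: 'y', hashexp: 16); */
```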

Hope that helps,

Ahmed

data_null__
Jade | Level 19

What is MEMSIZE currently?

 

Did you try increasing it, e.g. -MEMSIZE 8G?

lichee
Quartz | Level 8
Thank you both!

I just ran Proc options option=memsize; run; as Ahmed suggested.

MEMSIZE=8589934592