Hi all,
I used the hash object, as suggested in my previous question, to create over 1,000 Clinical Classifications Software (CCSR) diagnosis and procedure groups for the inpatient stay data. There are over 5 million observations (stays) in the stay file (stayfile) and over 30 million observations (claims) in the claim file (clmfile).
I tried the SET ... POINT= statement as suggested there, but I still ran into insufficient memory. Below is my code.
data stayfile_id;
   set stayfile;
   stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
   keep stay_id;
run;

data rid / view=rid;
   set stayfile_id (keep=stay_id);
   rid = _n_;
run;
data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;
   set stayfile point=rid;
   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, &ccsr_dx_proc_var_comma.);
   new_DXCCSR_BLD001 = 0;
   new_DXCCSR_BLD002 = 0;
   /* over 1000 similar equations here */
   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt then do;
         new_DXCCSR_BLD001 = DXCCSR_BLD001;
         new_DXCCSR_BLD002 = DXCCSR_BLD002;
         /* over 1000 similar equations here */
      end;
   end;
run;
Can anyone advise how to revise the code to deal with the memory problem?
Thank you so much!
L.
Accepted Solutions
So, your SAS session has access to a maximum of 8 GB, as your MEMSIZE value indicates.
Why are you loading all 30 million records into the hash, along with every variable in the data set?
dcl hash h(dataset : 'clmfile', multidata : 'Y');
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();
Try loading the smaller data set (stayfile) into the hash instead and looping through the records of your large data set (clmfile) with a SET statement, as sketched below.
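For instance, a minimal sketch of that reversal could look like the following (variable names are taken from the original post; the output data set name is made up, the match condition mirrors the one in the original code, and the roll-up of the 1,000+ new_DXCCSR_* flags to one record per stay is left out):

data claim_stay_match;
   if _n_ = 1 then do;
      /* hold the smaller (5M-row) stay file in memory: key plus two dates only */
      dcl hash h(dataset : 'stayfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata('stay_from_dt', 'stay_thru_dt');
      h.definedone();
      call missing(stay_from_dt, stay_thru_dt);    /* host variables for the hash */
   end;
   set clmfile;                                    /* sequential pass over the 30M claims */
   do while (h.do_over() = 0);                     /* every stay for this Person_ID */
      if stay_from_dt <= clm_beg_dt and clm_end_dt <= stay_thru_dt then do;
         stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
         output;                                   /* one row per claim-stay match */
      end;
   end;
run;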
If you do want to load the large data set into the hash, then use the technique described on page 4 of this paper: https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf
"Now imagine that a real-world file LOOKUP is so large that memory shortage would prevent the hash table from being loaded with the SAT variables alongside KEY, yet we still want to use the hash object for KEY look-up! The workaround, as noted above, is to leave the SAT variables in their original place on disk and instead, load a file record identifier variable RID into the data portion of the hash table H: "
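Adapted to this problem, that record-identifier idea might be sketched roughly as follows. Note that, unlike the rid view in the original code, the record numbers here have to come from clmfile, the file that is too large to load: only Person_ID and a record number are held in memory, and the claim variables are fetched from disk with SET ... POINT= when needed. This is an untested sketch using the names from the original post:

data clm_rid / view=clm_rid;
   set clmfile (keep=Person_ID);
   rid = _n_;                                   /* record number of each claim */
run;

data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);
   if _n_ = 1 then do;
      dcl hash h(dataset : 'clm_rid', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata('rid');                      /* only the record id is stored in memory */
      h.definedone();
      call missing(rid);
   end;
   set stayfile;                                /* one pass over the 5M stays */
   new_DXCCSR_BLD001 = 0;
   new_DXCCSR_BLD002 = 0;
   /* over 1000 similar statements here */
   do while (h.do_over() = 0);                  /* all claim record ids for this person */
      set clmfile point=rid;                    /* fetch that claim's variables from disk */
      if clm_beg_dt >= stay_from_dt and clm_end_dt <= stay_thru_dt then do;
         new_DXCCSR_BLD001 = DXCCSR_BLD001;
         new_DXCCSR_BLD002 = DXCCSR_BLD002;
         /* over 1000 similar statements here */
      end;
   end;
run;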
Hi @lichee
A couple of changes could help with the memory issue:
- Use an explicit -MEMSIZE xG (x: a number) SAS invocation option to specify how much memory the SAS process has access to. On Linux and Windows the default is 2G. You can check your current session's setting by running the following:
Proc options option=memsize; run;
- Explicitly specify the HASHEXP value in your hash object declaration (default: 8, maximum: 16); see the sketch below.
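For example, HASHEXP would be added to the declaration from the original post like this (just a sketch; HASHEXP sets the number of hash buckets to 2**n, so it mainly affects look-up speed rather than the size of the stored items):

dcl hash h(dataset : 'clmfile', multidata : 'Y', hashexp : 16);   /* 2**16 buckets */
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();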
Hope that helps,
Ahmed
What is MEMSIZE currently?
Did you try increasing it, for example with -MEMSIZE 8G?
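For what it's worth, MEMSIZE can only be set when SAS starts, either on the command line or in the configuration file, not with an OPTIONS statement inside a running session. For example (program and file names are illustrative):

sas -memsize 8G myprogram.sas      /* batch invocation on UNIX/Linux */
-MEMSIZE 8G                        /* or a line added to the sasv9.cfg configuration file */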
I just ran Proc options option=memsize; run; as Ahmed suggested.
MEMSIZE=8589934592