Hi all,
I used the hash object, as guided in my previous query, to create over 1000 Clinical Classifications Software (CCSR) diagnosis and procedure groups for the inpatient stay data. There are over 5 million observations (stays) in the stay file (stayfile) and over 30 million observations (claims) in the claim file (clmfile).
I tried the "SET ... POINT=" statement as suggested here, but still ran into insufficient memory. Below is my code.
data stayfile_id;
    set stayfile;
    stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
    keep stay_id;
run;

data rid / view=rid;
    set stayfile_id (keep=stay_id);
    rid = _n_;
run;

data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);
    if _N_ = 1 then do;
        dcl hash h(dataset : 'clmfile', multidata : 'Y');
        h.definekey('Person_ID');
        h.definedata(all : 'Y');
        h.definedone();
    end;
    set stayfile point=rid;
    if 0 then set clmfile;
    call missing(clm_beg_dt, clm_end_dt, &ccsr_dx_proc_var_comma.);
    new_DXCCSR_BLD001 = 0;
    new_DXCCSR_BLD002 = 0;
    /* over 1000 similar assignments here */
    do while (h.do_over() = 0);
        if clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt then do;
            new_DXCCSR_BLD001 = DXCCSR_BLD001;
            new_DXCCSR_BLD002 = DXCCSR_BLD002;
            /* over 1000 similar assignments here */
        end;
    end;
run;
Can anyone suggest how to revise the code to work around the memory problem?
Thank you so much!
L.
So, your SAS session has access to a maximum of 8 GB, as your -MEMSIZE value indicates.
Why are you loading the 30 million claim records into the hash along with every variable in the data set?
dcl hash h(dataset : 'clmfile', multidata : 'Y');
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();
Try to load the smaller data set into the hash instead, and loop through the records of your large data set (clmfile).
If you really need the large data set in the hash, then use the technique described on page 4 of this paper: https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf
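Here is a minimal sketch of that approach, reusing the variable names from the code above (Person_ID, Stay_from_dt, Stay_Thru_dt, clm_beg_dt, clm_end_dt); the data portion and the OUTPUT logic are placeholders you would adapt to your 1000+ CCSR flags:

```sas
/* Sketch: load the (smaller) stay file into the hash,
   then make one sequential pass over the 30M-row claim file. */
data want;
    if _N_ = 1 then do;
        dcl hash h(dataset : 'stayfile', multidata : 'Y');
        h.definekey('Person_ID');
        h.definedata('Stay_from_dt', 'Stay_Thru_dt');  /* only what the check needs */
        h.definedone();
    end;
    set clmfile;                            /* one pass over the claims        */
    if 0 then set stayfile (keep=Stay_from_dt Stay_Thru_dt);  /* host variables */
    do while (h.do_over() = 0);             /* all stays for this Person_ID    */
        if clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt
            then output;                    /* claim falls within the stay     */
    end;
run;
```

This keeps only two date variables per stay in memory, so the hash footprint is driven by the 5 million stays rather than the 30 million claims with all their variables.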
"Now imagine that a real-world file LOOKUP is so large that memory shortage would prevent the hash table from being loaded with the SAT variables alongside KEY, yet we still want to use the hash object for KEY look-up! The workaround, as noted above, is to leave the SAT variables in their original place on disk and instead, load a file record identifier variable RID into the data portion of the hash table H: "
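The quoted technique might look like the sketch below, adapted to the names in this thread (Person_ID as KEY, clmfile as the large LOOKUP file); the view name rids and the LENGTH statement are my additions, not from the paper:

```sas
/* Keep the claim satellite variables on disk; load only a
   record identifier (RID) into the hash data portion. */
data rids / view=rids;
    set clmfile (keep=Person_ID);
    rid = _n_;                          /* record number in clmfile          */
run;

data want;
    length rid 8;                       /* host variable for the hash's RID  */
    if _N_ = 1 then do;
        dcl hash h(dataset : 'rids', multidata : 'Y');
        h.definekey('Person_ID');
        h.definedata('rid');            /* data portion is just the RID      */
        h.definedone();
    end;
    set stayfile;
    if 0 then set clmfile;              /* put the claim variables in the PDV */
    do while (h.do_over() = 0);         /* each claim RID for this person    */
        set clmfile point=rid;          /* fetch that one claim from disk    */
        if clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt
            then output;
    end;
run;
```

Memory now holds one key plus one numeric RID per claim; the wide claim record is read from disk only for the rows whose key actually matches.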
Hi @lichee
A couple of changes could help with the memory issue. First, check your current memory limit:
proc options option=memsize; run;
Hope that helps,
Ahmed
What is MEMSIZE currently?
Did you try increasing it, e.g. -MEMSIZE 8G?