Hi all,
I used the hash object, as suggested in my previous question, to create over 1,000 Clinical Classifications Software (CCSR) diagnosis and procedure groups for the inpatient stay data. There are over 5 million observations (stays) in the stay file (stayfile) and over 30 million observations (claims) in the claim file (clmfile).
I tried the SET ... POINT= statement as suggested there, but I still ran into insufficient memory. Below is my code.
data stayfile_id;
   set stayfile;
   stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
   keep stay_id;
run;

data rid / view=rid;
   set stayfile_id (keep=stay_id);
   rid = _n_;
run;
data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;
   set stayfile point=rid;
   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, &ccsr_dx_proc_var_comma.);
   new_DXCCSR_BLD001 = 0;
   new_DXCCSR_BLD002 = 0;
   /* over 1000 similar equations here */
   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt then do;
         new_DXCCSR_BLD001 = DXCCSR_BLD001;
         new_DXCCSR_BLD002 = DXCCSR_BLD002;
         /* over 1000 similar equations here */
      end;
   end;
run;
Can anyone advise how to revise the code to deal with the memory problem?
Thank you so much!
L.
Accepted Solutions
So, your SAS session has access to a maximum of 8 GB, as your MEMSIZE value indicates.
Why are you loading all 30 million records into the hash, along with every variable in the data set?
dcl hash h(dataset : 'clmfile', multidata : 'Y');
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();
Try loading the smaller data set (stayfile) into the hash instead and looping through the records of your large data set (clmfile) with a SET statement, as sketched below.
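For instance, a minimal sketch of that reversal could look like the following (variable names are taken from the original post; the output data set name is made up, the match condition mirrors the one in the original code, and the roll-up of the 1,000+ new_DXCCSR_* flags to one record per stay is left out):

data claim_stay_match;
   if _n_ = 1 then do;
      /* hold the smaller (5M-row) stay file in memory: key plus two dates only */
      dcl hash h(dataset : 'stayfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata('stay_from_dt', 'stay_thru_dt');
      h.definedone();
      call missing(stay_from_dt, stay_thru_dt);    /* host variables for the hash */
   end;
   set clmfile;                                    /* sequential pass over the 30M claims */
   do while (h.do_over() = 0);                     /* every stay for this Person_ID */
      if stay_from_dt <= clm_beg_dt and clm_end_dt <= stay_thru_dt then do;
         stay_id = cat(Person_ID, "_", put(stay_from_dt, date9.));
         output;                                   /* one row per claim-stay match */
      end;
   end;
run;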
If you do want to load the large data set into the hash, then use the technique described on page 4 of this paper: https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf
"Now imagine that a real-world file LOOKUP is so large that memory shortage would prevent the hash table from being loaded with the SAT variables alongside KEY, yet we still want to use the hash object for KEY look-up! The workaround, as noted above, is to leave the SAT variables in their original place on disk and instead, load a file record identifier variable RID into the data portion of the hash table H: "
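Adapted to this problem, that record-identifier idea might be sketched roughly as follows. Note that, unlike the rid view in the original code, the record numbers here have to come from clmfile, the file that is too large to load: only Person_ID and a record number are held in memory, and the claim variables are fetched from disk with SET ... POINT= when needed. This is an untested sketch using the names from the original post:

data clm_rid / view=clm_rid;
   set clmfile (keep=Person_ID);
   rid = _n_;                                   /* record number of each claim */
run;

data want (drop=clm_beg_dt clm_end_dt &ccsr_dx_proc_var.);
   if _n_ = 1 then do;
      dcl hash h(dataset : 'clm_rid', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata('rid');                      /* only the record id is stored in memory */
      h.definedone();
      call missing(rid);
   end;
   set stayfile;                                /* one pass over the 5M stays */
   new_DXCCSR_BLD001 = 0;
   new_DXCCSR_BLD002 = 0;
   /* over 1000 similar statements here */
   do while (h.do_over() = 0);                  /* all claim record ids for this person */
      set clmfile point=rid;                    /* fetch that claim's variables from disk */
      if clm_beg_dt >= stay_from_dt and clm_end_dt <= stay_thru_dt then do;
         new_DXCCSR_BLD001 = DXCCSR_BLD001;
         new_DXCCSR_BLD002 = DXCCSR_BLD002;
         /* over 1000 similar statements here */
      end;
   end;
run;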
Hi @lichee
A couple of changes could help with the memory issue:
- Use an explicit -MEMSIZE xG (x: a number) SAS invocation option to specify how much memory the SAS process has access to. On Linux and Windows the default is 2G. You can check your current session's setting by running the following:
Proc options option=memsize; run;
- Explicitly specify the HASHEXP value in your hash object declaration (default: 8, maximum: 16); see the sketch below.
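For example, HASHEXP would be added to the declaration from the original post like this (just a sketch; HASHEXP sets the number of hash buckets to 2**n, so it mainly affects look-up speed rather than the size of the stored items):

dcl hash h(dataset : 'clmfile', multidata : 'Y', hashexp : 16);   /* 2**16 buckets */
h.definekey('Person_ID');
h.definedata(all : 'Y');
h.definedone();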
Hope that helps,
Ahmed
What is MEMSIZE currently?
Did you try increasing it, for example with -MEMSIZE 8G?
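For what it's worth, MEMSIZE can only be set when SAS starts, either on the command line or in the configuration file, not with an OPTIONS statement inside a running session. For example (program and file names are illustrative):

sas -memsize 8G myprogram.sas      /* batch invocation on UNIX/Linux */
-MEMSIZE 8G                        /* or a line added to the sasv9.cfg configuration file */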
I just ran Proc options option=memsize; run; as Ahmed suggested.
MEMSIZE=8589934592