BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PegaZeus
Obsidian | Level 7

Greetings, 

 

So I have two tables, one of them with around 100,000,000 healthcare claims (claims), and one with a list of NDCs (ndc_table). I'm trying to use a key-indexing approach to subset to include only those claims associated with NDCs in the NDC table.

 

data final; 
  array ndc_list (*) _temporary_; 
  do until (eof1); 
    set ndc_table end=eof1; 
        ndc_list(ndc) = ndc; 
end;
do until (eof2); 
  set claims end=eof2; 
    array_search = ndc_list(ndc); 
    if array_search > . then output; 
end;
run;

I'm getting 3 errors that I can't seem to figure out:

ERROR: The non-variable based array ndc_list has been defined with zero elements.

ERROR: Too many array subscripts specified for array ndc_list.

ERROR: Too many array subscripts specified for array ndc_list.

 

I used (*) because the number of NDCs in the ndc_table is going to change over time. Right now there are around 3500 values.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@PegaZeus wrote:

Okay I found out that what we actually want is not to subset the claims but simply to left join on the ndc key and bring in the other fields from the ndc table.

 

So I wrote a hash left lookup like this, however my output gives me blank values in the entire dataset. Nothing is populated.

 

data TEST(drop=rc:);
  if 0 then set ndc_table;
  if _N_ = 1 then do;
      declare hash h1(dataset:"ndc_table");
      h1.defineKey('NDC');
      h1.defineData(all:'Y');
      h1.defineDone();
  end;
  set claims;
  rc1=h1.find();
  if rc1 ne 0 then call missing(of _all_);
run;

If it's a "left join on the ndc key", then you need an "if test" to avoid outputing claims that don't match the ndc table.  So instead of 

rc1=h1.find();
if rc1 ne 0 then call missing(of _all_);

you're probably better off with

if h1.find()=0  then output;

If the NDC table is a miniscule fraction of the claims dataset, may be there are a few cases that aren't all blank … but since your program produces a dataset with the same number of records as claims, maybe you didn't see them.

 

And of course you can eliminate the "drop=rc1" option in the data statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

15 REPLIES 15
FreelanceReinh
Jade | Level 19

Hello @PegaZeus,

 

Arrays in SAS need to be defined with a dimension. In your case you may want to choose a "large enough" value, e.g., 999999 if this is larger than the maximum NDC value you would expect. Or determine the maximum in a preliminary step and use it in the array definition. SAS hash objects do not have this requirement. They grow as needed while they are loaded with data. So this would be an alternative approach.

PeterClemmensen
Tourmaline | Level 20

@PegaZeus what is the maximum value of ndc?

PegaZeus
Obsidian | Level 7

The max value of NDC would be 99999999999. This is not the number of ndcs in the ndc table though. In the ndc table there are about 3500 NDCs.

PeterClemmensen
Tourmaline | Level 20

So NDC can range from 1 to 99999999999? Or does 99999999999 have some special meaning?

PegaZeus
Obsidian | Level 7

That's correct, except instead of just 1, it would be 00000000001. It has to have 11 digits.

PeterClemmensen
Tourmaline | Level 20

Ok. Definitely go with the Hash approach suggested by @FreelanceReinh then 🙂

FreelanceReinh
Jade | Level 19

@PegaZeus wrote:

The max value of NDC would be 99999999999. This is not the number of ndcs in the ndc table though. In the ndc table there are about 3500 NDCs.


This 11-digit number would likely be too large as an array dimension, so the key-indexing approach would need a modification or perhaps you just resort to the hash object approach shown below:

data final;
dcl hash h(dataset:'ndc_table');
h.definekey('ndc');
h.definedone();
do until(eof);
  set claims end=eof;
  if h.check()=0 then output;
end;
stop;
run;
PeterClemmensen
Tourmaline | Level 20

@FreelanceReinh out of interest, what modification do you have in mind to approach this problem with key-indexing? Bitmapping?

FreelanceReinh
Jade | Level 19

Oh, nothing concrete yet. I only remembered those Dorfman papers about key-indexing and was thinking that the NDC numbers perhaps could be mapped to a much smaller range of integers (without collisions). But this would, of course, require deeper knowledge about the structure of those NDC numbers.

PeterClemmensen
Tourmaline | Level 20

Ok. Thank you. Triggered my curiosity 🙂 The Dorfman papers entered my mind as well.

PegaZeus
Obsidian | Level 7

Okay I found out that what we actually want is not to subset the claims but simply to left join on the ndc key and bring in the other fields from the ndc table.

 

So I wrote a hash left lookup like this, however my output gives me blank values in the entire dataset. Nothing is populated.

 

data TEST(drop=rc:);
  if 0 then set ndc_table;
  if _N_ = 1 then do;
      declare hash h1(dataset:"ndc_table");
      h1.defineKey('NDC');
      h1.defineData(all:'Y');
      h1.defineDone();
  end;
  set claims;
  rc1=h1.find();
  if rc1 ne 0 then call missing(of _all_);
run;
Tom
Super User Tom
Super User
When FIND() fails you a setting ALL of the variables to missing, even those the came from the CLAIMS dataset.
If everything ends up missing then you got no hits. Are you sure NDC variable is in both datasets? Is the content in the same format? Perhaps one is using NDC-10 and the other NDC-11.
PegaZeus
Obsidian | Level 7

Interesting, you're right. What actually happened was I was using too small of a claims sample for testing, which didn't include any of the NDCs in the ndc_table. I updated the call missing to only the fields in that table now.

mkeintz
PROC Star

@PegaZeus wrote:

Okay I found out that what we actually want is not to subset the claims but simply to left join on the ndc key and bring in the other fields from the ndc table.

 

So I wrote a hash left lookup like this, however my output gives me blank values in the entire dataset. Nothing is populated.

 

data TEST(drop=rc:);
  if 0 then set ndc_table;
  if _N_ = 1 then do;
      declare hash h1(dataset:"ndc_table");
      h1.defineKey('NDC');
      h1.defineData(all:'Y');
      h1.defineDone();
  end;
  set claims;
  rc1=h1.find();
  if rc1 ne 0 then call missing(of _all_);
run;

If it's a "left join on the ndc key", then you need an "if test" to avoid outputing claims that don't match the ndc table.  So instead of 

rc1=h1.find();
if rc1 ne 0 then call missing(of _all_);

you're probably better off with

if h1.find()=0  then output;

If the NDC table is a miniscule fraction of the claims dataset, may be there are a few cases that aren't all blank … but since your program produces a dataset with the same number of records as claims, maybe you didn't see them.

 

And of course you can eliminate the "drop=rc1" option in the data statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 1615 views
  • 10 likes
  • 5 in conversation