BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Asboaheng
Fluorite | Level 6

Hello, I am trying to filter a health data to identify my study population using specific Drug Identification Numbers (DINs) for asthma inhaler medications. I have a list of DINs I am interested in. I will like SAS to identify individuals and observations that matches one of the DINs. Here are some of the DINs I am interested in: 812463; 824216, 851752;851760;851841 etc. I really need help on this. Any comments would be greatly appreciated.

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Define "large".

The (probably) most efficient method would be a hash object in a data step, as long as you can fit the reference numbers into memory (likely, because you'll basically need 8 bytes * number of items, not much in the times of gigabyte RAM).

data want;
set have;
if _n_ = 1
then do;
  declare hash r (dataset:"reference");
  r.definekey("din");
  r.definedone();
end;
if r.check() = 0;
run;

The only sorting is done in memory on the lookup table.

View solution in original post

14 REPLIES 14
ChrisNZ
Tourmaline | Level 20

I think this syntax is better not shown; it's disastrous when the volumes grow. 

An inner join is always valid:

 

proc sql;
  create table WANT as
  select HAVE.*
  from HAVE, DINLIST
  where HAVE.DIN = DINLIST.DIN;
quit;

 

 

ChrisNZ
Tourmaline | Level 20

Or if there are duplicates:

proc sql;
  create table WANT as
  select HAVE.*
  from HAVE, (select unique DIN from DINLIST) DINLIST
  where HAVE.DIN = DINLIST.DIN;
quit;

 

Asboaheng
Fluorite | Level 6

Thank you. Will try this option too. Just that i have a tall list of DIN lists (about 450 DINS).

Asboaheng
Fluorite | Level 6

The data size is very large but i will try this whether it will work. Thank you.

Kurt_Bremser
Super User

Define "large".

The (probably) most efficient method would be a hash object in a data step, as long as you can fit the reference numbers into memory (likely, because you'll basically need 8 bytes * number of items, not much in the times of gigabyte RAM).

data want;
set have;
if _n_ = 1
then do;
  declare hash r (dataset:"reference");
  r.definekey("din");
  r.definedone();
end;
if r.check() = 0;
run;

The only sorting is done in memory on the lookup table.

ChrisNZ
Tourmaline | Level 20

48 bytes is the minimum size (on 64-bit systems), then 16-byte increments.

See the macro I posted here, it explains the allocation logic, and does the calculations for you.

ballardw
Super User

Another option is to create a custom format that has a formatted value of "Want", or similar, for the values of interest and "Not wanted" for other values.

Then you can either select records with a where statement or use the formatted value directly, depending on need.

 

Reeza
Super User

1. Use a WHERE statement, applied in either SQL or Data step

 

where DIN in (812463, 824216.....);

2. Create a lookup data set

 

data dins_asthma;
input DIN;
cards;
812463
824216
851752
;;;;
run;

Then any of the other suggestions will work for you.

 


@Asboaheng wrote:

Hello, I am trying to filter a health data to identify my study population using specific Drug Identification Numbers (DINs) for asthma inhaler medications. I have a list of DINs I am interested in. I will like SAS to identify individuals and observations that matches one of the DINs. Here are some of the DINs I am interested in: 812463; 824216, 851752;851760;851841 etc. I really need help on this. Any comments would be greatly appreciated.

 

Thank you


 

andreas_lds
Jade | Level 19

Maybe something solving your problem has already been suggested, if not you should post an excerpt of the data you have or the result of proc contents, so that we actually know how what "health data" is in your case.

PeterClemmensen
Tourmaline | Level 20

For completenes.. If the range of DINs is somewhat sparse, this should be quite fast as well...

 

data din;
input din;
datalines;
812463
824216
851752
851760
851841
;

data have;
input din v;
datalines;
812463 1
123456 2
824216 3
234567 4
851752 5
345678 6
851760 7
456789 8
851841 9
;

data want;
   array d {0 : 999999} _temporary_;
   do until (z1);
      set din end = z1;
      d[din] = 1;
   end;
   do until (z2);
      set have end = z2;
      if d[din] then output;
   end;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1291 views
  • 8 likes
  • 7 in conversation