Hello, I am trying to filter a health data to identify my study population using specific Drug Identification Numbers (DINs) for asthma inhaler medications. I have a list of DINs I am interested in. I will like SAS to identify individuals and observations that matches one of the DINs. Here are some of the DINs I am interested in: 812463; 824216, 851752;851760;851841 etc. I really need help on this. Any comments would be greatly appreciated.
Thank you
Define "large".
The (probably) most efficient method would be a hash object in a data step, as long as you can fit the reference numbers into memory (likely, because you'll basically need 8 bytes * number of items, not much in the times of gigabyte RAM).
data want;
set have;
if _n_ = 1
then do;
declare hash r (dataset:"reference");
r.definekey("din");
r.definedone();
end;
if r.check() = 0;
run;
The only sorting is done in memory on the lookup table.
Simple with SQL, if dataset size is not that large:
proc sql;
create table want as
select *
from have
where din in (select din from reference)
;
quit;
I think this syntax is better not shown; it's disastrous when the volumes grow.
An inner join is always valid:
proc sql;
create table WANT as
select HAVE.*
from HAVE, DINLIST
where HAVE.DIN = DINLIST.DIN;
quit;
Or if there are duplicates:
proc sql;
create table WANT as
select HAVE.*
from HAVE, (select unique DIN from DINLIST) DINLIST
where HAVE.DIN = DINLIST.DIN;
quit;
Thank you. Will try this option too. Just that i have a tall list of DIN lists (about 450 DINS).
The data size is very large but i will try this whether it will work. Thank you.
Define "large".
The (probably) most efficient method would be a hash object in a data step, as long as you can fit the reference numbers into memory (likely, because you'll basically need 8 bytes * number of items, not much in the times of gigabyte RAM).
data want;
set have;
if _n_ = 1
then do;
declare hash r (dataset:"reference");
r.definekey("din");
r.definedone();
end;
if r.check() = 0;
run;
The only sorting is done in memory on the lookup table.
@Kurt_Bremser Each item uses 48 bytes in a one-numeric-key hash table.
So we have 40 bytes overhead per item? That would be an interesting fact for future calculations of RAM requirements.
So the minimum overhead for the internal search table seems to be 32 bytes per item. The macro will come in handy, thanks for posting.
Another option is to create a custom format that has a formatted value of "Want", or similar, for the values of interest and "Not wanted" for other values.
Then you can either select records with a where statement or use the formatted value directly, depending on need.
1. Use a WHERE statement, applied in either SQL or Data step
where DIN in (812463, 824216.....);
2. Create a lookup data set
data dins_asthma;
input DIN;
cards;
812463
824216
851752
;;;;
run;
Then any of the other suggestions will work for you.
@Asboaheng wrote:
Hello, I am trying to filter a health data to identify my study population using specific Drug Identification Numbers (DINs) for asthma inhaler medications. I have a list of DINs I am interested in. I will like SAS to identify individuals and observations that matches one of the DINs. Here are some of the DINs I am interested in: 812463; 824216, 851752;851760;851841 etc. I really need help on this. Any comments would be greatly appreciated.
Thank you
Maybe something solving your problem has already been suggested, if not you should post an excerpt of the data you have or the result of proc contents, so that we actually know how what "health data" is in your case.
For completenes.. If the range of DINs is somewhat sparse, this should be quite fast as well...
data din;
input din;
datalines;
812463
824216
851752
851760
851841
;
data have;
input din v;
datalines;
812463 1
123456 2
824216 3
234567 4
851752 5
345678 6
851760 7
456789 8
851841 9
;
data want;
array d {0 : 999999} _temporary_;
do until (z1);
set din end = z1;
d[din] = 1;
end;
do until (z2);
set have end = z2;
if d[din] then output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.