DATA Step, Macro, Functions and more

Data Selection

Reply
Occasional Contributor thb
Occasional Contributor
Posts: 7

Data Selection

Hello,

 

I have a dataset where I need to select a record if a record has data within all three metrics.  For example, 

 

Record_IDMetricNumeratorDenominator
1Insulin11000
1Anti-Coagulants31000
2Insulin03000
3Insulin12000
2Anti-Coagulants21000
4Anti-Coagulants14000
1Opioids11000
2Opioids12000
3Insulin21000

 

I only want to pull data that has a numerator for all three metrics (Insulin, Anti-Coagulants, and Opioids).  Although Record_ID 2 has a numerator in all three metrics, I do not want to pull it because it has a '0' for the Insulin metric.  Does anyone have a recommendation to selecting data this way?  The end table that I want would be:

 

Record_IDMetricNumeratorDenominator
1Insulin11000
1Anti-Coagulants31000
1Opioids11000

 

Any assistance would be greatly appreciated.

 

Thank you

 

 

Super User
Posts: 11,343

Re: Data Selection

Will any of your Record_Id ever have more than 3 records? Fewer than 3 records? If so what is the selection rule(s)?

 

If they have exactly 3 records will they always be Insulin, Anti-coagulants and Opiods?

Occasional Contributor thb
Occasional Contributor
Posts: 7

Re: Data Selection

Yes, they'll have multiple records by month from 2014 to present.  The criteria for selection is that they need to have records for these three metrics to be counted.  

 

 

Trusted Advisor
Posts: 1,019

Re: Data Selection

[ Edited ]

 

If you never have more than one record for a given id/metric then sort by id and metric, then count metrics (with numberator>0) within each id:

 

 

Editted change:  (Looking at Art's solution revealed a flaw in my suggest - here's the revised response);

proc sort data=have (where=(numerator>0)) out=need;

  by id metric;

run;

 

data want;

  do n_metric=1 by 1 until (last.id);

    set need have (where=(numerator>0));

    by id;

  end;

  do until (last.id);

    set need have;

    by id;

    if n_metric=3 then output;

  end;

run;

 

PROC Star
Posts: 7,468

Re: Data Selection

In that case I'd use a DOW loop to test whether each metric met the condition, e.g.:

proc sort data=have out=want;
  by Record_ID Metric;
run;

data want (drop=m1-m3);
  do until (last.Record_ID);
    set want;
    by Record_ID;
    if Metric eq 'Insulin' and Numerator gt 0 then m1=1;
    else if Metric eq 'Anti-Coagulants' and Numerator gt 0 then m2=1;
    else if Metric eq 'Opioids' and Numerator gt 0 then m3=1;
  end;
  do until (last.Record_ID);
    set want;
    by Record_ID;
    if sum(of m1-m3) eq 3 then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,023

Re: Data Selection

data have;
infile cards expandtabs truncover;
input Record_ID	Metric : $40.	Numerator	Denominator;
cards;
1	Insulin	1	1000
1	Anti-Coagulants	3	1000
2	Insulin	0	3000
3	Insulin	1	2000
2	Anti-Coagulants	2	1000
4	Anti-Coagulants	1	4000
1	Opioids	1	1000
2	Opioids	1	2000
3	Insulin	2	1000
;
run;
proc sql;
select *
 from have
   group by Record_ID
    having count(distinct Metric)=3 and sum(Numerator=0)=0;
quit;
Ask a Question
Discussion stats
  • 5 replies
  • 132 views
  • 0 likes
  • 5 in conversation