Solved: Re: Identify value combinations in dataset based on reference file - Page 2

Tom · Posted 05-09-2023 12:06 PM

The code is much simpler with "vertical" or "tall" datasets.

So you have patient data with ID and ICDCODE.

And you have category data with CATGORY ICDCODE and a third variable to indicate if the ICDCODE is for "COMB1" or "COMB2" let's call this STEP.

Then you just need to join and aggregate.

proc sql;
create table want as
select a.id
        , b.category
        , max( b.step = 'COMB1' ) as COMB1
        , max( b.step = 'COMB2' ) as COMB2
from patients a
left join categories b
on a.icdcode = b.icdcode
group by 1,2
;

So now you get a dataset like:

patient category comb1 comb2
101 DIABETES 1 0
101 STROKE 1 1
102 DIABETES 1 1
102 STROKE 0 0

Performance might be difficult depending on the size. But if you are using an RDMS to store the data and do the join they can usually do a good job with proper indexing (and for parallel systems proper partitioning).

Epi_Stats · Posted 05-09-2023 12:27 PM

Thank you @Tom

Quentin · Posted 05-09-2023 05:04 PM

That's a lovely solution, @Tom . I wasted several dog walks trying to think about potential SQL approaches and hash approaches, but never got to the point of even imagining one that I liked enough to try. I had the idea of making things vertical, because life is usually better when things are vertical. But I think I got stuck on thinking about pairs of ICD codes (I was going to have a control dataset with all pairs of ICD codes for each category), rather than treating them independently via your STEP variable.

Re: Identify value combinations in dataset based on reference file

Re: Identify value combinations in dataset based on reference file

Re: Identify value combinations in dataset based on reference file

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away