The code is much simpler with "vertical" or "tall" datasets.
So you have patient data with ID and ICDCODE.
And you have category data with CATGORY ICDCODE and a third variable to indicate if the ICDCODE is for "COMB1" or "COMB2" let's call this STEP.
Then you just need to join and aggregate.
proc sql;
create table want as
select a.id
, b.category
, max( b.step = 'COMB1' ) as COMB1
, max( b.step = 'COMB2' ) as COMB2
from patients a
left join categories b
on a.icdcode = b.icdcode
group by 1,2
;
So now you get a dataset like:
patient category comb1 comb2 101 DIABETES 1 0 101 STROKE 1 1 102 DIABETES 1 1 102 STROKE 0 0
Performance might be difficult depending on the size. But if you are using an RDMS to store the data and do the join they can usually do a good job with proper indexing (and for parallel systems proper partitioning).
Thank you @Tom
That's a lovely solution, @Tom . I wasted several dog walks trying to think about potential SQL approaches and hash approaches, but never got to the point of even imagining one that I liked enough to try. I had the idea of making things vertical, because life is usually better when things are vertical. But I think I got stuck on thinking about pairs of ICD codes (I was going to have a control dataset with all pairs of ICD codes for each category), rather than treating them independently via your STEP variable.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.