I have some patients' medication data, need to convert it to binary table. Anyone has the solution?
what I have:
Patient | Medication1 | Medication2 | Medication3 |
ID1 | A | B | C |
ID2 | C | E | F |
ID3 | E | G | H |
what I want:
Patient | A | B | C | D | E | F | G | H |
ID1 | 1 | 1 | 1 | |||||
ID2 | 1 | 1 | 1 | |||||
ID3 | 1 | 1 | 1 |
I am skeptical about this re-arrangement of the data. In the real world, medications are going to have names like Hydroxyzine, and then do you really want SAS variables to be named Hydroxyzine and AcetylSalicilic and so on? In addition, I can't see how this proposed re-arrangement of the data is useful for additional analyses.
A much better re-arrangement would be
data have;
input Patient $ Medication1 $ Medication2 $ Medication3 $;
cards;
ID1 A B C
ID2 C E F
ID3 E G H
;
data want;
set have;
medication=medication1;
output;
medication=medication2;
output;
medication=medication3;
output;
drop medication1-medication3;
run;
First a comment on "binary". That means two values, typically 1 and 0. You only show one value, the 1, which would technically be unary.
For your given example this creates binary values, 1 where present and 0 where not. Caveats to follow:
data have; input Patient $ Medication1 $ Medication2 $ Medication3 $; datalines; ID1 A B C ID2 C E F ID3 E G H ; data want; set have; array m (*) medication: ; array n (*) A B C D E F G H; array v (8) $ 1 _temporary_('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'); do i=1 to dim(v); n[i] = whichc(v[i],of m(*))>0; end; drop Medication: i; run;
The WHICHC function searches for matches of the first parameter in the remaining list of values and returns the position number if found or 0 if not found. This match is case sensitive. So with this ALL of your values may need to have the case standardized.
I have assumed that your actual values are likely to be different than actually the letters A B etc. due to the name for the variables with Medication. I suspect you actually have medication names. If so you need to change the names of the variables in the N array to something acceptable as SAS variable names, no hyphens, spaces or slash characters. The values in the V array would be the standardized spelling of the actual words that appear and the number following the $ needs to be large enough to hold the longest actual text that appears.
Arrays are one way to handle multiple variables in a similar way. Also a group of values may often be addressed with the "of arrayname(*)" syntax in functions that allow lists of values.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.