I have a dataset that I transposed. My goal is to identify whether each individual had any presence of several diseases in any visit. This is how the data is set up:
ID | Comorb | Visit1 | Visit2 | Vsit3 | Visit4 |
1 | Diabetes | 0 | 0 | 0 | 0 |
1 | Stroke | 0 | 1 | 1 | 1 |
1 | Osteoar | 0 | 1 | 0 | 0 |
2 | Diabetes | 1 | 1 | 1 | 1 |
2 | Stroke | 0 | 0 | 0 | 0 |
2 | Osteoar | 0 | 0 | 1 | 1 |
I have more comorbidities and more visits in my dataset, this is abbreviated.
My idea was to create a list of the comorbidities so that I can loop through each one and find the max of all the visits. If there is a 1 in any of the columns, then that person gets a 1 for that disease. I'm trying to make a macro but it's not working, this is what I have so far:
%let visits= visit0, visit1, visit2, visit3, visit4;
%let name_list=Diabetes Stroke Osteoar;
%macro get_comorbidities(comorb_name=);
%local i next_name;
%do i=1 %to %sysfunc(countw(&name_list));
%let next_name = %scan(&name_list, &i);
data try1;
set comorb_t;
%if comorb=&next_name %then &comorb_name=max(&visits);
run;
%end;
%mend get_comorbidities;
%get_comorbidities(comorb_name=diabetes_flag);
But the new variable (diabetes in this case) isn't getting created. My macro understanding is still basic so I appreciate any help!
Then I wouldn't bother with a macro - much too messy. Use arrays:
data comorb_max (drop=i);
set comorb_t;
array morb_vals {3} $10 _temporary_ ("diabetes","osteoar","stroke");
array morb_vars {3} diabetes osteoar stroke ;
do i=1 to dim(morb_vars);
if comorb=morb_vals{i} then morb_vars{i}=max(of visit:);
end;
run;
I suggest you write one iteration of your code WITHOUT using macro coding. Make sure it works, then post it here, and we can help you learn how to macro-ize it (and whether to macro-ize it)..
For each individual, I would like them to have a variable for each disease. For example:
ID | Comorb | Visit1 | Visit2 | Vsit3 | Visit4 | Diabetes | Stroke | Osteoar |
1 | Diabetes | 0 | 0 | 0 | 0 | 0 | . | . |
1 | Stroke | 0 | 1 | 1 | 1 | . | 1 | . |
1 | Osteoar | 0 | 1 | 0 | 0 | . | . | 1 |
2 | Diabetes | 1 | 1 | 1 | 1 | 1 | . | . |
2 | Stroke | 0 | 0 | 0 | 0 | . | 0 | . |
2 | Osteoar | 0 | 0 | 1 | 1 | . | . | 1 |
and this is my "unmacro-ized code";
data comorb_max;
set comorb_t;
if comorb="diabetes" then diabetes=max(&visits);
if comorb="osteoar" then osteoar=max(&visits);
if comorb="stroke" then stroke=max(&visits);
run;
The reason I thought to use a macro to loop through the comorbidity names is because I actually have ~15 of them, so the code looks messy.
I would eventually like to get each row to be one individual with the 3 variables of diseases, so like this:
ID | Diabetes | Stroke | Osteoar |
1 | 0 | 1 | 1 |
2 | 1 | 0 | 1 |
So now I'm wondering if my approach is not efficient at all - even without the macro.
Then I wouldn't bother with a macro - much too messy. Use arrays:
data comorb_max (drop=i);
set comorb_t;
array morb_vals {3} $10 _temporary_ ("diabetes","osteoar","stroke");
array morb_vars {3} diabetes osteoar stroke ;
do i=1 to dim(morb_vars);
if comorb=morb_vals{i} then morb_vars{i}=max(of visit:);
end;
run;
I'm not sure why I forgot about arrays - maybe I was just trying to be fancy with macros! Anyway, thank you!
No macro required. You could do:
data have;
input ID Comorb :$16. Visit1 Visit2 Visit3 Visit4;
datalines;
1 Diabetes 0 0 0 0
1 Stroke 0 1 1 1
1 Osteoar 0 1 0 0
2 Diabetes 1 1 1 1
2 Stroke 0 0 0 0
2 Osteoar 0 0 1 1
;
proc transpose data=have out=temp;
by id;
id comorb;
var visit:;
run;
proc summary data=temp;
by id;
var diabetes -- osteoar;
output out=want(drop=_:) max=;
run;
proc print data=want noobs; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.