I have a dataset that I transposed. My goal is to identify whether each individual had any presence of several diseases in any visit. This is how the data is set up:
ID | Comorb | Visit1 | Visit2 | Vsit3 | Visit4 |
1 | Diabetes | 0 | 0 | 0 | 0 |
1 | Stroke | 0 | 1 | 1 | 1 |
1 | Osteoar | 0 | 1 | 0 | 0 |
2 | Diabetes | 1 | 1 | 1 | 1 |
2 | Stroke | 0 | 0 | 0 | 0 |
2 | Osteoar | 0 | 0 | 1 | 1 |
I have more comorbidities and more visits in my dataset, this is abbreviated.
My idea was to create a list of the comorbidities so that I can loop through each one and find the max of all the visits. If there is a 1 in any of the columns, then that person gets a 1 for that disease. I'm trying to make a macro but it's not working, this is what I have so far:
%let visits= visit0, visit1, visit2, visit3, visit4;
%let name_list=Diabetes Stroke Osteoar;
%macro get_comorbidities(comorb_name=);
%local i next_name;
%do i=1 %to %sysfunc(countw(&name_list));
%let next_name = %scan(&name_list, &i);
data try1;
set comorb_t;
%if comorb=&next_name %then &comorb_name=max(&visits);
run;
%end;
%mend get_comorbidities;
%get_comorbidities(comorb_name=diabetes_flag);
But the new variable (diabetes in this case) isn't getting created. My macro understanding is still basic so I appreciate any help!
Then I wouldn't bother with a macro - much too messy. Use arrays:
data comorb_max (drop=i);
set comorb_t;
array morb_vals {3} $10 _temporary_ ("diabetes","osteoar","stroke");
array morb_vars {3} diabetes osteoar stroke ;
do i=1 to dim(morb_vars);
if comorb=morb_vals{i} then morb_vars{i}=max(of visit:);
end;
run;
I suggest you write one iteration of your code WITHOUT using macro coding. Make sure it works, then post it here, and we can help you learn how to macro-ize it (and whether to macro-ize it)..
For each individual, I would like them to have a variable for each disease. For example:
ID | Comorb | Visit1 | Visit2 | Vsit3 | Visit4 | Diabetes | Stroke | Osteoar |
1 | Diabetes | 0 | 0 | 0 | 0 | 0 | . | . |
1 | Stroke | 0 | 1 | 1 | 1 | . | 1 | . |
1 | Osteoar | 0 | 1 | 0 | 0 | . | . | 1 |
2 | Diabetes | 1 | 1 | 1 | 1 | 1 | . | . |
2 | Stroke | 0 | 0 | 0 | 0 | . | 0 | . |
2 | Osteoar | 0 | 0 | 1 | 1 | . | . | 1 |
and this is my "unmacro-ized code";
data comorb_max;
set comorb_t;
if comorb="diabetes" then diabetes=max(&visits);
if comorb="osteoar" then osteoar=max(&visits);
if comorb="stroke" then stroke=max(&visits);
run;
The reason I thought to use a macro to loop through the comorbidity names is because I actually have ~15 of them, so the code looks messy.
I would eventually like to get each row to be one individual with the 3 variables of diseases, so like this:
ID | Diabetes | Stroke | Osteoar |
1 | 0 | 1 | 1 |
2 | 1 | 0 | 1 |
So now I'm wondering if my approach is not efficient at all - even without the macro.
Then I wouldn't bother with a macro - much too messy. Use arrays:
data comorb_max (drop=i);
set comorb_t;
array morb_vals {3} $10 _temporary_ ("diabetes","osteoar","stroke");
array morb_vars {3} diabetes osteoar stroke ;
do i=1 to dim(morb_vars);
if comorb=morb_vals{i} then morb_vars{i}=max(of visit:);
end;
run;
I'm not sure why I forgot about arrays - maybe I was just trying to be fancy with macros! Anyway, thank you!
No macro required. You could do:
data have;
input ID Comorb :$16. Visit1 Visit2 Visit3 Visit4;
datalines;
1 Diabetes 0 0 0 0
1 Stroke 0 1 1 1
1 Osteoar 0 1 0 0
2 Diabetes 1 1 1 1
2 Stroke 0 0 0 0
2 Osteoar 0 0 1 1
;
proc transpose data=have out=temp;
by id;
id comorb;
var visit:;
run;
proc summary data=temp;
by id;
var diabetes -- osteoar;
output out=want(drop=_:) max=;
run;
proc print data=want noobs; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.