Hello,
I am using the full version of SAS 9.4 via a virtual desktop. I had two questions about coding an array. My data is in character format.
1. Each row is a different case and there are 3 columns of comorbidities. Patients can have 0-3 comorbidities but one person cannot have a repeated diagnosis. I am interested to know the overall most frequent diagnosis regardless of whether it is in position 1, 2 or 3. Is there a special name for an array like this?
2. Second, I would like to know the average number of comorbidities per person from the above data. How do I code this such that SAS counts and assigns a number of co-morbidities to each patient? Then I can do proc means to get the mean number.
Thank you in advance for the help!
Best
Laura
Yes, for 34 an array would be appropriate. For example:
data want;
set have;
array comorbs {34} code1-code34;
n_comorbs=0;
do _n_=1 to 34;
if comorbs{_n_} > ' ' then n_comorbs + 1;
end;
run;
It's a bit difficult to give an answer without you providing sample data. From what you describe I'd assume that you best transpose your data to a long structure by creating a row per diagnosis. After this it's a simple count, freq, rank etc.
Here is some sample data:
Comorbidity1 Comorbidity2 Comorbidity 3
Patient 1 Diabetes Heart failure Chronic renal failure
Patient 2 Chronic renal failure Diabetes .
Patient 3 Diabetes . .
For the first question, what I want is a table with descending frequencies of comorbidities:
Diabetes 3 (most frequent)
Chronic renal failure 2
Heart failure 1
For the second question, what I want is a count number for each patient:
Patient 1 3 comorbidities
Patient 2 2 comorbidities
Patient 3 1 comorbidity
The second question is actually relatively easy. If your variables are character:
data want;
set have;
n_comorbs = (comorbid1 > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' ');
run;
For the first question, it's probably faster to process each variable separately, then combine the results. For example:
proc freq data=have noprint;
tables comorbid1 / out=tot1 (rename=(comorbid1 = diagnosis));
tables comorbid2 / out=tot2 (rename=(comorbid1 = diagnosis));
tables comorbid3 / out=tot3 (rename=(comorbid1 = diagnosis));
run;
data want;
set comorbid1 comorbid2 comorbid3;
where diagnosis > ' ';
run;
proc means data=want sum;
var count;
class diagnosis;
run;
It's possible to do this with 2 steps instead of 3, but the processing would take longer because both steps would work with a larger data set. If your data set is small to begin with, and you want to see alternatives, they definitely exist.
Thanks so much - if I used the following code you sent and there are up to 34 comorbidities, is there a quick way to code 1-34 without listing them all? Would I then have to use an array?
Thanks
Laura
YOUR ORIGINAL CODE:
data want;
set have;
n_comorbs = (comorbid1 > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' ');
run;
Yes, for 34 an array would be appropriate. For example:
data want;
set have;
array comorbs {34} code1-code34;
n_comorbs=0;
do _n_=1 to 34;
if comorbs{_n_} > ' ' then n_comorbs + 1;
end;
run;
Thank you - that coding worked!
Now if I want to find the most frequent comorbidity among the 34 columns, I should probably do an array as well but have to retain the characters instead of counting between apostrophes ' '. I think you had suggested doing each variable individually via proc freq but that was when there were only 3 columns.
Yes, with 34 potential fields to examine, that would look different. The WANT data set here has to be separate from the WANT data set for the other question, because this one contains multiple observations for each original observation:
data diagnoses;
set have;
array comorbs {34} code1-code34;
do _n_=1 to 34;
if comorbs{_n_} > ' ' then do;
diagnosis = comorbs{_n_};
output;
end;
end;
keep diagnosis;
run;
proc freq data=diagnoses;
tables diagnosis;
run;
Consider adding the ORDER=FREQ option to print the table by descending frequency count instead of alphabetically.
Thank you, this works perfectly!
In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?
Thanks again - I really appreciate the help!
Best
Laura
@lmyers2 wrote:
Thank you, this works perfectly!
In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?
Thanks again - I really appreciate the help!
Best
Laura
The default is to keep every variable. If you don't want to keep everything then you can list those to KEEP or those to DROP.
Just remove the KEEP statement.
As @Tom said, you can get rid of the KEEP statement. That keeps all variables. However, note that you are now outputting a separate observation for each diagnosis. With multiple observations, you may not want to keep all those other variables ... but you do have the choice.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.