BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lmyers2
Obsidian | Level 7

Hello,

 

I am using the full version of SAS 9.4 via a virtual desktop. I had two questions about coding an array. My data is in character format.

 

1. Each row is a different case and there are 3 columns of comorbidities. Patients can have 0-3 comorbidities but one person cannot have a repeated diagnosis.  I am interested to know the overall most frequent diagnosis regardless of whether it is in position 1, 2 or 3. Is there a special name for an array like this?

 

2. Second, I would like to know the average number of comorbidities per person from the above data. How do I code this such that SAS counts and assigns a number of co-morbidities to each patient?  Then I can do proc means to get the mean number.

 

Thank you in advance for the help!

 

Best

Laura 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Yes, for 34 an array would be appropriate.  For example:

 

data want;

set have;

array comorbs {34} code1-code34;   

n_comorbs=0;

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then n_comorbs + 1;

end;

run;

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

@lmyers2

It's a bit difficult to give an answer without you providing sample data. From what you describe I'd assume that you best transpose your data to a long structure by creating a row per diagnosis. After this it's a simple count, freq, rank etc.

lmyers2
Obsidian | Level 7

Here is some sample data: 

 

                        Comorbidity1                         Comorbidity2  Comorbidity 3

Patient 1         Diabetes                                  Heart failure   Chronic renal failure

Patient 2         Chronic renal failure               Diabetes          .

Patient 3         Diabetes                                  .                       .

 

For the first question, what I want is a table with descending frequencies of comorbidities:

            Diabetes                      3 (most frequent)

            Chronic renal failure    2

            Heart failure                1

 

For the second question, what I want is a count number for each patient:

            Patient 1         3 comorbidities

            Patient 2         2 comorbidities

            Patient 3         1 comorbidity

Astounding
PROC Star

The second question is actually relatively easy.  If your variables are character:

 

data want;

set have;

n_comorbs = (comorbid1  > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' ');

run;

 

For the first question, it's probably faster to process each variable separately, then combine the results.  For example:

 

proc freq data=have noprint;

tables comorbid1 / out=tot1 (rename=(comorbid1 = diagnosis));

tables comorbid2 / out=tot2 (rename=(comorbid1 = diagnosis));

tables comorbid3 / out=tot3 (rename=(comorbid1 = diagnosis));

run;

 

data want;

set comorbid1 comorbid2 comorbid3;

where diagnosis > ' ';

run;

 

proc means data=want sum;

var count;

class diagnosis;

run;

 

It's possible to do this with 2 steps instead of 3, but the processing would take longer because both steps would work with a larger data set.  If your data set is small to begin with, and you want to see alternatives, they definitely exist.

lmyers2
Obsidian | Level 7

Thanks so much - if I used the following code you sent and there are up to 34 comorbidities, is there a quick way to code 1-34 without listing them all?   Would I then have to use an array?

 

Thanks

Laura

 

YOUR ORIGINAL CODE:

data want; set have; n_comorbs = (comorbid1 > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' '); run;

 

Astounding
PROC Star

Yes, for 34 an array would be appropriate.  For example:

 

data want;

set have;

array comorbs {34} code1-code34;   

n_comorbs=0;

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then n_comorbs + 1;

end;

run;

lmyers2
Obsidian | Level 7

Thank you - that coding worked!

 

Now if I want to find the most frequent comorbidity among the 34 columns, I should probably do an array as well but have to retain the characters instead of counting between apostrophes ' '.   I think you had suggested doing each variable individually via proc freq but that was when there were only 3 columns. 

Astounding
PROC Star

Yes, with 34 potential fields to examine, that would look different.  The WANT data set here has to be separate from the WANT data set for the other question, because this one contains multiple observations for each original observation:

 

data diagnoses;

set have;

array comorbs {34} code1-code34;   

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then do;

      diagnosis = comorbs{_n_};

      output;

   end;

end;

keep diagnosis;

run;

 

proc freq data=diagnoses;

tables diagnosis;

run;

 

Consider adding the ORDER=FREQ option to print the table by descending frequency count instead of alphabetically.

lmyers2
Obsidian | Level 7

Thank you, this works perfectly!

 

In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?

 

Thanks again - I really appreciate the help!

 

Best

Laura

Tom
Super User Tom
Super User

@lmyers2 wrote:

Thank you, this works perfectly!

 

In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?

 

Thanks again - I really appreciate the help!

 

Best

Laura


The default is to keep every variable. If you don't want to keep everything then you can list those to KEEP or those to DROP.

Just remove the KEEP statement.

 

Astounding
PROC Star

As @Tom said, you can get rid of the KEEP statement.  That keeps all variables.  However, note that you are now outputting a separate observation for each diagnosis.  With multiple observations, you may not want to keep all those other variables ... but you do have the choice.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1998 views
  • 0 likes
  • 4 in conversation