DATA Step, Macro, Functions and more

questions about array coding

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

questions about array coding

Hello,

 

I am using the full version of SAS 9.4 via a virtual desktop. I had two questions about coding an array. My data is in character format.

 

1. Each row is a different case and there are 3 columns of comorbidities. Patients can have 0-3 comorbidities but one person cannot have a repeated diagnosis.  I am interested to know the overall most frequent diagnosis regardless of whether it is in position 1, 2 or 3. Is there a special name for an array like this?

 

2. Second, I would like to know the average number of comorbidities per person from the above data. How do I code this such that SAS counts and assigns a number of co-morbidities to each patient?  Then I can do proc means to get the mean number.

 

Thank you in advance for the help!

 

Best

Laura 

 

 


Accepted Solutions
Solution
‎11-27-2017 10:57 AM
Super User
Posts: 6,542

Re: questions about array coding

Yes, for 34 an array would be appropriate.  For example:

 

data want;

set have;

array comorbs {34} code1-code34;   

n_comorbs=0;

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then n_comorbs + 1;

end;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 4,549

Re: questions about array coding

@lmyers2

It's a bit difficult to give an answer without you providing sample data. From what you describe I'd assume that you best transpose your data to a long structure by creating a row per diagnosis. After this it's a simple count, freq, rank etc.

Contributor
Posts: 44

Re: questions about array coding

Here is some sample data: 

 

                        Comorbidity1                         Comorbidity2  Comorbidity 3

Patient 1         Diabetes                                  Heart failure   Chronic renal failure

Patient 2         Chronic renal failure               Diabetes          .

Patient 3         Diabetes                                  .                       .

 

For the first question, what I want is a table with descending frequencies of comorbidities:

            Diabetes                      3 (most frequent)

            Chronic renal failure    2

            Heart failure                1

 

For the second question, what I want is a count number for each patient:

            Patient 1         3 comorbidities

            Patient 2         2 comorbidities

            Patient 3         1 comorbidity

Super User
Posts: 6,542

Re: questions about array coding

[ Edited ]

The second question is actually relatively easy.  If your variables are character:

 

data want;

set have;

n_comorbs = (comorbid1  > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' ');

run;

 

For the first question, it's probably faster to process each variable separately, then combine the results.  For example:

 

proc freq data=have noprint;

tables comorbid1 / out=tot1 (rename=(comorbid1 = diagnosis));

tables comorbid2 / out=tot2 (rename=(comorbid1 = diagnosis));

tables comorbid3 / out=tot3 (rename=(comorbid1 = diagnosis));

run;

 

data want;

set comorbid1 comorbid2 comorbid3;

where diagnosis > ' ';

run;

 

proc means data=want sum;

var count;

class diagnosis;

run;

 

It's possible to do this with 2 steps instead of 3, but the processing would take longer because both steps would work with a larger data set.  If your data set is small to begin with, and you want to see alternatives, they definitely exist.

Contributor
Posts: 44

Re: questions about array coding

Posted in reply to Astounding

Thanks so much - if I used the following code you sent and there are up to 34 comorbidities, is there a quick way to code 1-34 without listing them all?   Would I then have to use an array?

 

Thanks

Laura

 

YOUR ORIGINAL CODE:

data want; set have; n_comorbs = (comorbid1 > ' ') + (cormorbid2 > ' ') + (cormorbid3 > ' '); run;

 

Solution
‎11-27-2017 10:57 AM
Super User
Posts: 6,542

Re: questions about array coding

Yes, for 34 an array would be appropriate.  For example:

 

data want;

set have;

array comorbs {34} code1-code34;   

n_comorbs=0;

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then n_comorbs + 1;

end;

run;

Contributor
Posts: 44

Re: questions about array coding

Posted in reply to Astounding

Thank you - that coding worked!

 

Now if I want to find the most frequent comorbidity among the 34 columns, I should probably do an array as well but have to retain the characters instead of counting between apostrophes ' '.   I think you had suggested doing each variable individually via proc freq but that was when there were only 3 columns. 

Super User
Posts: 6,542

Re: questions about array coding

Yes, with 34 potential fields to examine, that would look different.  The WANT data set here has to be separate from the WANT data set for the other question, because this one contains multiple observations for each original observation:

 

data diagnoses;

set have;

array comorbs {34} code1-code34;   

do _n_=1 to 34;

   if comorbs{_n_} > ' ' then do;

      diagnosis = comorbs{_n_};

      output;

   end;

end;

keep diagnosis;

run;

 

proc freq data=diagnoses;

tables diagnosis;

run;

 

Consider adding the ORDER=FREQ option to print the table by descending frequency count instead of alphabetically.

Contributor
Posts: 44

Re: questions about array coding

Posted in reply to Astounding

Thank you, this works perfectly!

 

In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?

 

Thanks again - I really appreciate the help!

 

Best

Laura

Super User
Super User
Posts: 7,858

Re: questions about array coding


lmyers2 wrote:

Thank you, this works perfectly!

 

In response to this "keep" statement, SAS literally only keeps the new variable, so I have to reload and remerge the datasets after I do this proc. Is there a way to tell SAS to keep all of the old variables in addition to the new variable?

 

Thanks again - I really appreciate the help!

 

Best

Laura


The default is to keep every variable. If you don't want to keep everything then you can list those to KEEP or those to DROP.

Just remove the KEEP statement.

 

Super User
Posts: 6,542

Re: questions about array coding

As @Tom said, you can get rid of the KEEP statement.  That keeps all variables.  However, note that you are now outputting a separate observation for each diagnosis.  With multiple observations, you may not want to keep all those other variables ... but you do have the choice.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 252 views
  • 0 likes
  • 4 in conversation