BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Banke
Pyrite | Level 9

I am required to check 5 variables (all diagnosis) to find out the 5 most occuring diagnosis and average length of stay in hospital which is another variable. I tried this for the 1st part ( frequency) and it didnt work,it instead multiplied my observations by 10. how can i fix this please?

 

DATA WEEK6.QUESTION4;
SET WEEK6.THCICSAMP;
ARRAY DIAG (10) ADMITTING_DIAG PRINC_DIAG_CODE OTH_DIAG_CODE_1 - OTH_DIAG_CODE_8;
DO I = 1 TO 10;
OUTPUT;
END;
PROC FREQ DATA = WEEK6.QUESTION4;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You can just makeup some example data to allow you (and us) to work out by hand what you are trying to do.

For example you might just retype a few of the lines you see in the data you have. 

 

data have;
  length los 8 Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10 $7;
  input los -- Oth_Diag_Code10 ;
cards;
7 64681 79579 66331 66411 . . . . . . .
1 v3001 7746      .     . . . . . . . .
5 v3001    .      .     . . . . . . . .
;

Or just make it up from scratch. This is best as then you can create examples that test the boundary conditions of your logic.  For example by having the same DX code appear in multiple records.

 

To count the number of times each DX code appears you could either move them all into one variable and count that.

data tall;
  set have;
  array dx Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10;
  do i=1 to dim(dx);
     dxcode = dx[i];
     if not missing(dxcode) then output;
  end;
  drop i Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10;
run;
proc freq data=tall order=freq ;
   tables dxcode / noprint out=counts;
run;
proc print data=counts(obs=5);
run;

Results:

Obs    dxcode    COUNT    PERCENT

 1     v3001       2      28.5714
 2     64681       1      14.2857
 3     66331       1      14.2857
 4     66411       1      14.2857
 5     7746        1      14.2857

For average LOS you might use PROC MEANS (also known as PROC SUMMARY) this will also generate the frequency.

proc means noprint nway data=tall;
  class dxcode ;
  var los;
  output out=means mean=mean_los;
run;
proc sort data=means;
  by descending _freq_ ;
run;
proc print data=means (obs=5);
  var dxcode _freq_ mean_los;
run;

Results

Obs    dxcode    _FREQ_    mean_los

 1     v3001        2          3
 2     64681        1          7
 3     66331        1          7
 4     66411        1          7
 5     7746         1          1

 

 

 

View solution in original post

10 REPLIES 10
andreas_lds
Jade | Level 19

Please post data in usable form along with the expected result. And please don't write code all upcase, this is hardly readable and post code using "Insert SAS code".

 

Banke
Pyrite | Level 9

I don't quite understand what you mean by 'in a useable form and 'expected results' . I hope this does it:  I am asked to get the most common 5 diagnoses in my variables listed in my array command, and the mean length of stay (another variable) of those diagnoses. They are variables showing  diagnosis of patients at different times. so i am thinking to output only those variables i need and get their frequencies. I will then get their means. 

What i am getting in my log is : 

There were 9056 observations read from the data set WEEK6.THCICSAMP.
NOTE: The data set WEEK6.QUESTION4 has 90560 observations and 108 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.06 seconds
my observations have been multiplied by 10

 

data week6.question4;
set week6.THCICSamp;
array diag (10) $ admitting_diag princ_diag_code oth_diag_code_1 - oth_diag_code_8;
do i = 1 to 10;
diagnosis = diag (i);
output;
end;
proc freq data = week6.question4 order=freq;
tables _ALL_ / out = week6.ques4freq;
run;

 

 

PaigeMiller
Diamond | Level 26

Yes, your observations have been multiplied by 10. Why? Because the OUTPUT statement in the DO loop executes 10 times for each record in your data.

 

Also, in PROC FREQ, you want to use

 

tables diagnosis/out=week6.ques4freq;

Data in usable form — here are instructions: https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/

--
Paige Miller
PaigeMiller
Diamond | Level 26

In addition to the comments from @andreas_lds (which you should ALWAYS do, that's ALWAYS, every single time, even if we don't ask)

 

When you use a DO loop with an array, you have to assign a value to a variable, I think what you want inside the do loop is

 

diagnosis = diag(i);
output;
--
Paige Miller
Banke
Pyrite | Level 9

I understand better now, thanks. It created a new variable called diagnosis for me which I can output. But pardon me, i'm new in SAS and i still dont fully understand the array and do loop concept. Is the diagnosis variable showing the observations common to the variables in my array? thats what i really need, the most frequent observations in the 10 variables in my array.

data week6.question4;
set week6.THCICSamp;
array diag (10) $ admitting_diag princ_diag_code oth_diag_code_1 - oth_diag_code_8;
do i = 1 to 10;
diagnosis = diag (i);
end;
run;
proc freq data = week6.question4 order=freq;
tables diagnosis / out = week6.ques4freq;
run;

the observations in diagnosis are codes for disease types , present in the different variables in my arraythe observations in diagnosis are codes for disease types , present in the different variables in my array

andreas_lds
Jade | Level 19

@Banke wrote:

I understand better now, thanks. It created a new variable called diagnosis for me which I can output. But pardon me, i'm new in SAS and i still dont fully understand the array and do loop concept. Is the diagnosis variable showing the observations common to the variables in my array? thats what i really need, the most frequent observations in the 10 variables in my array.

 


I have no idea what you expect as result. It would be much easier to help you if you showed your data and the expected result. In your initial post, you said:

am required to check 5 variables (all diagnosis) to find out the 5 most occurring diagnosis

This is exactly the result of proc freq, the five most occurring diagnosis are the first five lines in the result.

PaigeMiller
Diamond | Level 26

We're trying to help you, but you have to help us. We have asked before, and we ask again (although we should not have to ask twice) — show us a portion of your data (following the instructions I gave) and show us the desired result.

 

It's nearly impossible to move forward now without this information. Instead of ignoring our requests for information, you will get much faster and better answers by actually providing the information requested.

--
Paige Miller
Banke
Pyrite | Level 9

 

 

 

thcic_id is patient no, provider name is hospital they are attendingthcic_id is patient no, provider name is hospital they are attendingthe diagnosis (there are different types , from admitting_diag to oth_diag_code_8the diagnosis (there are different types , from admitting_diag to oth_diag_code_8Why would I ignore requests when I need help? I have posted my questions in the new sas users forum for a reason; I have barely spent 1 month using sas and it is the first time  programming tool I will ever use. I am working with a huge dataset so i cant type out where the information are . I have now attached a screenshot of where the variables are. I didnt really understand the information in the link you shared. the snytax and instructions looked so complex. Desired results? I dont have that, it's an assignment.  I have mentioned what I am asked to do twice. Maybe if I post the question, it will provide more clarity. Please take it easy with me, your responses are dampening my zeal. I appreciate the help I get from this forum

 

Question

Using all diagnosis variables (Admitting_Diag, Princ_Diag, Oth_Diag_Code2-Oth_DiagCcode10) recheck the most common five diagnoses; report and update the frequencies and average LOS for those from #3.

Tom
Super User Tom
Super User

You can just makeup some example data to allow you (and us) to work out by hand what you are trying to do.

For example you might just retype a few of the lines you see in the data you have. 

 

data have;
  length los 8 Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10 $7;
  input los -- Oth_Diag_Code10 ;
cards;
7 64681 79579 66331 66411 . . . . . . .
1 v3001 7746      .     . . . . . . . .
5 v3001    .      .     . . . . . . . .
;

Or just make it up from scratch. This is best as then you can create examples that test the boundary conditions of your logic.  For example by having the same DX code appear in multiple records.

 

To count the number of times each DX code appears you could either move them all into one variable and count that.

data tall;
  set have;
  array dx Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10;
  do i=1 to dim(dx);
     dxcode = dx[i];
     if not missing(dxcode) then output;
  end;
  drop i Admitting_Diag  Princ_Diag  Oth_Diag_Code2-Oth_Diag_Code10;
run;
proc freq data=tall order=freq ;
   tables dxcode / noprint out=counts;
run;
proc print data=counts(obs=5);
run;

Results:

Obs    dxcode    COUNT    PERCENT

 1     v3001       2      28.5714
 2     64681       1      14.2857
 3     66331       1      14.2857
 4     66411       1      14.2857
 5     7746        1      14.2857

For average LOS you might use PROC MEANS (also known as PROC SUMMARY) this will also generate the frequency.

proc means noprint nway data=tall;
  class dxcode ;
  var los;
  output out=means mean=mean_los;
run;
proc sort data=means;
  by descending _freq_ ;
run;
proc print data=means (obs=5);
  var dxcode _freq_ mean_los;
run;

Results

Obs    dxcode    _FREQ_    mean_los

 1     v3001        2          3
 2     64681        1          7
 3     66331        1          7
 4     66411        1          7
 5     7746         1          1

 

 

 

Banke
Pyrite | Level 9
Thank you so much, I understand better now. I'm sorry for making it difficult to help me. I honestly could not interpret the terminologies correctly.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2537 views
  • 1 like
  • 4 in conversation