I am required to check 5 variables (all diagnosis) to find out the 5 most occuring diagnosis and average length of stay in hospital which is another variable. I tried this for the 1st part ( frequency) and it didnt work,it instead multiplied my observations by 10. how can i fix this please?
DATA WEEK6.QUESTION4;
SET WEEK6.THCICSAMP;
ARRAY DIAG (10) ADMITTING_DIAG PRINC_DIAG_CODE OTH_DIAG_CODE_1 - OTH_DIAG_CODE_8;
DO I = 1 TO 10;
OUTPUT;
END;
PROC FREQ DATA = WEEK6.QUESTION4;
RUN;
You can just makeup some example data to allow you (and us) to work out by hand what you are trying to do.
For example you might just retype a few of the lines you see in the data you have.
data have;
length los 8 Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10 $7;
input los -- Oth_Diag_Code10 ;
cards;
7 64681 79579 66331 66411 . . . . . . .
1 v3001 7746 . . . . . . . . .
5 v3001 . . . . . . . . . .
;
Or just make it up from scratch. This is best as then you can create examples that test the boundary conditions of your logic. For example by having the same DX code appear in multiple records.
To count the number of times each DX code appears you could either move them all into one variable and count that.
data tall;
set have;
array dx Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10;
do i=1 to dim(dx);
dxcode = dx[i];
if not missing(dxcode) then output;
end;
drop i Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10;
run;
proc freq data=tall order=freq ;
tables dxcode / noprint out=counts;
run;
proc print data=counts(obs=5);
run;
Results:
Obs dxcode COUNT PERCENT
1 v3001 2 28.5714
2 64681 1 14.2857
3 66331 1 14.2857
4 66411 1 14.2857
5 7746 1 14.2857
For average LOS you might use PROC MEANS (also known as PROC SUMMARY) this will also generate the frequency.
proc means noprint nway data=tall;
class dxcode ;
var los;
output out=means mean=mean_los;
run;
proc sort data=means;
by descending _freq_ ;
run;
proc print data=means (obs=5);
var dxcode _freq_ mean_los;
run;
Results
Obs dxcode _FREQ_ mean_los 1 v3001 2 3 2 64681 1 7 3 66331 1 7 4 66411 1 7 5 7746 1 1
Please post data in usable form along with the expected result. And please don't write code all upcase, this is hardly readable and post code using "Insert SAS code".
I don't quite understand what you mean by 'in a useable form and 'expected results' . I hope this does it: I am asked to get the most common 5 diagnoses in my variables listed in my array command, and the mean length of stay (another variable) of those diagnoses. They are variables showing diagnosis of patients at different times. so i am thinking to output only those variables i need and get their frequencies. I will then get their means.
What i am getting in my log is :
data week6.question4;
set week6.THCICSamp;
array diag (10) $ admitting_diag princ_diag_code oth_diag_code_1 - oth_diag_code_8;
do i = 1 to 10;
diagnosis = diag (i);
output;
end;
proc freq data = week6.question4 order=freq;
tables _ALL_ / out = week6.ques4freq;
run;
Yes, your observations have been multiplied by 10. Why? Because the OUTPUT statement in the DO loop executes 10 times for each record in your data.
Also, in PROC FREQ, you want to use
tables diagnosis/out=week6.ques4freq;
Data in usable form — here are instructions: https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
In addition to the comments from @andreas_lds (which you should ALWAYS do, that's ALWAYS, every single time, even if we don't ask)
When you use a DO loop with an array, you have to assign a value to a variable, I think what you want inside the do loop is
diagnosis = diag(i);
output;
I understand better now, thanks. It created a new variable called diagnosis for me which I can output. But pardon me, i'm new in SAS and i still dont fully understand the array and do loop concept. Is the diagnosis variable showing the observations common to the variables in my array? thats what i really need, the most frequent observations in the 10 variables in my array.
data week6.question4;
set week6.THCICSamp;
array diag (10) $ admitting_diag princ_diag_code oth_diag_code_1 - oth_diag_code_8;
do i = 1 to 10;
diagnosis = diag (i);
end;
run;
proc freq data = week6.question4 order=freq;
tables diagnosis / out = week6.ques4freq;
run;
@Banke wrote:
I understand better now, thanks. It created a new variable called diagnosis for me which I can output. But pardon me, i'm new in SAS and i still dont fully understand the array and do loop concept. Is the diagnosis variable showing the observations common to the variables in my array? thats what i really need, the most frequent observations in the 10 variables in my array.
I have no idea what you expect as result. It would be much easier to help you if you showed your data and the expected result. In your initial post, you said:
am required to check 5 variables (all diagnosis) to find out the 5 most occurring diagnosis
This is exactly the result of proc freq, the five most occurring diagnosis are the first five lines in the result.
We're trying to help you, but you have to help us. We have asked before, and we ask again (although we should not have to ask twice) — show us a portion of your data (following the instructions I gave) and show us the desired result.
It's nearly impossible to move forward now without this information. Instead of ignoring our requests for information, you will get much faster and better answers by actually providing the information requested.
Why would I ignore requests when I need help? I have posted my questions in the new sas users forum for a reason; I have barely spent 1 month using sas and it is the first time programming tool I will ever use. I am working with a huge dataset so i cant type out where the information are . I have now attached a screenshot of where the variables are. I didnt really understand the information in the link you shared. the snytax and instructions looked so complex. Desired results? I dont have that, it's an assignment. I have mentioned what I am asked to do twice. Maybe if I post the question, it will provide more clarity. Please take it easy with me, your responses are dampening my zeal. I appreciate the help I get from this forum
Question
Using all diagnosis variables (Admitting_Diag, Princ_Diag, Oth_Diag_Code2-Oth_DiagCcode10) recheck the most common five diagnoses; report and update the frequencies and average LOS for those from #3.
You can just makeup some example data to allow you (and us) to work out by hand what you are trying to do.
For example you might just retype a few of the lines you see in the data you have.
data have;
length los 8 Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10 $7;
input los -- Oth_Diag_Code10 ;
cards;
7 64681 79579 66331 66411 . . . . . . .
1 v3001 7746 . . . . . . . . .
5 v3001 . . . . . . . . . .
;
Or just make it up from scratch. This is best as then you can create examples that test the boundary conditions of your logic. For example by having the same DX code appear in multiple records.
To count the number of times each DX code appears you could either move them all into one variable and count that.
data tall;
set have;
array dx Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10;
do i=1 to dim(dx);
dxcode = dx[i];
if not missing(dxcode) then output;
end;
drop i Admitting_Diag Princ_Diag Oth_Diag_Code2-Oth_Diag_Code10;
run;
proc freq data=tall order=freq ;
tables dxcode / noprint out=counts;
run;
proc print data=counts(obs=5);
run;
Results:
Obs dxcode COUNT PERCENT
1 v3001 2 28.5714
2 64681 1 14.2857
3 66331 1 14.2857
4 66411 1 14.2857
5 7746 1 14.2857
For average LOS you might use PROC MEANS (also known as PROC SUMMARY) this will also generate the frequency.
proc means noprint nway data=tall;
class dxcode ;
var los;
output out=means mean=mean_los;
run;
proc sort data=means;
by descending _freq_ ;
run;
proc print data=means (obs=5);
var dxcode _freq_ mean_los;
run;
Results
Obs dxcode _FREQ_ mean_los 1 v3001 2 3 2 64681 1 7 3 66331 1 7 4 66411 1 7 5 7746 1 1
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.