Hi guys,
suppose to have the following:
data DB;
input Code_1 :$20. Code_2 :$20. Code_3 :$20. Explanation_1 :$20. Explanation_2 :$20. Explanation_3 :$20.;
cards;
89.7 0 89.65 General_Visit 0 Monitor
0 89.61 0 0 Check 0
89.7 89.65 0 General_Visit Pressure 0
0 0 0 0 ECG 0
0 0 91.49.2 0 0 0
.....
;
Is there a way to get the frequency of each Code* with associated Explanation* for the entire data set? The frequency should be reported also if there is only the Code or only the Explanation. Overall the dataset has Codes from *_1 to *_82 and Explanation from *_1 to *_82.
Desired output:
data DB1;
input Code :$20. Explanation :$20. Freq :$20.;
cards;
Code Explanation Freq
89.7 General_Visit 2
89.65 Monitor 1
89.65 Pressure 1
89.61 Check 1
. ECG 1
91.49.2 . 1
run;
Thank you in advance
Your explanation of the problem seems to be silent on the matter of would you ever need to have frequencies of, for example, code_1 with explanation_3.
Assuming you don't need that, this is a brilliant example of a situation where a long data set (which you don't have) makes the solution much simpler than having a wide data set (which you have). By structuring the data properly, you can then write very simple code to get the results you want.
proc freq data=long;
tables code*explanation/list missing;
/* Optional, use out=counts in the above statement after 'missing' to get a SAS data set with the counts */
run;
So how could you get this long data set? The best way is to NOT create the wide data set in the first place, create the long data set in the first place. But sometimes data comes from Excel or other places and it is wide to begin with. So here is code to create the long data set.
data long;
set db;
array c code_1-code_3;
array e explanation_1-explanation_3;
do i=1 to dim(c);
code=c(i);
explanation=e(i);
output;
end;
keep code explanation;
run;
Important concept: almost all SAS data analysis procedures are designed to work with long data sets. Avoid wide data sets, always use long data sets and transform the data to long if necessary.
Your explanation of the problem seems to be silent on the matter of would you ever need to have frequencies of, for example, code_1 with explanation_3.
Assuming you don't need that, this is a brilliant example of a situation where a long data set (which you don't have) makes the solution much simpler than having a wide data set (which you have). By structuring the data properly, you can then write very simple code to get the results you want.
proc freq data=long;
tables code*explanation/list missing;
/* Optional, use out=counts in the above statement after 'missing' to get a SAS data set with the counts */
run;
So how could you get this long data set? The best way is to NOT create the wide data set in the first place, create the long data set in the first place. But sometimes data comes from Excel or other places and it is wide to begin with. So here is code to create the long data set.
data long;
set db;
array c code_1-code_3;
array e explanation_1-explanation_3;
do i=1 to dim(c);
code=c(i);
explanation=e(i);
output;
end;
keep code explanation;
run;
Important concept: almost all SAS data analysis procedures are designed to work with long data sets. Avoid wide data sets, always use long data sets and transform the data to long if necessary.
Nearly 200 sessions are now available on demand in the Innovate Hub.
Watch Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.