Solved: Get the frequency of paired variables over a huge number of variables

NewUsrStat · Posted 05-09-2024 06:49 AM

Hi guys,

suppose to have the following:

data DB;
  input Code_1 :$20. Code_2 :$20. Code_3 :$20. Explanation_1 :$20. Explanation_2 :$20. Explanation_3 :$20.;
cards;
89.7     0     89.65    General_Visit       0             Monitor 
  0    89.61     0          0             Check              0
89.7   89.65     0    General_Visit      Pressure            0
  0     0        0          0              ECG               0
  0     0     91.49.2       0               0                0
.....
;

Is there a way to get the frequency of each Code* with associated Explanation* for the entire data set? The frequency should be reported also if there is only the Code or only the Explanation. Overall the dataset has Codes from *_1 to *_82 and Explanation from *_1 to *_82.

Desired output:

data DB1;
  input Code :$20. Explanation :$20. Freq :$20.;
  cards;
 Code   Explanation   Freq
 89.7   General_Visit  2
 89.65     Monitor     1
 89.65     Pressure    1
 89.61     Check       1
   .        ECG        1
91.49.2      .         1
run;

Thank you in advance

PaigeMiller · Posted 05-09-2024 07:09 AM

Your explanation of the problem seems to be silent on the matter of would you ever need to have frequencies of, for example, code_1 with explanation_3.

Assuming you don't need that, this is a brilliant example of a situation where a long data set (which you don't have) makes the solution much simpler than having a wide data set (which you have). By structuring the data properly, you can then write very simple code to get the results you want.

proc freq data=long;
     tables code*explanation/list missing;
     /* Optional, use out=counts in the above statement after 'missing' to get a SAS data set with the counts */
run;

So how could you get this long data set? The best way is to NOT create the wide data set in the first place, create the long data set in the first place. But sometimes data comes from Excel or other places and it is wide to begin with. So here is code to create the long data set.

data long;
    set db;
    array c code_1-code_3;
    array e explanation_1-explanation_3;
    do i=1 to dim(c);
        code=c(i);
        explanation=e(i);
        output;
    end;
    keep code explanation;
run;

Important concept: almost all SAS data analysis procedures are designed to work with long data sets. Avoid wide data sets, always use long data sets and transform the data to long if necessary.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 05-09-2024 07:09 AM

Your explanation of the problem seems to be silent on the matter of would you ever need to have frequencies of, for example, code_1 with explanation_3.

Assuming you don't need that, this is a brilliant example of a situation where a long data set (which you don't have) makes the solution much simpler than having a wide data set (which you have). By structuring the data properly, you can then write very simple code to get the results you want.

proc freq data=long;
     tables code*explanation/list missing;
     /* Optional, use out=counts in the above statement after 'missing' to get a SAS data set with the counts */
run;

So how could you get this long data set? The best way is to NOT create the wide data set in the first place, create the long data set in the first place. But sometimes data comes from Excel or other places and it is wide to begin with. So here is code to create the long data set.

data long;
    set db;
    array c code_1-code_3;
    array e explanation_1-explanation_3;
    do i=1 to dim(c);
        code=c(i);
        explanation=e(i);
        output;
    end;
    keep code explanation;
run;

Important concept: almost all SAS data analysis procedures are designed to work with long data sets. Avoid wide data sets, always use long data sets and transform the data to long if necessary.

--
Paige Miller

NewUsrStat · Posted 05-09-2024 09:31 AM

Really, thank you very much for helping me with also explanation!

Get the frequency of paired variables over a huge number of variables

Re: Get the frequency of paired variables over a huge number of variables

Re: Get the frequency of paired variables over a huge number of variables

Re: Get the frequency of paired variables over a huge number of variables

Catch up on SAS Innovate 2026