Re: Sum frequencies across multiple variables

monsterpie · Posted 08-29-2021 04:00 PM

Hello all,

I am trying to create a new variable in my dataset which counts the frequency of diagnoses per ID. I want the frequency variable to count any diagnosis - except "stress" or missing values. Below is an example of my dataset and desired output.

Any help would be much appreciated. Thank you.

Desired output:

ID	visit_num	diag	age	frequency
1	1	diabetes	42	1
2	1	stroke	78	2
2	2	stress
2	3	stroke
3	1	infection	61	1
3	2

Data:

data have;
input id visit_num diag$ age;
cards;
01 1 diabetes 42 
02 1 stroke 78 
02 2 stress . 
02 3 stroke . 
03 1 infection 61 
03 2 . . 

;
run;

Tom · Posted 08-29-2021 04:16 PM

Please explain more about what output you want.

If the goal is to summarize each of the multiple diagnosis variables separately then that is simple.

proc freq data=wide;
  tables diag: ;
run;

But if you want to create some type of combined summary that ignores the order that diagnosis appears. For example that counts the number of times that "stroke" appears independent of which variable it appears in then you will essentially need to get the data into a single variable.

You could possibly create a bunch of binary variables on each observation and then summarize those, but that is much harder to expand to other diagnoses.

data really_wide;
   set wide ;
   cancer = 0 < whichc('cancer', of diag:);
   stroke = 0 < whichc('stroke ', of diag:);
   ...
run;
proc means sum mean min max  data=really_wide;
  var cancer stroke ....;
run;

The SUM of a binary variable is number of times it is true. The mean is the percent true. The min is whether it is ever false The max is whether it is ever true.

monsterpie · Posted 09-01-2021 11:18 AM

Thanks for your response @Tom. I just updated my question to be much more clear about my objectives, which is closer to the latter of what you suggested. However, I am just trying to get one variable that counts the number of diagnoses per ID. Thank you for your help.

Sum frequencies across multiple variables

Re: Sum frequencies across multiple variables

Re: Sum frequencies across multiple variables

SAS Innovate 2025: Save the Date