Hello all,
I am trying to create a new variable in my dataset which counts the frequency of unique diagnoses per ID. I want the frequency variable to count any unique diagnosis - except "stress" or "infection". Below is an example of my dataset and desired output. (Note: in reality, my actual dataset has many different types of diagnoses, not just those shown below, but I still do not want to include "stress" or "infection" in my new frequency variable).
Any help would be much appreciated. Thank you.
Desired output:
ID | visit_num | diag | age | frequency |
1 | 1 | diabetes | 42 | 1 |
2 | 1 | stroke | 78 | 1 |
2 | 2 | stress | ||
2 | 3 | stroke | ||
3 | 1 | infection | 61 | 0 |
3 | 2 |
Data:
data have;
input id visit_num diag$ age;
cards;
01 1 diabetes 42
02 1 stroke 78
02 2 stress .
02 3 stroke .
03 1 infection 61
03 2 . .
;
run;
data have;
input id visit_num diag :$9. age;
cards;
01 1 diabetes 42
02 1 stroke 78
02 2 stress .
02 3 stroke .
03 1 infection 61
03 2 . .
;
run;
proc sort data=have;
by id diag;
run;
proc freq data=have(where=(diag not in ('stress','infection')));
tables id*diag/noprint out=_counts_;
run;
data want;
merge have _counts_(drop=percent rename=(count=frequency));
by id diag;
run;
proc sort data=want;
by id visit_num;
run;
said not to count stress or infection though. should use a where statement to avoid counting those.
data have;
input id visit_num diag :$9. age;
cards;
01 1 diabetes 42
02 1 stroke 78
02 2 stress .
02 3 stroke .
03 1 infection 61
03 2 . .
;
run;
proc sort data=have;
by id diag;
run;
proc freq data=have;
where diag not in ('stress','infection');
tables id*diag/noprint out=_counts_;
run;
data want;
merge have(in=a) _counts_(drop=percent rename=(count=frequency));
by id diag;
if a;
run;
proc sort data=want;
by id visit_num;
run;
@tarheel13 wrote:
said not to count stress or infection though. should use a where statement to avoid counting those.
Already fixed
Thanks @PaigeMiller for your help. I think that almost does what I need it to, but to clarify, I actually want the count of the unique diagnoses per ID, regardless of what the actual diagnosis is, but not counting stress and infection. (E.g., for ID2, the count would be 1, for ID3 the count would be 0).
@monsterpie wrote:
I actually want the count of the unique diagnoses per ID, regardless of what the actual diagnosis is, but not counting stress and infection. (E.g., for ID2, the count would be 1, for ID3 the count would be 0).
This contradicts your stated desired output, where ID2 shows a frequency of 2. So which is it?
My apologies @PaigeMiller. I edited the original post to fix this error, I am looking to count the unique diganoses per ID. So for ID2, the count should be 1.
are you also not counting strokes?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.