Hello,
I have a dataset with patient_ID and patients' diagnosis codes. If a patient had more than one diagnosis, they are in multiple rows. In another words, the dataset is not unique by patient_ID.
I would like to create a counter, which is the maximum number of diagnosis a patient can have.
Can somebody tell me, whether the codes below are creating this counter?
proc sql noprint;
select left(trim(put(MAX(count),5.))) into: Idy from
(select count(*) as count
from data
group by patient_id);
QUIT;
%PUT &Idy;
Thank you very much.
Best.
Yanyan
I would change the code to something like
proc sql noprint; select MAX(count) into: Idy trimmed from (select count(distinct diagnosis) as count from data group by patient_id); QUIT; %PUT &Idy;
The first change is just simplifying your select expression, and using the TRIMMED keyword to trim leading and trailing blanks from the macro variable. The second is using count(distinct diagnosis) instead of count(*), which will solve two problems:
Of course, if you are absolutely sure that you only have unique, non-missing diagnoses, your count(*) will work just as well.
"Can somebody tell me, whether the codes below are creating this counter? " - why not just run the code and see?
No. Nobody can tell you, whether the codes below are creating this counter, because we can not see your data.
Provide some sample of what your data looks like. It makes it much easier to help 🙂
@YGong wrote:
Hello,
I have a dataset with patient_ID and patients' diagnosis codes. If a patient had more than one diagnosis, they are in multiple rows. In another words, the dataset is not unique by patient_ID.
I would like to create a counter, which is the maximum number of diagnosis a patient can have.
Yanyan
Code reading your data can at best provide the maximum that have occurred. It will never actually provide the number of diagnosis that patient may have.
Example: The day after you run the data and get X as the maximum count the patient with that count is diagnosed with something.
I would change the code to something like
proc sql noprint; select MAX(count) into: Idy trimmed from (select count(distinct diagnosis) as count from data group by patient_id); QUIT; %PUT &Idy;
The first change is just simplifying your select expression, and using the TRIMMED keyword to trim leading and trailing blanks from the macro variable. The second is using count(distinct diagnosis) instead of count(*), which will solve two problems:
Of course, if you are absolutely sure that you only have unique, non-missing diagnoses, your count(*) will work just as well.
Great, thank you so much!!
This is the answer I was looking for. Really appreciated your reply.
Best regards from Paris.
Yanyan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.