## Count of Patients with particular set of diagnoses ever

Solved
Occasional Contributor
Posts: 17

# Count of Patients with particular set of diagnoses ever

[ Edited ]

Hi,

I have a very large data set (>3 million observations) for which would like to count the number of patients (not total observations) have one of four possible diagnosis codes. Patients could have multiple observations denoting multiple visits and have up to ten diagnosis variable fields.

My four codes of interest are: 72632 or 72631 or 71922 or 71923.

I would like to create a command that searches DX01 to DX10 for all observations for each patient for presence of the above codes.

Example:

Patientid      dx01    dx02    dx03    dx04    dx05    …       dx10    date

1             72632   72631   71922   71923                          5/5/2009

1              72632   71922   70000   70001                          6/6/2008

2              70001   70000                                                    1/1/2007

2              71900   71901                                                  2/2/2007

3              72632   71900                                                   2/2/2004

3              72631   70000                                                   3/3/2004

3              71922                                                                4/4/2004

From the above example, I would like to produce output like the following

1. By Patient ID

Patient ID            Has (72632 OR 72631 OR 71922 OR 71923) [1=Yes, 2=No]

1             1

2              0

3              1

2. By Total Number of Patients
Total Count of Patients with (72632 OR 72631 OR 71922 OR 71923)

2

Separately, I would also like a count of the number of observations each patient has for the diagnoses (72632 OR 72631 OR 71922 OR 71923) that produces output like:

Patient ID            Has (72632 OR 72631 OR 71922 OR 71923)             Number of Visits Addressing Diagnosis

1             1                                                                                   2

2              0                                                                                   0

3              1                                                                                    2

Accepted Solutions
Solution
‎10-24-2016 10:25 AM
Super User
Posts: 10,280

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]
``````data inter (keep=id diag);
set have;
array diags {*} dx1-dx10;
diag = 0;
do i = 1 to dim(diags);
if diags{i} in (72632,72631,71922 71923) then diag = 1;
end;
run;

proc sort data=inter;
by id;
run;

data result (keep=id hasdiag visitcount) totalcount (keep=total_patients);
set inter end=done;
retain total_patients 0 visitcount hasdiag;
if first.id
then do;
visitcount = 0;
hasdiag = 0;
end;
if diag then do;
visitcount + 1;
hasdiag = 1;
end;
if last.id then do;
output result;
total_patients + hasdiag;
end;
if done then output totalcount;
run;
``````

If your initial dataset is already sorted by id, you can omit the sort step.

Some edits done for typos.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

All Replies
Super Contributor
Posts: 250

## Re: Count of Patients with particular set of diagnoses ever

Hi

I am sure there are more efficient ways to handle this, but here's how I would do it using SQL:

``````proc sql;/* This will put the data into categories of Yes = 1 and No = 0 */
create table work.DATA2 as
select patient_id,
case when (dx1 in (72632, 72631, 71922, 71923)
or (dx2 in (72632, 72631, 71922, 71923)
.
.
.
or (dx10 in (72632, 72631, 71922, 71923))
then 1
else 0 end as Yes_No
from work.DATA;
/* This will sum the Yes_No column to give you a TOTAL number of instances that metthe Yes criteria above */
select sum(Yes_No) as Total_Count
from work.DATA2;
/* This will give you the total per patient ID */
select patient_ID, sum(Yes_No) as Pt_Count
from work.DATA2
group by patient_ID
order by patient_ID;
quit;``````

This may require some fine tuning, and there are plenty of papers and articles (including a few written by me!) on PROC SQL, but this should get you started.

Good luck and happy coding!

Chris

Has my article or post helped? Please mark as Solution or Like the article!
Posts: 1,837

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

Alternative code using data step:

%let ds_check = 72632 72631 71922 71923;  /* easy to change list if needed */

data want;

set have;

by patient_id;

retain yes_no;

if first.patient_id then yes_no = 0;

array dx dx1-dx10;

do i=1 to 10;

if dx(i) in (&ds_check) then yes_no = 1;

end;

if last.patient_id then output;

keep patient_id  yes_no;

label yes_no = " HAS (&ds_check)";

run;

The TOTAL number can be achieved by various ways

(e.g. by PROC MEANS or PROC SQL or PROC PRINT with SUM )

Occasional Contributor
Posts: 17

## Re: Count of Patients with particular set of diagnoses ever

How do I deal with leading zeros in my diagnoses categories. For example, DX01, DX02, DX03...Dx10?

Thank you so much!

Posts: 1,837

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

you may change the array definition into:

ARRAY DX DX01-DX10;

or alternatively do:

SET WANT (Rename=(dx01=dx1 dx02=dx2 ... dx09=dx9));

Super Contributor
Posts: 250

## Re: Count of Patients with particular set of diagnoses ever

Unless I'm misunderstanding, the DX01 is a column header and at least for the SQL version, won't have an impact.  The only time you'd need to worry about leading 0s is if they're in your data (00745454 versus 745454).  Sometimes the number of leading 0s is important, and then you would have to treat the column as TEXT rather than NUMERIC.

hope this helps!

Chris

Has my article or post helped? Please mark as Solution or Like the article!
Super User
Posts: 10,280

## Re: Count of Patients with particular set of diagnoses ever

DMMD wrote:

How do I deal with leading zeros in my diagnoses categories. For example, DX01, DX02, DX03...Dx10?

Thank you so much!

Just include the zero in the variable list. Just tested this short example:

``````data have;
input x01 x02 x03;
cards;
1 2 3
;
run;

data want;
set have;
array xx {*} x01-x03;
do i = 1 to dim(xx);
put xx{i}=;
end;
run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 17

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

Fantastic! The problem was the diagnoses codes were character, not number variables as there are some codes starting with letters. Putting the codes in quotations allowed it to work. Thank you so much! This helped to answer #1 and #3 of my output.

Occasional Contributor
Posts: 17

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

Super Contributor
Posts: 250

## Re: Count of Patients with particular set of diagnoses ever

Hey!  I think the issue is the PROC FREQ; if I remember correctly, PROC FREQ counts instances, or something.  At a quick glance, your SQL looks good, and i'm not seeing any issues with your brackets or other logic.  I'll wait and see what @Shmuel says, and if I can i will do some testing tomorrow when I'm at work.

Good luck :-)
Chris

Has my article or post helped? Please mark as Solution or Like the article!
Occasional Contributor
Posts: 17

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

Thanks @DarthPathos, @Shmuel, and @KurtBremser! SAS can be a stressful experience to say the least.

Best,

David

Super Contributor
Posts: 250

## Re: Count of Patients with particular set of diagnoses ever

Yeah, I've got to admit i'm stumped LOL. Hopefully one of my esteemed co-users can point you in the right direction, and if I think of anything I'll you know. One of the benefits of being on the community - you get a wide variety of views
Don't give up - I've been using SAS for 6 years, and have come a long, long way. You'll get there!
Chris
Has my article or post helped? Please mark as Solution or Like the article!
Posts: 1,837

## Re: Count of Patients with particular set of diagnoses ever

Within my code ( @shmuel ) I conunt per EUPIDE.

It seems that same eupide can exist in more than one observation.

@KurtBremser counts each observation.

I have no time now to check the 3rd program.

Posts: 1,837

## Re: Count of Patients with particular set of diagnoses ever

[ Edited ]

I have analized the 3 programs , here is the explanation for the differences:

Let's have an input like:

EUPIDE      YES-NO         my remark                      counts by:  SH        KB          DP

1               0                non of dx in list                                                0            1

1               1               one or more dx in list                                       1            1

1               1                                                                           1          1            1

TOTAL                                         1          2            3

in words:

I count each EUPIDE once only. Therefore my total is the least and I have missings in freq output.

@KurtBremser assigns each line as 0 or 1 and sums the colomn yes-no.

@DarthPathos counts each line as 1 if any line of same EUPIDE is 1.

@DMMD now it is your turn to choose which of the 3 fits more to you.

Occasional Contributor
Posts: 17

## Re: Count of Patients with particular set of diagnoses ever

Thanks so much @Shmuel for helping to explain the function of each code.

☑ This topic is solved.