Re: Finding unique number of IDs in multiple groups

csessa3 · Posted 06-05-2020 01:19 PM

Hello,

I have a dataset that has doctors and the various practices they work in. Each doctor in my dataset works in at least 1 practice but as many as 17 different practices. I would like to know the unique number of doctors each one works with.

Example dataset below. This sample shows that Doctor A is in practices, P1, P3, and P5. Doctor E is in practices P1, P2, and P5,etc.

Doctor	Tot_in_group	grou_practice
A	2	P1
E	2	P1
C	2	P2
B	2	P2
A	3	P3
D	3	P3
E	3	P3
E	2	P5
A	2	P5

From this chart I would want a new column with the total number of unique doctors each one works with. In this case Doctor A works with 2 other doctors (E & D.) However, if I simply grouped by doctor and summed, I find that Doctor A works with 6 Doctors. However this is wrong because it would count Doctor A 3 times (once for each practice he is listed in) AND it would count Doctor E twice (he is in two group practices with Doctor A, P1 & P5)

I have ~ 800,000 doctors with ~400,000 group practices making manual methods unfeasible. Does any one have any suggestions on how to get this started?

sample data set code

data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;

novinosrin · Posted 06-05-2020 01:37 PM

Hi @csessa3 Not sure if I understand your requirement thoroughly. Are you after this by any chance?



data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;

proc sql;
create table want as
select a.doctor,count(distinct b.doctor)-1 as count
from test a left join test b
on a.group_practices=b.group_practices
group by a.doctor;
quit;

doctor	count
A	2
B	2
C	2
D	1
E	3

csessa3 · Posted 06-05-2020 01:48 PM

Hello, thank you for taking time to review my question.

The code below is close as it adjusts the count to not double count the individual Doctor (Doctor E is listed in 3 practices so it will subtract 3 for that.) However, does not adjust for when the same doctors are in multiple practices together.

For example, Doctor A only works with 1 other doctor (Doctor E.) It just so happens that Doctor A and Doctor E work in 2 practices together (P1, P2) In contrast, Doctor E works with 2 other doctors, Doctor A and Doctor D.

The final output would look like:

Doctor	Unique Doctors Work with
A	1
B	2
C	2
D	1
E	2

novinosrin · Posted 06-05-2020 02:24 PM

Can you explain

D

2

or should it be

D

1

?

csessa3 · Posted 06-05-2020 02:32 PM

I'm sorry, that was a typo on my part, I will see if I can edit my previous comment.

Doctor D should be listed as working with 1 other docotor

novinosrin · Posted 06-05-2020 02:37 PM

Hi @csessa3 No worries. I am afraid perhaps not the best sleep last night or whatever I am not proud of the fact my comprehension is slow and slippery today. Again, not stating it is now 100%, but I am thinking the following should be close-


data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;

proc sql;
create table want as
select doctor, count(bd) as unique_count
from
(select a.doctor,b.doctor as bd
from test a left join test b
on a.group_practices=b.group_practices and a.doctor ne b.doctor
group by a.doctor,b.doctor
having count(*)=1)
group by doctor;
quit;

doctor	unique_count
A	1
B	2
C	2
D	1
E	2

ballardw · Posted 06-05-2020 06:28 PM

@csessa3 wrote:

Hello, thank you for taking time to review my question.

The code below is close as it adjusts the count to not double count the individual Doctor (Doctor E is listed in 3 practices so it will subtract 3 for that.) However, does not adjust for when the same doctors are in multiple practices together.

For example, Doctor A only works with 1 other doctor (Doctor E.) It just so happens that Doctor A and Doctor E work in 2 practices together (P1, P2) In contrast, Doctor E works with 2 other doctors, Doctor A and Doctor D.

The final output would look like:

Doctor Unique Doctors Work with

A 1

B 2

C 2

D 1

E 2

I see doctor A works in P3 with doctor D as well as E in two practices. So how does A have a 1 for unique doctors. E share practices with A, B and C. I am afraid that I don't quite understand your definition of "unique" or what you are actually counting.

Finding unique number of IDs in multiple groups