Hello,
I have a dataset that has doctors and the various practices they work in. Each doctor in my dataset works in at least 1 practice but as many as 17 different practices. I would like to know the unique number of doctors each one works with.
Example dataset below. This sample shows that Doctor A is in practices, P1, P3, and P5. Doctor E is in practices P1, P2, and P5,etc.
Doctor | Tot_in_group | grou_practice |
A | 2 | P1 |
E | 2 | P1 |
C | 2 | P2 |
B | 2 | P2 |
A | 3 | P3 |
D | 3 | P3 |
E | 3 | P3 |
E | 2 | P5 |
A | 2 | P5 |
From this chart I would want a new column with the total number of unique doctors each one works with. In this case Doctor A works with 2 other doctors (E & D.) However, if I simply grouped by doctor and summed, I find that Doctor A works with 6 Doctors. However this is wrong because it would count Doctor A 3 times (once for each practice he is listed in) AND it would count Doctor E twice (he is in two group practices with Doctor A, P1 & P5)
I have ~ 800,000 doctors with ~400,000 group practices making manual methods unfeasible. Does any one have any suggestions on how to get this started?
sample data set code
data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;
Hi @csessa3 Not sure if I understand your requirement thoroughly. Are you after this by any chance?
data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;
proc sql;
create table want as
select a.doctor,count(distinct b.doctor)-1 as count
from test a left join test b
on a.group_practices=b.group_practices
group by a.doctor;
quit;
doctor | count |
---|---|
A | 2 |
B | 2 |
C | 2 |
D | 1 |
E | 3 |
Hello, thank you for taking time to review my question.
The code below is close as it adjusts the count to not double count the individual Doctor (Doctor E is listed in 3 practices so it will subtract 3 for that.) However, does not adjust for when the same doctors are in multiple practices together.
For example, Doctor A only works with 1 other doctor (Doctor E.) It just so happens that Doctor A and Doctor E work in 2 practices together (P1, P2) In contrast, Doctor E works with 2 other doctors, Doctor A and Doctor D.
The final output would look like:
Doctor | Unique Doctors Work with |
A | 1 |
B | 2 |
C | 2 |
D | 1 |
E | 2 |
Can you explain
D | 2 |
or should it be
D | 1 |
?
I'm sorry, that was a typo on my part, I will see if I can edit my previous comment.
Doctor D should be listed as working with 1 other docotor
Hi @csessa3 No worries. I am afraid perhaps not the best sleep last night or whatever I am not proud of the fact my comprehension is slow and slippery today. Again, not stating it is now 100%, but I am thinking the following should be close-
data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;
proc sql;
create table want as
select doctor, count(bd) as unique_count
from
(select a.doctor,b.doctor as bd
from test a left join test b
on a.group_practices=b.group_practices and a.doctor ne b.doctor
group by a.doctor,b.doctor
having count(*)=1)
group by doctor;
quit;
doctor | unique_count |
---|---|
A | 1 |
B | 2 |
C | 2 |
D | 1 |
E | 2 |
@csessa3 wrote:
Hello, thank you for taking time to review my question.
The code below is close as it adjusts the count to not double count the individual Doctor (Doctor E is listed in 3 practices so it will subtract 3 for that.) However, does not adjust for when the same doctors are in multiple practices together.
For example, Doctor A only works with 1 other doctor (Doctor E.) It just so happens that Doctor A and Doctor E work in 2 practices together (P1, P2) In contrast, Doctor E works with 2 other doctors, Doctor A and Doctor D.
The final output would look like:
Doctor Unique Doctors Work with A 1 B 2 C 2 D 1 E 2
I see doctor A works in P3 with doctor D as well as E in two practices. So how does A have a 1 for unique doctors. E share practices with A, B and C. I am afraid that I don't quite understand your definition of "unique" or what you are actually counting.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.