Hello,
I have a dataset that shows patient visits to different doctors on different dates as well as the dates in which they went under surgery.
A patient is considered as a link between 2 doctors if the patient has visited both doctors. The doctor whom the patient has visited earlier is considered as referring doctor while the doctor whom the patient has visited later is considered as referred doctor. I want to count the number of unique patients that each doctor has referred to others as well as the number of patients that the doctor has received from others up until the date that a patient has gone under surgery. Here is how my dataset looks like, I want to fill in the last two columns (SentPatient and ReceivedPatients)
For example: Patient 1 is considered as a link between doctor A and B. Since the visit date to doctor A is earlier than B, then it is considered that doctor A has sent patient 1 to doctor B. Similarly, doctor A has also sent patient 2 to doctor B. Since both of these referrals has happened earlier than corresponding surgery dates, then SentPatients is calculated as 2 for doctor A in first row.
Patientid | Doc | VisistDate | surgury date | SentPatients | ReceivedPatients |
1 | A | 1/1/2010 | 8/1/2011 | 5 | 4 |
1 | A | 2/1/2010 | 7/1/2011 | 5 | 4 |
1 | B | 3/1/2010 | 6/1/2011 | 3 | 3 |
2 | A | 4/1/2010 | 5/1/2011 | 5 | 4 |
2 | B | 5/1/2010 | 4/1/2011 | 3 | 3 |
3 | C | 6/1/2010 | 3/1/2011 | 3 | 4 |
4 | C | 7/1/2010 | 2/1/2011 | 3 | 4 |
4 | D | 8/1/2010 | 1/1/2011 | 2 | 1 |
4 | E | 9/1/2010 | 11/1/2010 | 1 | 2 |
4 | A | 10/1/2010 | 10/1/2014 | 5 | 4 |
5 | B | 11/1/2010 | 10/2/2014 | 3 | 3 |
5 | C | 12/1/2010 | 10/3/2014 | 3 | 4 |
5 | A | 1/1/2011 | 10/4/2014 | 5 | 4 |
6 | D | 2/1/2011 | 10/5/2014 | 2 | 1 |
7 | B | 3/1/2011 | 10/6/2014 | 3 | 3 |
7 | C | 4/1/2011 | 10/7/2014 | 3 | 4 |
8 | A | 5/1/2011 | 10/8/2014 | 5 | 4 |
9 | B | 6/1/2011 | 10/9/2014 | 3 | 3 |
9 | C | 7/1/2011 | 10/10/2014 | 3 | 4 |
10 | D | 8/1/2011 | 10/11/2014 | 2 | 1 |
Well, you haven't provided any required output, so its hard to give anything other than thoughts.
To get links between doctors, I would, assuming its sorted by patientid visitdate (and noting visitdate and surgery are spelt incorrectly in your code):
data have; Patientid=1; doc="A"; visitdate="01JAN2010"d; output; Patientid=1; doc="A"; visitdate="02JAN2010"d; output; Patientid=1; doc="B"; visitdate="03JAN2010"d; output; Patientid=2; doc="A"; visitdate="04JAN2010"d; output; Patientid=2; doc="B"; visitdate="05JAN2010"d; output; format visitdate date9.; run; data want (drop=lst_doc); set have; length move_to $200; by patientid; retain move_to lst_doc; if first.patientid then do; move_to=doc; lst_doc=doc; end; else if lst_doc ne doc then move_to=cats(move_to,"-",doc); run;
Hi mate,
I thing this would work:
data test;
infile datalines4 dlm=" ";
length Patientid 3 Doc $2 VisistDate $15 surgury_date $15 SentPatients 3 ReceivedPatients 3;
input Patientid Doc VisistDate surgury_date SentPatients ReceivedPatients;
datalines4;
1 A 01/01/2010 08/01/2011 2 3
1 A 02/01/2010 07/01/2011 2 3
1 B 03/01/2010 06/01/2011 2 2
2 A 04/01/2010 05/01/2011 2 3
2 B 05/01/2010 04/01/2011 2 2
3 C 06/01/2010 03/01/2011 3 2
4 C 07/01/2010 02/01/2011 3 2
4 D 08/01/2010 01/01/2011 2 1
4 E 09/01/2010 11/01/2010 1 2
4 A 10/01/2010 10/01/2014 2 3
5 B 11/01/2010 10/02/2014 2 2
5 C 12/01/2010 10/03/2014 3 2
5 A 01/01/2011 10/04/2014 2 3
6 D 02/01/2011 10/05/2014 2 1
7 B 03/01/2011 10/06/2014 2 2
7 C 04/01/2011 10/07/2014 3 2
8 A 05/01/2011 10/08/2014 2 3
9 B 06/01/2011 10/09/2014 2 2
9 C 07/01/2011 10/10/2014 3 2
10 D 08/01/2011 10/11/2014 2 1
;;;;;
run;
data test2;
length Patientid 3 Doc $2 VisistDate 8 surgury_date 8
SentPatients 3 ReceivedPatients 3;
set test(rename=(VisistDate=fdate surgury_date=sdate));
VisistDate = input(fdate,ddmmyy10.);
surgury_date = input(sdate,ddmmyy10.);
format VisistDate surgury_date ddmmyy10.;
drop fdate sdate;
run;
proc sql;
select *,
count(Patientid) as qdt_Patientid
from test2
where VisistDate <= surgury_date
group by doc;
quit;
First i created your data set and the last step is what you need.
Att
At the glance, it seems that you have problem that should be solved using graphs. This particular problem might be solved by using SQL, but if you wish to query several relations in the same query, it gets complicated.
So it could be wise to investigate what SAS/OR offers.
Another option is to use a native graph database, such as Neo4j.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.