Hi All,
Could you please help me the following sql question:
I have 2 datasets as below:
In the first dataset, the test results and dates of patients are listed. In the second dataset, the visit dates and doctor names that patients had visited are listed.
I want to create a third table, in which the name of the most frequent doctor visit (between the test dates) for each patient is listed.
In the example below, patient 1 has visited john and mary on dates before his first test date, but since he has visisted john more than mary, john is listed as the frequentdoc (see tables below)
Patientid | TestDate | TestResult |
1 | 5/12/2011 | Y |
1 | 6/10/2012 | Y |
1 | 7/11/2013 | Y |
2 | 1/8/2009 | N |
2 | 5/7/2010 | Y |
3 | 11/9/2012 | Y |
4 | 8/22/2011 | Y |
5 | 9/15/2009 | N |
5 | 10/11/2010 | N |
6 | 12/8/2014 | N |
7 | 1/2/2009 | N |
7 | 2/3/2010 | Y |
7 | 3/4/2011 | Y |
7 | 4/5/2012 | N |
medical history table
PatientID | Visitdate | DocName |
1 | 5/2/2011 | john |
1 | 4/27/2011 | john |
1 | 6/5/2012 | john |
1 | 7/6/2013 | mary |
1 | 7/6/2013 | mary |
2 | 1/6/2009 | john |
2 | 5/3/2010 | alex |
2 | 4/28/2010 | alex |
2 | 5/6/2010 | alex |
3 | 11/1/2012 | john |
3 | 10/27/2012 | alex |
4 | 8/21/2011 | mary |
5 | 9/12/2009 | john |
5 | 9/11/2009 | alex |
5 | 10/10/2010 | alex |
6 | 11/30/2014 | john |
6 | 11/15/2014 | john |
7 | 12/28/2008 | mary |
7 | 12/22/2008 | mary |
7 | 12/18/2008 | mary |
7 | 1/30/2010 | mary |
7 | 2/25/2011 | alex |
7 | 4/4/2012 |
alex |
expected result:
Patientid | TestDate | TestResult | FrequentDoc |
1 | 5/12/2011 | Y | john |
1 | 6/10/2012 | Y | john |
1 | 7/11/2013 | Y | mary |
2 | 1/8/2009 | N | john |
2 | 5/7/2010 | Y | alex |
3 | 11/9/2012 | Y | alex/john |
4 | 8/22/2011 | Y | mary |
5 | 9/15/2009 | N | alex/john |
5 | 10/11/2010 | N | alex |
6 | 12/8/2014 | N | john |
7 | 1/2/2009 | N | mary |
7 | 2/3/2010 | Y | mary |
7 | 3/4/2011 | Y | alex |
7 | 4/5/2012 | N | alex |
how can I create the expected table (last table) by sql?
I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.
data PTT;
input Patientid TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1 5/12/2011 Y
1 6/10/2012 Y
1 7/11/2013 Y
2 1/8/2009 N
2 5/7/2010 Y
3 11/9/2012 Y
4 8/22/2011 Y
5 9/15/2009 N
5 10/11/2010 N
6 12/8/2014 N
7 1/2/2009 N
7 2/3/2010 Y
7 3/4/2011 Y
7 4/5/2012 N
8 6/6/2015 N
;
data hist;
input PatientID Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1 5/2/2011 john
1 4/27/2011 john
1 6/5/2012 john
1 7/6/2013 mary
1 7/6/2013 mary
2 1/6/2009 john
2 5/3/2010 alex
2 4/28/2010 alex
2 5/6/2010 alex
3 11/1/2012 john
3 10/27/2012 alex
4 8/21/2011 mary
5 9/12/2009 john
5 9/11/2009 alex
5 10/10/2010 alex
6 11/30/2014 john
6 11/15/2014 john
7 12/28/2008 mary
7 12/22/2008 mary
7 12/18/2008 mary
7 1/30/2010 mary
7 2/25/2011 alex
7 4/4/2012 alex
;
data periods;
do until(last.patientID);
set PTT; by patientID;
output;
lastTestDate = testDate;
end;
format lastTestDate yymmdd10.;
run;
proc sql;
create table docCounts as
select
a.patientID,
a.testDate,
a.testResult,
b.docName,
count(b.docName) as docCount
from periods as a left join hist as b
on a.patientID=b.patientID and
a.lastTestDate < b.visitDate and
a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;
data result;
length frequentDoc $40;
do until(last.testDate);
set maxDocCounts; by patientID testDate;
frequentDoc = catx("/", frequentDoc, docName);
end;
keep patientID testDate testResult frequentDoc;
run;
proc print data=result noobs;
var patientID testDate testResult frequentDoc;
run;
I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.
data PTT;
input Patientid TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1 5/12/2011 Y
1 6/10/2012 Y
1 7/11/2013 Y
2 1/8/2009 N
2 5/7/2010 Y
3 11/9/2012 Y
4 8/22/2011 Y
5 9/15/2009 N
5 10/11/2010 N
6 12/8/2014 N
7 1/2/2009 N
7 2/3/2010 Y
7 3/4/2011 Y
7 4/5/2012 N
8 6/6/2015 N
;
data hist;
input PatientID Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1 5/2/2011 john
1 4/27/2011 john
1 6/5/2012 john
1 7/6/2013 mary
1 7/6/2013 mary
2 1/6/2009 john
2 5/3/2010 alex
2 4/28/2010 alex
2 5/6/2010 alex
3 11/1/2012 john
3 10/27/2012 alex
4 8/21/2011 mary
5 9/12/2009 john
5 9/11/2009 alex
5 10/10/2010 alex
6 11/30/2014 john
6 11/15/2014 john
7 12/28/2008 mary
7 12/22/2008 mary
7 12/18/2008 mary
7 1/30/2010 mary
7 2/25/2011 alex
7 4/4/2012 alex
;
data periods;
do until(last.patientID);
set PTT; by patientID;
output;
lastTestDate = testDate;
end;
format lastTestDate yymmdd10.;
run;
proc sql;
create table docCounts as
select
a.patientID,
a.testDate,
a.testResult,
b.docName,
count(b.docName) as docCount
from periods as a left join hist as b
on a.patientID=b.patientID and
a.lastTestDate < b.visitDate and
a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;
data result;
length frequentDoc $40;
do until(last.testDate);
set maxDocCounts; by patientID testDate;
frequentDoc = catx("/", frequentDoc, docName);
end;
keep patientID testDate testResult frequentDoc;
run;
proc print data=result noobs;
var patientID testDate testResult frequentDoc;
run;
This works perfectly. Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.