Hi All,
Could you please help me the following sql question:
I have 2 datasets as below:
In the first dataset, the test results and dates of patients are listed. In the second dataset, the visit dates and doctor names that patients had visited are listed.
I want to create a third table, in which the name of the most frequent doctor visit (between the test dates) for each patient is listed.
In the example below, patient 1 has visited john and mary on dates before his first test date, but since he has visisted john more than mary, john is listed as the frequentdoc (see tables below)
| Patientid | TestDate | TestResult |
| 1 | 5/12/2011 | Y |
| 1 | 6/10/2012 | Y |
| 1 | 7/11/2013 | Y |
| 2 | 1/8/2009 | N |
| 2 | 5/7/2010 | Y |
| 3 | 11/9/2012 | Y |
| 4 | 8/22/2011 | Y |
| 5 | 9/15/2009 | N |
| 5 | 10/11/2010 | N |
| 6 | 12/8/2014 | N |
| 7 | 1/2/2009 | N |
| 7 | 2/3/2010 | Y |
| 7 | 3/4/2011 | Y |
| 7 | 4/5/2012 | N |
medical history table
| PatientID | Visitdate | DocName |
| 1 | 5/2/2011 | john |
| 1 | 4/27/2011 | john |
| 1 | 6/5/2012 | john |
| 1 | 7/6/2013 | mary |
| 1 | 7/6/2013 | mary |
| 2 | 1/6/2009 | john |
| 2 | 5/3/2010 | alex |
| 2 | 4/28/2010 | alex |
| 2 | 5/6/2010 | alex |
| 3 | 11/1/2012 | john |
| 3 | 10/27/2012 | alex |
| 4 | 8/21/2011 | mary |
| 5 | 9/12/2009 | john |
| 5 | 9/11/2009 | alex |
| 5 | 10/10/2010 | alex |
| 6 | 11/30/2014 | john |
| 6 | 11/15/2014 | john |
| 7 | 12/28/2008 | mary |
| 7 | 12/22/2008 | mary |
| 7 | 12/18/2008 | mary |
| 7 | 1/30/2010 | mary |
| 7 | 2/25/2011 | alex |
| 7 | 4/4/2012 |
alex |
expected result:
| Patientid | TestDate | TestResult | FrequentDoc |
| 1 | 5/12/2011 | Y | john |
| 1 | 6/10/2012 | Y | john |
| 1 | 7/11/2013 | Y | mary |
| 2 | 1/8/2009 | N | john |
| 2 | 5/7/2010 | Y | alex |
| 3 | 11/9/2012 | Y | alex/john |
| 4 | 8/22/2011 | Y | mary |
| 5 | 9/15/2009 | N | alex/john |
| 5 | 10/11/2010 | N | alex |
| 6 | 12/8/2014 | N | john |
| 7 | 1/2/2009 | N | mary |
| 7 | 2/3/2010 | Y | mary |
| 7 | 3/4/2011 | Y | alex |
| 7 | 4/5/2012 | N | alex |
how can I create the expected table (last table) by sql?
I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.
data PTT;
input Patientid TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1 5/12/2011 Y
1 6/10/2012 Y
1 7/11/2013 Y
2 1/8/2009 N
2 5/7/2010 Y
3 11/9/2012 Y
4 8/22/2011 Y
5 9/15/2009 N
5 10/11/2010 N
6 12/8/2014 N
7 1/2/2009 N
7 2/3/2010 Y
7 3/4/2011 Y
7 4/5/2012 N
8 6/6/2015 N
;
data hist;
input PatientID Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1 5/2/2011 john
1 4/27/2011 john
1 6/5/2012 john
1 7/6/2013 mary
1 7/6/2013 mary
2 1/6/2009 john
2 5/3/2010 alex
2 4/28/2010 alex
2 5/6/2010 alex
3 11/1/2012 john
3 10/27/2012 alex
4 8/21/2011 mary
5 9/12/2009 john
5 9/11/2009 alex
5 10/10/2010 alex
6 11/30/2014 john
6 11/15/2014 john
7 12/28/2008 mary
7 12/22/2008 mary
7 12/18/2008 mary
7 1/30/2010 mary
7 2/25/2011 alex
7 4/4/2012 alex
;
data periods;
do until(last.patientID);
set PTT; by patientID;
output;
lastTestDate = testDate;
end;
format lastTestDate yymmdd10.;
run;
proc sql;
create table docCounts as
select
a.patientID,
a.testDate,
a.testResult,
b.docName,
count(b.docName) as docCount
from periods as a left join hist as b
on a.patientID=b.patientID and
a.lastTestDate < b.visitDate and
a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;
data result;
length frequentDoc $40;
do until(last.testDate);
set maxDocCounts; by patientID testDate;
frequentDoc = catx("/", frequentDoc, docName);
end;
keep patientID testDate testResult frequentDoc;
run;
proc print data=result noobs;
var patientID testDate testResult frequentDoc;
run;
I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.
data PTT;
input Patientid TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1 5/12/2011 Y
1 6/10/2012 Y
1 7/11/2013 Y
2 1/8/2009 N
2 5/7/2010 Y
3 11/9/2012 Y
4 8/22/2011 Y
5 9/15/2009 N
5 10/11/2010 N
6 12/8/2014 N
7 1/2/2009 N
7 2/3/2010 Y
7 3/4/2011 Y
7 4/5/2012 N
8 6/6/2015 N
;
data hist;
input PatientID Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1 5/2/2011 john
1 4/27/2011 john
1 6/5/2012 john
1 7/6/2013 mary
1 7/6/2013 mary
2 1/6/2009 john
2 5/3/2010 alex
2 4/28/2010 alex
2 5/6/2010 alex
3 11/1/2012 john
3 10/27/2012 alex
4 8/21/2011 mary
5 9/12/2009 john
5 9/11/2009 alex
5 10/10/2010 alex
6 11/30/2014 john
6 11/15/2014 john
7 12/28/2008 mary
7 12/22/2008 mary
7 12/18/2008 mary
7 1/30/2010 mary
7 2/25/2011 alex
7 4/4/2012 alex
;
data periods;
do until(last.patientID);
set PTT; by patientID;
output;
lastTestDate = testDate;
end;
format lastTestDate yymmdd10.;
run;
proc sql;
create table docCounts as
select
a.patientID,
a.testDate,
a.testResult,
b.docName,
count(b.docName) as docCount
from periods as a left join hist as b
on a.patientID=b.patientID and
a.lastTestDate < b.visitDate and
a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;
data result;
length frequentDoc $40;
do until(last.testDate);
set maxDocCounts; by patientID testDate;
frequentDoc = catx("/", frequentDoc, docName);
end;
keep patientID testDate testResult frequentDoc;
run;
proc print data=result noobs;
var patientID testDate testResult frequentDoc;
run;
This works perfectly. Thank you!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.