BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
niam
Quartz | Level 8

Hi All,

Could you please help me the following sql question: 

I have 2 datasets as below:

In the first dataset, the test results  and dates of patients are listed. In the second dataset, the visit dates and doctor names that patients had visited are listed. 

I want to create a third table, in which the name of the most frequent doctor visit (between the test dates) for each patient is listed.

In the example below, patient 1 has visited john and mary on dates before his first test date, but since he has visisted john more than mary, john is listed as the frequentdoc (see tables below)

Patientid TestDate TestResult
1 5/12/2011 Y
1 6/10/2012 Y
1 7/11/2013 Y
2 1/8/2009 N
2 5/7/2010 Y
3 11/9/2012 Y
4 8/22/2011 Y
5 9/15/2009 N
5 10/11/2010 N
6 12/8/2014 N
7 1/2/2009 N
7 2/3/2010 Y
7 3/4/2011 Y
7 4/5/2012 N

 

medical history table

PatientID  Visitdate DocName
1 5/2/2011 john
1 4/27/2011 john
1 6/5/2012 john
1 7/6/2013 mary
1 7/6/2013 mary
2 1/6/2009 john
2 5/3/2010 alex
2 4/28/2010 alex
2 5/6/2010 alex
3 11/1/2012 john
3 10/27/2012 alex
4 8/21/2011 mary
5 9/12/2009 john
5 9/11/2009 alex
5 10/10/2010 alex
6 11/30/2014 john
6 11/15/2014 john
7 12/28/2008 mary
7 12/22/2008 mary
7 12/18/2008 mary
7 1/30/2010 mary
7 2/25/2011 alex
7 4/4/2012

alex

expected result:

Patientid TestDate TestResult FrequentDoc
1 5/12/2011 Y john
1 6/10/2012 Y john
1 7/11/2013 Y mary
2 1/8/2009 N john
2 5/7/2010 Y alex
3 11/9/2012 Y alex/john
4 8/22/2011 Y mary
5 9/15/2009 N alex/john
5 10/11/2010 N alex
6 12/8/2014 N john
7 1/2/2009 N mary
7 2/3/2010 Y mary
7 3/4/2011 Y alex
7 4/5/2012 N alex

 

how can I create the expected table (last table) by sql?

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.

 

data PTT;
input Patientid	TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1	5/12/2011	Y
1	6/10/2012	Y
1	7/11/2013	Y
2	1/8/2009	N
2	5/7/2010	Y
3	11/9/2012	Y
4	8/22/2011	Y
5	9/15/2009	N
5	10/11/2010	N
6	12/8/2014	N
7	1/2/2009	N
7	2/3/2010	Y
7	3/4/2011	Y
7	4/5/2012	N
8   6/6/2015    N
;

data hist;
input PatientID  Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1	5/2/2011	john
1	4/27/2011	john
1	6/5/2012	john
1	7/6/2013	mary
1	7/6/2013	mary
2	1/6/2009	john
2	5/3/2010	alex
2	4/28/2010	alex
2	5/6/2010	alex
3	11/1/2012	john
3	10/27/2012	alex
4	8/21/2011	mary
5	9/12/2009	john
5	9/11/2009	alex
5	10/10/2010	alex
6	11/30/2014	john
6	11/15/2014	john
7	12/28/2008	mary
7	12/22/2008	mary
7	12/18/2008	mary
7	1/30/2010	mary
7	2/25/2011	alex
7	4/4/2012	alex
;

data periods;
do until(last.patientID);
    set PTT; by patientID;
    output;
    lastTestDate = testDate;
    end;
format lastTestDate yymmdd10.;
run;

proc sql;
create table docCounts as
select 
    a.patientID,
    a.testDate,
    a.testResult,
    b.docName, 
    count(b.docName) as docCount
from periods as a left join hist as b
    on a.patientID=b.patientID and
        a.lastTestDate < b.visitDate and
        a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;

data result;
length frequentDoc $40;
do until(last.testDate);
    set maxDocCounts; by patientID testDate;
    frequentDoc = catx("/", frequentDoc, docName);
    end;
keep patientID testDate testResult frequentDoc;
run;

proc print data=result noobs; 
var patientID testDate testResult frequentDoc; 
run;

 

PG

View solution in original post

4 REPLIES 4
Reeza
Super User
What do you mean by between the test dates?

At any rate you can do a count and join on between the dates of interest. Then use a first with BY processing to take the top occurence. If you insist on SQL you'll have to take the max by test date and join on that.
niam
Quartz | Level 8
I was referring to the time interval between the each two subsequent test dates. Thank you for the help.
PGStats
Opal | Level 21

I don't think this can be done efficiently with SQL only. Here is a combination of datastep and SQL that should be fairly efficient. I added patientID=8 to represent the test-but-no-visit case.

 

data PTT;
input Patientid	TestDate :mmddyy10. TestResult $;
format testdate yymmdd10.;
datalines;
1	5/12/2011	Y
1	6/10/2012	Y
1	7/11/2013	Y
2	1/8/2009	N
2	5/7/2010	Y
3	11/9/2012	Y
4	8/22/2011	Y
5	9/15/2009	N
5	10/11/2010	N
6	12/8/2014	N
7	1/2/2009	N
7	2/3/2010	Y
7	3/4/2011	Y
7	4/5/2012	N
8   6/6/2015    N
;

data hist;
input PatientID  Visitdate :mmddyy10. DocName $;
format visitdate yymmdd10.;
datalines;
1	5/2/2011	john
1	4/27/2011	john
1	6/5/2012	john
1	7/6/2013	mary
1	7/6/2013	mary
2	1/6/2009	john
2	5/3/2010	alex
2	4/28/2010	alex
2	5/6/2010	alex
3	11/1/2012	john
3	10/27/2012	alex
4	8/21/2011	mary
5	9/12/2009	john
5	9/11/2009	alex
5	10/10/2010	alex
6	11/30/2014	john
6	11/15/2014	john
7	12/28/2008	mary
7	12/22/2008	mary
7	12/18/2008	mary
7	1/30/2010	mary
7	2/25/2011	alex
7	4/4/2012	alex
;

data periods;
do until(last.patientID);
    set PTT; by patientID;
    output;
    lastTestDate = testDate;
    end;
format lastTestDate yymmdd10.;
run;

proc sql;
create table docCounts as
select 
    a.patientID,
    a.testDate,
    a.testResult,
    b.docName, 
    count(b.docName) as docCount
from periods as a left join hist as b
    on a.patientID=b.patientID and
        a.lastTestDate < b.visitDate and
        a.testDate >= b.visitDate
group by a.patientID, a.testDate, a.testResult, b.docName;
create table maxDocCounts as
select *
from docCounts
group by patientId, testDate, docName
having docCount = max(docCount);
quit;

data result;
length frequentDoc $40;
do until(last.testDate);
    set maxDocCounts; by patientID testDate;
    frequentDoc = catx("/", frequentDoc, docName);
    end;
keep patientID testDate testResult frequentDoc;
run;

proc print data=result noobs; 
var patientID testDate testResult frequentDoc; 
run;

 

PG
niam
Quartz | Level 8

This works perfectly. Thank you!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 7807 views
  • 2 likes
  • 3 in conversation