Solved: Re: identifying patients in all tables using proc sql

jenim514 · Posted 04-18-2016 06:02 PM

Hi! Looking for some help for fixing my proc sql syntax...

I have five tables representing a year (FY10-FY14) of health encounters for all patients during that specific year.

I need to identify the cohort of patients that recieved care in each of those five years. So first health encounter should be in FY10 and last encounter should be in FY14 but the patient should also have encounters in FY12 and FY13. This is my syntax used, but when I check the output at the end, the first encounter for many patients were not in FY10. Any help is appreciated!!

proc sql;
create table CRG.ASTHMA_FY10to14_1 as
select a.*, b.*,c.*,d.*,e.*
from  CRG.ASTHMA_fy10_crg1 as a, CRG.ASTHMA_fy11_crg1  as b, CRG.ASTHMA_fy12_crg1 as c, CRG.ASTHMA_fy13_crg1 as d, CRG.ASTHMA_fy14_crg1 as e
where a.patientid=b.patientid=c.patientid=d.patientid=e.patientid;
quit;
*Identify patients that are in all fiscal years. 43152 rows and 22 columns;

proc sql;
create table CRG.ASTHMA_FY10to14_All_Cohort as
select a.*, b.*
from CRG.ASTHMA_FY10to14_1 as a, asthma.FY10to14_Asthma_IPOP_PDTS_1 as b
where a.patientid=b.patientid;
quit;
*Join the patientid with the full medical record.  NOTE: Table CRG.ASTHMA_FY10TO14_ALL_COHORT created, with 4567056 rows and 113 columns;

data first_enc_check;
set CRG.ASTHMA_FY10to14_All_Cohort;
by patientid;
if first.patientid then output;
run;
*checked to make sure first.patientid was in FY10...and not all  were.

PGStats · Posted 04-18-2016 10:03 PM

I would prefer to use set operations:

proc sql;

create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;

create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;

drop table everyYear;

quit;

PG

View solution in original post

Reeza · Posted 04-18-2016 06:35 PM

Use an inner join.

I don't think you can list your where condition that way either you need a bunch of AND in between your equal signs.

Here's a sketch of how it should look:

From table1 as a
Inner join table2 as b
On a.id=b.id
Inner join table3 as c
On a.id=c.id
....
Where *any other condition *;
Quit;

There's a post from a SAS trainer that explains why this is actually faster than your query which is a cross join that is then filtered.

If if all this is gibberish try reading up on join types 🤓

ChrisNZ · Posted 04-19-2016 06:18 PM

@Reeza Any link for the SAS trainer post? I can't find it.

While it seems intuitive, my own quick benchmark can't confirm this.

data TAB1;

do I=1 to 100e6;

output;

end;

run;

data TAB2;

do I=1 to 200e6 by 2;

output;

end;

run;

create table OUT as

select tab1.*

from TAB1

, TAB2

where tab1.I=tab2.I

and tab1.I < 50e6;

real time 20.02 seconds

user cpu time 26.50 seconds

system cpu time 2.99 seconds

memory 1060985.07k

OS Memory 1084668.00k

Step Count 576 Switch Count 82

create table OUT as

select tab1.*

from TAB1

inner join

TAB2

on tab1.I=tab2.I

and tab1.I < 50e6;

real time 20.49 seconds

user cpu time 27.08 seconds

system cpu time 2.60 seconds

memory 1060985.95k

OS Memory 1084668.00k

Step Count 578 Switch Count 92

create table OUT as

select tab1.*

from TAB1

inner join

TAB2

on tab1.I=tab2.I

where tab1.I < 50e6;

real time 20.24 seconds

user cpu time 27.42 seconds

system cpu time 2.76 seconds

memory 1060986.07k

OS Memory 1084668.00k

Step Count 580 Switch Count 92

Note that the optimizer could do a much better job; faster time and much less memory when using a dataset option:

create table OUT as

select tab1.*

from TAB1(where=(I < 50e6))

, TAB2

where tab1.I=tab2.I

real time 20:10.43

user cpu time 16:54.70

system cpu time 3:15.51

memory 12688.53k

OS Memory 37428.00k

Step Count 577 Switch Count 82

create table OUT as

select tab1.*

from TAB1(where=(I < 50e6))

inner join

TAB2

on tab1.I=tab2.I;

real time 20:00.49

user cpu time 16:47.42

system cpu time 3:12.92

memory 12688.53k

OS Memory 37428.00k

Step Count 579 Switch Count 90

The intersect and the subquery syntaxes are slower:

create table OUT as

select tab1.*

from TAB1

where tab1.I < 50e6

intersect

select tab2.*

from TAB2 ;

real time 28.42 seconds

user cpu time 49.32 seconds

system cpu time 6.20 seconds

memory 1060660.10k

OS Memory 1083768.00k

Step Count 583 Switch Count 125

create table OUT as

select tab1.*

from TAB1

where I in(select I from TAB2)

and tab1.I < 50e6;

real time 47:44.37

user cpu time 11:18.49

system cpu time 36:24.32

memory 5444475.43k

OS Memory 6238204.00k

Step Count 581 Switch Count 3794

More performance and benchmark fun in

High-Performance SAS Coding - Third Edition

Reeza · Posted 04-20-2016 12:17 PM

Maybe I misunderstood the post.

I prefer an explicit join for clarity when reviewing code

Here's the post

http://blogs.sas.com/content/sastraining/2013/02/04/a-database-professionals-best-friend-2/

Reeza · Posted 04-18-2016 06:36 PM

If you did just want to fix your where condition it would be :

Where a.id=b.id and a.id=c.id and a.id=d.id and a.id=e.id;

PGStats · Posted 04-18-2016 10:03 PM

I would prefer to use set operations:

proc sql;

create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;

create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;

drop table everyYear;

quit;

PG

jenim514 · Posted 04-19-2016 01:48 PM

@PGStatsWhat type of union or join would allow me to keep all the variables found in each CRG.asthma table (in your second step)? Each table has unique variables assigned to that Fiscal Year. It seems the Union All just keeps those variables that are the same in each table. Thanks!!

PGStats · Posted 04-19-2016 02:03 PM

Replace union all corresponding with outer union corresponding

PG

jenim514 · Posted 04-19-2016 02:57 PM

@PGStats one more thing! The output is long, and I need to make wide...i.e.
one Patientid and all corresponding FY variables on one line.

my current output looks like
datalines;

ID 10_CRG 11_CRG 12_CRG 13_CRG 14_CRG
1 2
1 . 4
1 . . 3
1 . . . 2
1 . . . . 3

want to look like

1 2 4 3 2 3

So, a transpose I think, but i have never used transpose and it is transpoing weird output.

PGStats · Posted 04-19-2016 04:13 PM

Why wide? Why not

ID YEAR CRG
1 2010 2
1 2011 4
1 2012 3
1 2013 2
1 2014 3

you would save yourself so much trouble.

PG

jenim514 · Posted 04-20-2016 12:11 PM

@PGStats Can you gelp me get to the result you recomended? I can work with a table that is formatted the way you suggested.

My transpose attempt:

proc transpose data=CRG.ASTHMA_FY10to14_F out=long prefix=CRG;
by patientid;
var concurrent_fy10-concurrent_fy14;
run;

resulted in this (patientid hidden by me):

Obs	_NAME_	CRG1	CRG2	CRG3	CRG4	CRG5
1	CONCURRENT_FY10	10000	10000
2	CONCURRENT_FY11			51381	51381
3	CONCURRENT_FY12					51381
4	CONCURRENT_FY13
5	CONCURRENT_FY14

How can I get to this type of table:

ID YEAR CRG
1 2010 10000

1 2010 10000
1 2011 51381
1 2011 51381
1 2012 51381
1 2013 .

1 2014 .

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away