DATA Step, Macro, Functions and more

identifying patients in all tables using proc sql

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 142
Accepted Solution

identifying patients in all tables using proc sql

Hi!  Looking for some help for fixing my proc sql syntax... 

I have five tables representing a year (FY10-FY14) of health encounters for all patients during that specific year.

I need to identify the cohort of patients that recieved care in each of those five years.  So first health encounter should be in FY10 and last encounter should be in FY14 but the patient should also have encounters in FY12 and FY13.   This is my syntax used, but when I check the output at the end, the first encounter for many patients were not in FY10.  Any help is appreciated!!

proc sql;
create table CRG.ASTHMA_FY10to14_1 as
select a.*, b.*,c.*,d.*,e.*
from  CRG.ASTHMA_fy10_crg1 as a, CRG.ASTHMA_fy11_crg1  as b, CRG.ASTHMA_fy12_crg1 as c, CRG.ASTHMA_fy13_crg1 as d, CRG.ASTHMA_fy14_crg1 as e
where a.patientid=b.patientid=c.patientid=d.patientid=e.patientid;
quit;
*Identify patients that are in all fiscal years. 43152 rows and 22 columns;

proc sql;
create table CRG.ASTHMA_FY10to14_All_Cohort as
select a.*, b.*
from CRG.ASTHMA_FY10to14_1 as a, asthma.FY10to14_Asthma_IPOP_PDTS_1 as b
where a.patientid=b.patientid;
quit;
*Join the patientid with the full medical record.  NOTE: Table CRG.ASTHMA_FY10TO14_ALL_COHORT created, with 4567056 rows and 113 columns;

data first_enc_check;
set CRG.ASTHMA_FY10to14_All_Cohort;
by patientid;
if first.patientid then output;
run;
*checked to make sure first.patientid was in FY10...and not all  were.

Accepted Solutions
Solution
‎04-20-2016 12:35 PM
Respected Advisor
Posts: 4,935

Re: identifying patients in all tables using proc sql

[ Edited ]

I would prefer to use set operations:

 

proc sql;

create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;

create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;

drop table everyYear;

quit;
PG

View solution in original post


All Replies
Super User
Posts: 19,875

Re: identifying patients in all tables using proc sql

Use an inner join. 

 

I don't think you can list your where condition that way either you need a bunch of AND in between your equal signs. 

Here's a sketch of how it should look:

 

From table1 as a
Inner join table2 as b
On a.id=b.id
Inner join table3 as c
On a.id=c.id
....
Where *any other condition *;
Quit;

There's a post from a SAS trainer that explains why this is actually faster than your query which is a cross join that is then filtered. 

 

If if all this is gibberish try reading up on join types 🤓

PROC Star
Posts: 1,760

Re: identifying patients in all tables using proc sql

[ Edited ]

@Reeza Any link for the SAS trainer post? I can't find it.

While it seems intuitive, my own quick benchmark can't confirm this.

 

data TAB1;

do I=1 to 100e6;

output;

end;

run;

data TAB2;

do I=1 to 200e6 by 2;

output;

end;

run;

create table OUT as

select tab1.*

from TAB1

     , TAB2

where tab1.I=tab2.I

   and tab1.I < 50e6;

real time           20.02 seconds

user cpu time       26.50 seconds

system cpu time     2.99 seconds

memory             1060985.07k

OS Memory           1084668.00k

Step Count      576   Switch Count 82

create table OUT as

select tab1.*

from TAB1

       inner join

     TAB2

       on tab1.I=tab2.I

       and tab1.I < 50e6;

real time           20.49 seconds

user cpu time       27.08 seconds

system cpu time     2.60 seconds

memory             1060985.95k

OS Memory           1084668.00k

Step Count       578 Switch Count 92

create table OUT as

select tab1.*

from TAB1

       inner join

     TAB2

       on tab1.I=tab2.I

where tab1.I < 50e6;

real time           20.24 seconds

user cpu time       27.42 seconds

system cpu time     2.76 seconds

memory             1060986.07k

OS Memory           1084668.00k

Step Count      580   Switch Count 92

 

Note that the optimizer could do a much better job; faster time and much less memory when using a dataset option:

 

create table OUT as

select tab1.*

from TAB1(where=(I < 50e6))

   , TAB2

where tab1.I=tab2.I

 

real time           20:10.43

user cpu time       16:54.70

system cpu time     3:15.51

memory             12688.53k

OS Memory           37428.00k

Step Count       577 Switch Count 82

create table OUT as

select tab1.*

from TAB1(where=(I < 50e6))

       inner join

     TAB2

       on tab1.I=tab2.I;

real time           20:00.49

user cpu time       16:47.42

system cpu time     3:12.92

memory             12688.53k

OS Memory           37428.00k

Step Count        579 Switch Count 90

 

The intersect and the subquery syntaxes are slower:

 

create table OUT as

   select tab1.*

   from TAB1

   where tab1.I < 50e6

intersect

   select tab2.*

   from TAB2 ;

real time           28.42 seconds

user cpu time       49.32 seconds

system cpu time     6.20 seconds

memory             1060660.10k

OS Memory           1083768.00k

Step Count      583 Switch Count 125

create table OUT as

select tab1.*

from TAB1

where I in(select I from TAB2)

and tab1.I < 50e6; 

real time           47:44.37

user cpu time       11:18.49

system cpu time     36:24.32

memory             5444475.43k

OS Memory           6238204.00k

Step Count      581 Switch Count 3794

 

More performance and benchmark fun in 

  

Super User
Posts: 19,875

Re: identifying patients in all tables using proc sql

Maybe I misunderstood the post. 

 

I prefer an explicit join for clarity when reviewing code 

 

Here's the post

http://blogs.sas.com/content/sastraining/2013/02/04/a-database-professionals-best-friend-2/

Super User
Posts: 19,875

Re: identifying patients in all tables using proc sql

If you did just want to fix your where condition it would be :

 

Where a.id=b.id and a.id=c.id and a.id=d.id and a.id=e.id;
Solution
‎04-20-2016 12:35 PM
Respected Advisor
Posts: 4,935

Re: identifying patients in all tables using proc sql

[ Edited ]

I would prefer to use set operations:

 

proc sql;

create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;

create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;

drop table everyYear;

quit;
PG
Frequent Contributor
Posts: 142

Re: identifying patients in all tables using proc sql

[ Edited ]

@PGStatsWhat type of union or join would allow me to keep all the variables found in each CRG.asthma table (in your second step)? Each table has unique variables assigned to that Fiscal Year. It seems the Union All just keeps those variables that are the same in each table. Thanks!!

Respected Advisor
Posts: 4,935

Re: identifying patients in all tables using proc sql

Replace union all corresponding with outer union corresponding

PG
Frequent Contributor
Posts: 142

Re: identifying patients in all tables using proc sql

@PGStats one more thing!  The output is long, and I need to make wide...i.e.
 one Patientid and all corresponding FY variables on one line.  

my current output looks like
datalines;


ID 10_CRG  11_CRG 12_CRG 13_CRG 14_CRG
1 2
1 . 4
1 . . 3
1 . . . 2
1 . . . . 3

 

want to look like

 

1 2 4 3 2 3

 

So, a transpose I think, but i have never used transpose and it is transpoing weird output.

 

 

Respected Advisor
Posts: 4,935

Re: identifying patients in all tables using proc sql

Why wide? Why not

 

ID YEAR CRG
1 2010 2
1 2011 4
1 2012 3
1 2013 2
1 2014 3

 

you would save yourself so much trouble.

 

PG
Frequent Contributor
Posts: 142

Re: identifying patients in all tables using proc sql

@PGStats  Can you gelp me get to the result you recomended?  I can work with a table that is formatted the way you suggested.

 

My transpose attempt:

 

proc transpose data=CRG.ASTHMA_FY10to14_F out=long prefix=CRG;
by patientid;
var concurrent_fy10-concurrent_fy14;
run;

 

resulted in this (patientid hidden by me):

 

Obs _NAME_ CRG1 CRG2 CRG3 CRG4 CRG5
1 CONCURRENT_FY10 10000 10000      
2 CONCURRENT_FY11     51381 51381  
3 CONCURRENT_FY12         51381
4 CONCURRENT_FY13          
5 CONCURRENT_FY14          
             
             
             
             
             

 

 

How can I get to this type of table:

 

ID YEAR CRG
1 2010 10000

1 2010 10000
1 2011  51381
1 2011  51381
1 2012  51381
1 2013  .

1 2014  .

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 384 views
  • 0 likes
  • 4 in conversation