Hi! Looking for some help for fixing my proc sql syntax...
I have five tables representing a year (FY10-FY14) of health encounters for all patients during that specific year.
I need to identify the cohort of patients that recieved care in each of those five years. So first health encounter should be in FY10 and last encounter should be in FY14 but the patient should also have encounters in FY12 and FY13. This is my syntax used, but when I check the output at the end, the first encounter for many patients were not in FY10. Any help is appreciated!!
proc sql;
create table CRG.ASTHMA_FY10to14_1 as
select a.*, b.*,c.*,d.*,e.*
from CRG.ASTHMA_fy10_crg1 as a, CRG.ASTHMA_fy11_crg1 as b, CRG.ASTHMA_fy12_crg1 as c, CRG.ASTHMA_fy13_crg1 as d, CRG.ASTHMA_fy14_crg1 as e
where a.patientid=b.patientid=c.patientid=d.patientid=e.patientid;
quit;
*Identify patients that are in all fiscal years. 43152 rows and 22 columns;
proc sql;
create table CRG.ASTHMA_FY10to14_All_Cohort as
select a.*, b.*
from CRG.ASTHMA_FY10to14_1 as a, asthma.FY10to14_Asthma_IPOP_PDTS_1 as b
where a.patientid=b.patientid;
quit;
*Join the patientid with the full medical record. NOTE: Table CRG.ASTHMA_FY10TO14_ALL_COHORT created, with 4567056 rows and 113 columns;
data first_enc_check;
set CRG.ASTHMA_FY10to14_All_Cohort;
by patientid;
if first.patientid then output;
run;
*checked to make sure first.patientid was in FY10...and not all were.
I would prefer to use set operations:
proc sql;
create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;
create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;
drop table everyYear;
quit;
Use an inner join.
I don't think you can list your where condition that way either you need a bunch of AND in between your equal signs.
Here's a sketch of how it should look:
From table1 as a
Inner join table2 as b
On a.id=b.id
Inner join table3 as c
On a.id=c.id
....
Where *any other condition *;
Quit;
There's a post from a SAS trainer that explains why this is actually faster than your query which is a cross join that is then filtered.
If if all this is gibberish try reading up on join types 🤓
@Reeza Any link for the SAS trainer post? I can't find it.
While it seems intuitive, my own quick benchmark can't confirm this.
data TAB1;
do I=1 to 100e6;
output;
end;
run;
data TAB2;
do I=1 to 200e6 by 2;
output;
end;
run;
create table OUT as select tab1.* from TAB1 , TAB2 where tab1.I=tab2.I and tab1.I < 50e6; |
real time 20.02 seconds user cpu time 26.50 seconds system cpu time 2.99 seconds memory 1060985.07k OS Memory 1084668.00k Step Count 576 Switch Count 82 |
create table OUT as select tab1.* from TAB1 inner join TAB2 on tab1.I=tab2.I and tab1.I < 50e6; |
real time 20.49 seconds user cpu time 27.08 seconds system cpu time 2.60 seconds memory 1060985.95k OS Memory 1084668.00k Step Count 578 Switch Count 92 |
create table OUT as select tab1.* from TAB1 inner join TAB2 on tab1.I=tab2.I where tab1.I < 50e6; |
real time 20.24 seconds user cpu time 27.42 seconds system cpu time 2.76 seconds memory 1060986.07k OS Memory 1084668.00k Step Count 580 Switch Count 92 |
Note that the optimizer could do a much better job; faster time and much less memory when using a dataset option:
create table OUT as select tab1.* from TAB1(where=(I < 50e6)) , TAB2 where tab1.I=tab2.I
|
real time 20:10.43 user cpu time 16:54.70 system cpu time 3:15.51 memory 12688.53k OS Memory 37428.00k Step Count 577 Switch Count 82 |
create table OUT as select tab1.* from TAB1(where=(I < 50e6)) inner join TAB2 on tab1.I=tab2.I; |
real time 20:00.49 user cpu time 16:47.42 system cpu time 3:12.92 memory 12688.53k OS Memory 37428.00k Step Count 579 Switch Count 90 |
The intersect and the subquery syntaxes are slower:
create table OUT as select tab1.* from TAB1 where tab1.I < 50e6 intersect select tab2.* from TAB2 ; |
real time 28.42 seconds user cpu time 49.32 seconds system cpu time 6.20 seconds memory 1060660.10k OS Memory 1083768.00k Step Count 583 Switch Count 125 |
create table OUT as select tab1.* from TAB1 where I in(select I from TAB2) and tab1.I < 50e6; |
real time 47:44.37 user cpu time 11:18.49 system cpu time 36:24.32 memory 5444475.43k OS Memory 6238204.00k Step Count 581 Switch Count 3794 |
More performance and benchmark fun in
Maybe I misunderstood the post.
I prefer an explicit join for clarity when reviewing code
Here's the post
http://blogs.sas.com/content/sastraining/2013/02/04/a-database-professionals-best-friend-2/
If you did just want to fix your where condition it would be :
Where a.id=b.id and a.id=c.id and a.id=d.id and a.id=e.id;
I would prefer to use set operations:
proc sql;
create table everyYear as
select patientId from CRG.ASTHMA_fy10_crg1
intersect
select patientId from CRG.ASTHMA_fy11_crg1
intersect
select patientId from CRG.ASTHMA_fy12_crg1
intersect
select patientId from CRG.ASTHMA_fy13_crg1
intersect
select patientId from CRG.ASTHMA_fy14_crg1;
create table CRG.ASTHMA_FY10to14_1 as
select * from CRG.ASTHMA_fy10_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy11_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy12_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy13_crg1 where patientId in (select patientId from everyYear)
union all corresponding
select * from CRG.ASTHMA_fy14_crg1 where patientId in (select patientId from everyYear)
order by patientId, myTimeVariable;
drop table everyYear;
quit;
@PGStatsWhat type of union or join would allow me to keep all the variables found in each CRG.asthma table (in your second step)? Each table has unique variables assigned to that Fiscal Year. It seems the Union All just keeps those variables that are the same in each table. Thanks!!
Replace union all corresponding with outer union corresponding
@PGStats one more thing! The output is long, and I need to make wide...i.e.
one Patientid and all corresponding FY variables on one line.
my current output looks like
datalines;
ID 10_CRG 11_CRG 12_CRG 13_CRG 14_CRG
1 2
1 . 4
1 . . 3
1 . . . 2
1 . . . . 3
want to look like
1 2 4 3 2 3
So, a transpose I think, but i have never used transpose and it is transpoing weird output.
Why wide? Why not
ID YEAR CRG
1 2010 2
1 2011 4
1 2012 3
1 2013 2
1 2014 3
you would save yourself so much trouble.
@PGStats Can you gelp me get to the result you recomended? I can work with a table that is formatted the way you suggested.
My transpose attempt:
proc transpose data=CRG.ASTHMA_FY10to14_F out=long prefix=CRG;
by patientid;
var concurrent_fy10-concurrent_fy14;
run;
resulted in this (patientid hidden by me):
Obs | _NAME_ | CRG1 | CRG2 | CRG3 | CRG4 | CRG5 |
---|---|---|---|---|---|---|
1 | CONCURRENT_FY10 | 10000 | 10000 | |||
2 | CONCURRENT_FY11 | 51381 | 51381 | |||
3 | CONCURRENT_FY12 | 51381 | ||||
4 | CONCURRENT_FY13 | |||||
5 | CONCURRENT_FY14 | |||||
How can I get to this type of table:
ID YEAR CRG
1 2010 10000
1 2010 10000
1 2011 51381
1 2011 51381
1 2012 51381
1 2013 .
1 2014 .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.