Hi Guys,
I suspect that subjects are erranously assigned to control (group&=1) vs exposure(group&=2) in the sample data attached. Actual data is 5 times this sample. Loss in follow up is reflected in the data as well.
I'd like to test if groups are assigned to 1 and 2 consistently across four datasets (pretest=groupc, 7day follow up=group7d, 3mo follow-up=group3m and test among waitlisted=groupd). For that purpose, I thought, full join in proc sql joining all 4 datasets together on group and subj as key variables would help get better grasp of what happened.
I don't mind to get rid of joining on where most incomplete variable is null (where groupc=null). Coz it might not work in this unbalanced data due to loss in follow up over time. Using SAS 9.4.
The code didn't work. Log is shown below.
proc sql;
create table outer as
select a.groupc, a.subjc,
b.groupd, b.subjd,
c.group7d, c.subj7d,
d.group3m, d.subj3m
from groupc as a,
groupd as b,
group7d as c,
group3m as d
full join groupc on a.groupc = b.groupd and a.subjc = c.subjd
full join groupc on a.groupc = c.group7d and a.subcj = c.subj7d
full join groupd on b.groupd = c.group7d and b.subjd = c.subj7d
full join group7d on c.group7d = d.group7d and c.subj7d = d.subj3m
where a.group3m is null;
quit;
Log:
617 proc sql;
618 create table outer as
619 select a.groupc, a.subjc,
620 b.groupd, b.subjd,
621 c.group7d, c.subj7d,
622 d.group3m, d.subj3m
623 from groupc as a,
624 groupd as b,
625 group7d as c,
626 group3m as d
627 full join groupc on a.groupc = b.groupd and a.subjc = c.subjd
628 full join groupc on a.groupc = c.group7d and a.subcj = c.subj7d
629 full join groupd on b.groupd = c.group7d and b.subjd = c.subj7d
630 full join group7d on c.group7d = d.group7d and c.subj7d = d.subj3m
631 where a.group3m is null;
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column group7d could not be found in the table/view identified with the correlation name D.
ERROR: Column group7d could not be found in the table/view identified with the correlation name D.
ERROR: Column group3m could not be found in the table/view identified with the correlation name A.
ERROR: Expression using equals (=) has components that are of different data types.
632 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thank you very much for your time.
proc export data=groupc
outfile="\groupc.csv"
dbms=csv;
proc export data=groupd
outfile="\groupd.csv"
dbms=csv;
proc export data=group7d
outfile="\group7d.csv"
dbms=csv;
proc export data=group3m
outfile="\group3m.csv"
dbms=csv;
run;
Check which variables are in which dataset. Then re-formulate the join.
Thanks Kurt, i just corrected typo and misplaced variables. Now I get new error. Any suggestions as to what am I missing? Thanks again.
proc sql;
create table outer as
select a.groupc, a.subjc,
b.groupd, b.subjd,
c.group7d, c.subj7d,
d.group3m, d.subj3m
from groupc as a,
groupd as b,
group7d as c,
group3m as d
full join groupc on a.groupc = b.groupd and a.subjc = b.subjd
full join groupc on a.groupc = c.group7d and a.subjc = c.subj7d
full join groupd on b.groupd = c.group7d and b.subjd = c.subj7d
full join group7d on c.group7d = d.group3m and c.subj7d = d.subj3m
where d.subj3m is null;
quit;
New error in the log:
ERROR: Correlated reference to column groupc is not contained within a subquery.
ERROR: Correlated reference to column groupd is not contained within a subquery.
ERROR: Correlated reference to column subjc is not contained within a subquery.
ERROR: Correlated reference to column subjd is not contained within a subquery.
That's not how you write a FROM statement.
Your from part, as @Reeza noted, makes of course no sense at all; with regards to SQL syntax it is Vogon poetry.
This is at least syntactically correct, but I have no clue about the semantic correctness:
proc sql;
create table outer as
select
a.groupc,
a.subjc,
b.groupd,
b.subjd,
c.group7d,
c.subj7d,
d.group3m,
d.subj3m
from
groupc as a
full join groupd as b on a.groupc = b.groupd and a.subjc = b.subjd
full join group7d as c on a.groupc = c.group7d and a.subjc = c.subj7d
full join group3m as d on c.group7d = d.group3m and c.subj7d = d.subj3m
where d.subj3m is null
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.