Help using Base SAS procedures

Join 4 tables using proc sql to check consistency across multiple datasets?

Reply
Frequent Contributor
Posts: 125

Join 4 tables using proc sql to check consistency across multiple datasets?

[ Edited ]

Hi Guys,

I suspect that subjects are erranously assigned to control (group&=1) vs exposure(group&=2)  in the sample data attached. Actual data is 5 times this sample. Loss in follow up is reflected in the data as well.

I'd like to test if groups are assigned to 1 and 2 consistently across four datasets (pretest=groupc, 7day follow up=group7d, 3mo follow-up=group3m and test among waitlisted=groupd). For that purpose, I thought, full join in proc sql joining all 4 datasets together on group and subj as key variables would help get better grasp of what happened.

I don't mind to get rid of joining on where most incomplete variable is null (where groupc=null). Coz it might not work in this unbalanced data due to loss in follow up over time. Using SAS 9.4.

The code didn't work. Log is shown below.

 

proc sql; 
create table outer as 
select 	a.groupc,  a.subjc, 
	b.groupd,  b.subjd, 
	c.group7d, c.subj7d,
	d.group3m, d.subj3m
	from 	groupc  as a, 
		groupd  as b, 
		group7d as c, 
		group3m as d 
		full join groupc  on a.groupc =  b.groupd  and a.subjc  = c.subjd 
		full join groupc  on a.groupc =  c.group7d and a.subcj  = c.subj7d
		full join groupd  on b.groupd =  c.group7d and b.subjd  = c.subj7d
		full join group7d on c.group7d = d.group7d and c.subj7d = d.subj3m
        where a.group3m is null;
quit; 

 

Log:

 

 

617  proc sql;
618      create table outer as
619          select  a.groupc,  a.subjc,
620                  b.groupd,  b.subjd,
621                  c.group7d, c.subj7d,
622                  d.group3m, d.subj3m
623          from            groupc  as a,
624                          groupd  as b,
625                          group7d as c,
626                          group3m as d
627          full join groupc  on a.groupc =  b.groupd  and a.subjc  = c.subjd
628          full join groupc  on a.groupc =  c.group7d and a.subcj  = c.subj7d
629          full join groupd  on b.groupd =  c.group7d and b.subjd  = c.subj7d
630          full join group7d on c.group7d = d.group7d and c.subj7d = d.subj3m
631          where a.group3m is null;
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subjd could not be found in the table/view identified with the correlation name C.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column subcj could not be found in the table/view identified with the correlation name A.
ERROR: Column group7d could not be found in the table/view identified with the correlation name D.
ERROR: Column group7d could not be found in the table/view identified with the correlation name D.
ERROR: Column group3m could not be found in the table/view identified with the correlation name A.
ERROR: Expression using equals (=) has components that are of different data types.
632  quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 Thank you very much for your time.

proc export data=groupc
outfile="\groupc.csv"
dbms=csv;
proc export data=groupd
outfile="\groupd.csv"
dbms=csv;
proc export data=group7d
outfile="\group7d.csv"
dbms=csv;
proc export data=group3m
outfile="\group3m.csv"
dbms=csv;
run;

 

Attachment
Attachment
Attachment
Attachment
Attachment
Super User
Posts: 6,928

Re: Join 4 tables using proc sql to check consistency across multiple datasets?

Check which variables are in which dataset. Then re-formulate the join.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 125

Re: Join 4 tables using proc sql to check consistency across multiple datasets?

@KurtBremser

Thanks Kurt, i just corrected typo and misplaced variables. Now I get new error. Any suggestions as to what am I missing? Thanks again.

 

 

proc sql; 
create table outer as 
	select 	a.groupc,  a.subjc, 
		b.groupd,  b.subjd, 
		c.group7d, c.subj7d,
		d.group3m, d.subj3m
	from 	groupc  as a, 
		groupd  as b, 
		group7d as c, 
		group3m as d 
	full join groupc  on a.groupc =  b.groupd  and a.subjc  = b.subjd 
	full join groupc  on a.groupc =  c.group7d and a.subjc  = c.subj7d
	full join groupd  on b.groupd =  c.group7d and b.subjd  = c.subj7d
	full join group7d on c.group7d = d.group3m and c.subj7d = d.subj3m
where d.subj3m is null; quit;

 

New error in the log:

 

ERROR: Correlated reference to column groupc is not contained within a subquery.
ERROR: Correlated reference to column groupd is not contained within a subquery.
ERROR: Correlated reference to column subjc is not contained within a subquery.
ERROR: Correlated reference to column subjd is not contained within a subquery.

 

Super User
Posts: 17,750

Re: Join 4 tables using proc sql to check consistency across multiple datasets?

That's not how you write a FROM statement. 

 

 

Super User
Posts: 6,928

Re: Join 4 tables using proc sql to check consistency across multiple datasets?

Your from part, as @Reeza noted, makes of course no sense at all; with regards to SQL syntax it is Vogon poetry.

 

This is at least syntactically correct, but I have no clue about the semantic correctness:

proc sql; 
create table outer as 
select
  a.groupc,
  a.subjc,
  b.groupd,
  b.subjd, 
  c.group7d,
  c.subj7d,
  d.group3m,
  d.subj3m
from
  groupc as a
  full join groupd as b on a.groupc = b.groupd and a.subjc = b.subjd 
  full join group7d as c on a.groupc = c.group7d and a.subjc = c.subj7d
  full join group3m as d on c.group7d = d.group3m and c.subj7d = d.subj3m
where d.subj3m is null
;
quit; 
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 4 replies
  • 196 views
  • 1 like
  • 3 in conversation