DATA Step, Macro, Functions and more

Joining multiple tables

Reply
Contributor
Posts: 43

Joining multiple tables

I'm trying to join multiple datasets as below but it is running without ending.

 

I'm sure its forming cartesian product but not sure what i'm doing wrong.

 

proc sql;

	create table	strans.pprofile as
	
		select		ae.* , ag.*, cm.*, di_ri.*, di_tr.*, dm.*, ds.*, ds_ic.*, ds_t.* 
					, dv.*, eg.*, en.*, ex.*, fa.*, ie.*, lb.*, lb_c.*, lb_u.*,
					mh.*, pe.*, re.*, sv.*
		
		from		seretide.ae (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
								 rename =(VarGroup1Row = ae_vg1row)) as ae
									    
		left join	seretide.ag (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ag
									    
		on			ae.subjectid = ag.subjectid
		
		left join	seretide.cm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
									    rename =(VarGroup1Row = cm_vg1row))as cm									    
		on			ae.subjectid = cm.subjectid
		
		left join	seretide.di_ri (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_ri
									    
		on			ae.subjectid = di_ri.subjectid
		
		left join	seretide.di_tr (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_tr
									    
		on			ae.subjectid = di_tr.subjectid
		
		left join	seretide.dm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dm
									    
		on			ae.subjectid = dm.subjectid
		
		left join	seretide.ds (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds
									    
		on			ae.subjectid = ds.subjectid
		
		left join	seretide.ds_ic (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_ic
									    
		on			ae.subjectid = ds_ic.subjectid
				
		left join	seretide.ds_t (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_t
									    
		on			ae.subjectid = ds_t.subjectid
			
		left join	seretide.dv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dv
									    
		on			ae.subjectid = dv.subjectid
					
		left join	seretide.eg (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as eg
									    
		on			ae.subjectid = eg.subjectid
						
		left join	seretide.en (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as en
									    
		on			ae.subjectid = en.subjectid
						
		left join	seretide.ex (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ex
									    
		on			ae.subjectid = ex.subjectid
						
		left join	seretide.fa (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as fa
									    
		on			ae.subjectid = fa.subjectid
						
		left join	seretide.ie (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ie
									    
		on			ae.subjectid = ie.subjectid
						
		left join	seretide.lb (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb
									    
		on			ae.subjectid = lb.subjectid
						
		left join	seretide.lb_c (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_c
									    
		on			ae.subjectid = lb_c.subjectid
						
		left join	seretide.lb_u (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_u
									    
		on			ae.subjectid = lb_u.subjectid
						
		left join	seretide.mh (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as mh
									    
		on			ae.subjectid = mh.subjectid
						
		left join	seretide.pe (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as pe
									    
		on			ae.subjectid = pe.subjectid
						
		left join	seretide.re (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as re
									    
		on			ae.subjectid = re.subjectid
						
		left join	seretide.sv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as sv
									    
		on			ae.subjectid = sv.subjectid
						
		left join	seretide.vs (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as vs
									    
		on			ae.subjectid = vs.subjectid
		
		order by	subjectid;
quit;
Super User
Posts: 23,323

Re: Joining multiple tables

@krisraa Not sure if the forum garbled that post, but the code is illegible as posted.

Contributor
Posts: 43

Re: Joining multiple tables

Reeza,

 

Hope this is readable.

 

proc sql;

	create table	strans.pprofile as
	
		select		ae.* , ag.*, cm.*, di_ri.*, di_tr.*, dm.*, ds.*, ds_ic.*, ds_t.* 
					, dv.*, eg.*, en.*, ex.*, fa.*, ie.*, lb.*, lb_c.*, lb_u.*,
					mh.*, pe.*, re.*, sv.*
		
		from		seretide.ae (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
								 rename =(VarGroup1Row = ae_vg1row)) as ae
									    
		left join	seretide.ag (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ag
									    
		on			ae.subjectid = ag.subjectid
		
		left join	seretide.cm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
									    rename =(VarGroup1Row = cm_vg1row))as cm									    
		on			ae.subjectid = cm.subjectid
		
		left join	seretide.di_ri (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_ri
									    
		on			ae.subjectid = di_ri.subjectid
		
		left join	seretide.di_tr (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_tr
									    
		on			ae.subjectid = di_tr.subjectid
		
		left join	seretide.dm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dm
									    
		on			ae.subjectid = dm.subjectid
		
		left join	seretide.ds (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds
									    
		on			ae.subjectid = ds.subjectid
		
		left join	seretide.ds_ic (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_ic
									    
		on			ae.subjectid = ds_ic.subjectid
				
		left join	seretide.ds_t (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_t
									    
		on			ae.subjectid = ds_t.subjectid
			
		left join	seretide.dv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dv
									    
		on			ae.subjectid = dv.subjectid
					
		left join	seretide.eg (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as eg
									    
		on			ae.subjectid = eg.subjectid
						
		left join	seretide.en (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as en
									    
		on			ae.subjectid = en.subjectid
						
		left join	seretide.ex (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ex
									    
		on			ae.subjectid = ex.subjectid
						
		left join	seretide.fa (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as fa
									    
		on			ae.subjectid = fa.subjectid
						
		left join	seretide.ie (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ie
									    
		on			ae.subjectid = ie.subjectid
						
		left join	seretide.lb (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb
									    
		on			ae.subjectid = lb.subjectid
						
		left join	seretide.lb_c (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_c
									    
		on			ae.subjectid = lb_c.subjectid
						
		left join	seretide.lb_u (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_u
									    
		on			ae.subjectid = lb_u.subjectid
						
		left join	seretide.mh (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as mh
									    
		on			ae.subjectid = mh.subjectid
						
		left join	seretide.pe (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as pe
									    
		on			ae.subjectid = pe.subjectid
						
		left join	seretide.re (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as re
									    
		on			ae.subjectid = re.subjectid
						
		left join	seretide.sv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as sv
									    
		on			ae.subjectid = sv.subjectid
						
		left join	seretide.vs (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as vs
									    
		on			ae.subjectid = vs.subjectid
		
		order by	subjectid;
quit;
Super User
Posts: 23,323

Re: Joining multiple tables

Yes, that's legible Smiley Happy

 

I can't see anything in the code that would cause the issue and if you're selecting to a single record it seems like it should be fast. Are all those data sets on a server or the same place, or are they split across a server and say, your desktop? If so, SAS will bring the datasets over to your local are first which will slow things down immensely. 

 

I'd test it by selecting, just the subjectID for now and doing my joins one at a time. Then run it and add each left join in sequentially and see where the problem is originating. 

Contributor
Posts: 43

Re: Joining multiple tables

[ Edited ]

All datasets are in my local folder and I believe it joins to form a cartesian product which is why its running quite long.

 

I need to find a easy way to extract one subject id values from all the datasets joined together as one dataset.

 

 

Super User
Posts: 23,323

Re: Joining multiple tables

It's left joins on a single ID, what makes you think it's a cartesian product?

What does your log say?

 

How many records are you expecting for that single ID?

 

I assume that the data set options are faster than including them in an SQL query but you can always test that as well. 

PROC Star
Posts: 2,319

Re: Joining multiple tables

This  yields the same result and is a lot easier to read and manage if you only have one record for that subjectID.

If you have many records, then yes you'll get a cartesian product.

How many records per subject do you have?

 


data WANT;
  merge SERETIDE.AE (rename =(VARGROUP1ROW = AE_VG1ROW)
        SERETIDE.AG 
        SERETIDE.CM (rename =(VARGROUP1ROW = CM_VG1ROW)
        SERETIDE.DI_RI 
        SERETIDE.DI_TR 
        SERETIDE.DM 
        SERETIDE.DS 
        SERETIDE.DS_IC 
        SERETIDE.DS_T 
        SERETIDE.DV
        SERETIDE.EG 
        SERETIDE.EN
        SERETIDE.EX 
        SERETIDE.FA 
        SERETIDE.IE 
        SERETIDE.LB 
        SERETIDE.LB_C
        SERETIDE.LB_U 
        SERETIDE.MH 
        SERETIDE.PE 
        SERETIDE.RE 
        SERETIDE.SV
        SERETIDE.VS ;
  where SUBJECTID='0001';
  drop  SUBJECTVISITFORMID
        FORM
        FORMENTRYDATE
        SUBJECTSTATUS
        VISIT;
run;
Contributor
Posts: 43

Re: Joining multiple tables

Thanks Chris for optimizing.

 

Yes, subjectid has multiple records.

 

on average each data set contains 60 observations.

 

Now I need to find a better way to join and pull data for each subjectid.

PROC Star
Posts: 2,319

Re: Joining multiple tables

60 records in 23 tables with no join key?  

 

How big is 60^23?  Smiley Wink

 

You need to think about what output you want. Do you know?

Super User
Posts: 23,323

Re: Joining multiple tables

How many columns are you bringing in from each of the joined tables? Do all tables have multiple records for the ID?

Contributor
Posts: 43

Re: Joining multiple tables

205 columns in total.

Super User
Posts: 23,323

Re: Joining multiple tables

If this is a fact table rebuild - ie the tables were split and you're recombining them sometimes you can use a FORMAT instead.

 

With the amount of columns and a many to many merge I'm not sure there's any optimization. And as indicated, 60^23 is a big number...to big to type out and likely you're computer won't process it. You need to find a different way to restrict your query, maybe dates, or some other key to join to some tables? I can't believe a table of that size would be useful either. 

 

 

Trusted Advisor
Posts: 1,312

Re: Joining multiple tables

@ChrisNZ:

 

I don't see how your program would produce a cartesian cross, except when each incoming dataset has exactly one observation with SUBJECTID='0001'.  And that would be the trivial cartesian cross of one output record.

 

But if no dataset has dupllicate SUBJECTIDs, then yes, your program is a lot simpler in construction,

Contributor
Posts: 43

Re: Joining multiple tables

The subject id has multiple observations in different data sets.

Trusted Advisor
Posts: 1,312

Re: Joining multiple tables

You said an average of 60 records per id in each table?'

 

Joining 8 tables on subjectid with 60 qualifying records per file would generate 167,961,600,000,000 records.  23 tables would be that number almost cubed.  Not going to happen.

 

Do you really need a cartesian product of 23 files with 60 qualifying records in each?

 

Ask a Question
Discussion stats
  • 16 replies
  • 269 views
  • 3 likes
  • 5 in conversation