BookmarkSubscribeRSS Feed
krisraa
Quartz | Level 8

I'm trying to join multiple datasets as below but it is running without ending.

 

I'm sure its forming cartesian product but not sure what i'm doing wrong.

 

proc sql;

	create table	strans.pprofile as
	
		select		ae.* , ag.*, cm.*, di_ri.*, di_tr.*, dm.*, ds.*, ds_ic.*, ds_t.* 
					, dv.*, eg.*, en.*, ex.*, fa.*, ie.*, lb.*, lb_c.*, lb_u.*,
					mh.*, pe.*, re.*, sv.*
		
		from		seretide.ae (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
								 rename =(VarGroup1Row = ae_vg1row)) as ae
									    
		left join	seretide.ag (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ag
									    
		on			ae.subjectid = ag.subjectid
		
		left join	seretide.cm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
									    rename =(VarGroup1Row = cm_vg1row))as cm									    
		on			ae.subjectid = cm.subjectid
		
		left join	seretide.di_ri (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_ri
									    
		on			ae.subjectid = di_ri.subjectid
		
		left join	seretide.di_tr (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_tr
									    
		on			ae.subjectid = di_tr.subjectid
		
		left join	seretide.dm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dm
									    
		on			ae.subjectid = dm.subjectid
		
		left join	seretide.ds (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds
									    
		on			ae.subjectid = ds.subjectid
		
		left join	seretide.ds_ic (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_ic
									    
		on			ae.subjectid = ds_ic.subjectid
				
		left join	seretide.ds_t (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_t
									    
		on			ae.subjectid = ds_t.subjectid
			
		left join	seretide.dv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dv
									    
		on			ae.subjectid = dv.subjectid
					
		left join	seretide.eg (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as eg
									    
		on			ae.subjectid = eg.subjectid
						
		left join	seretide.en (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as en
									    
		on			ae.subjectid = en.subjectid
						
		left join	seretide.ex (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ex
									    
		on			ae.subjectid = ex.subjectid
						
		left join	seretide.fa (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as fa
									    
		on			ae.subjectid = fa.subjectid
						
		left join	seretide.ie (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ie
									    
		on			ae.subjectid = ie.subjectid
						
		left join	seretide.lb (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb
									    
		on			ae.subjectid = lb.subjectid
						
		left join	seretide.lb_c (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_c
									    
		on			ae.subjectid = lb_c.subjectid
						
		left join	seretide.lb_u (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_u
									    
		on			ae.subjectid = lb_u.subjectid
						
		left join	seretide.mh (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as mh
									    
		on			ae.subjectid = mh.subjectid
						
		left join	seretide.pe (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as pe
									    
		on			ae.subjectid = pe.subjectid
						
		left join	seretide.re (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as re
									    
		on			ae.subjectid = re.subjectid
						
		left join	seretide.sv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as sv
									    
		on			ae.subjectid = sv.subjectid
						
		left join	seretide.vs (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as vs
									    
		on			ae.subjectid = vs.subjectid
		
		order by	subjectid;
quit;
16 REPLIES 16
Reeza
Super User

@krisraa Not sure if the forum garbled that post, but the code is illegible as posted.

krisraa
Quartz | Level 8

Reeza,

 

Hope this is readable.

 

proc sql;

	create table	strans.pprofile as
	
		select		ae.* , ag.*, cm.*, di_ri.*, di_tr.*, dm.*, ds.*, ds_ic.*, ds_t.* 
					, dv.*, eg.*, en.*, ex.*, fa.*, ie.*, lb.*, lb_c.*, lb_u.*,
					mh.*, pe.*, re.*, sv.*
		
		from		seretide.ae (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
								 rename =(VarGroup1Row = ae_vg1row)) as ae
									    
		left join	seretide.ag (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ag
									    
		on			ae.subjectid = ag.subjectid
		
		left join	seretide.cm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit
									    rename =(VarGroup1Row = cm_vg1row))as cm									    
		on			ae.subjectid = cm.subjectid
		
		left join	seretide.di_ri (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_ri
									    
		on			ae.subjectid = di_ri.subjectid
		
		left join	seretide.di_tr (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as di_tr
									    
		on			ae.subjectid = di_tr.subjectid
		
		left join	seretide.dm (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dm
									    
		on			ae.subjectid = dm.subjectid
		
		left join	seretide.ds (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds
									    
		on			ae.subjectid = ds.subjectid
		
		left join	seretide.ds_ic (where = (subjectid='0001') 
								 	drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_ic
									    
		on			ae.subjectid = ds_ic.subjectid
				
		left join	seretide.ds_t (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ds_t
									    
		on			ae.subjectid = ds_t.subjectid
			
		left join	seretide.dv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as dv
									    
		on			ae.subjectid = dv.subjectid
					
		left join	seretide.eg (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as eg
									    
		on			ae.subjectid = eg.subjectid
						
		left join	seretide.en (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as en
									    
		on			ae.subjectid = en.subjectid
						
		left join	seretide.ex (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ex
									    
		on			ae.subjectid = ex.subjectid
						
		left join	seretide.fa (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as fa
									    
		on			ae.subjectid = fa.subjectid
						
		left join	seretide.ie (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as ie
									    
		on			ae.subjectid = ie.subjectid
						
		left join	seretide.lb (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb
									    
		on			ae.subjectid = lb.subjectid
						
		left join	seretide.lb_c (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_c
									    
		on			ae.subjectid = lb_c.subjectid
						
		left join	seretide.lb_u (where = (subjectid='0001') 
								   drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as lb_u
									    
		on			ae.subjectid = lb_u.subjectid
						
		left join	seretide.mh (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as mh
									    
		on			ae.subjectid = mh.subjectid
						
		left join	seretide.pe (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as pe
									    
		on			ae.subjectid = pe.subjectid
						
		left join	seretide.re (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as re
									    
		on			ae.subjectid = re.subjectid
						
		left join	seretide.sv (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as sv
									    
		on			ae.subjectid = sv.subjectid
						
		left join	seretide.vs (where = (subjectid='0001') 
								 drop = SubjectVisitFormID
									    Form
									    FormEntryDate
									    SubjectStatus
									    Visit)as vs
									    
		on			ae.subjectid = vs.subjectid
		
		order by	subjectid;
quit;
Reeza
Super User

Yes, that's legible 🙂

 

I can't see anything in the code that would cause the issue and if you're selecting to a single record it seems like it should be fast. Are all those data sets on a server or the same place, or are they split across a server and say, your desktop? If so, SAS will bring the datasets over to your local are first which will slow things down immensely. 

 

I'd test it by selecting, just the subjectID for now and doing my joins one at a time. Then run it and add each left join in sequentially and see where the problem is originating. 

krisraa
Quartz | Level 8

All datasets are in my local folder and I believe it joins to form a cartesian product which is why its running quite long.

 

I need to find a easy way to extract one subject id values from all the datasets joined together as one dataset.

 

 

Reeza
Super User

It's left joins on a single ID, what makes you think it's a cartesian product?

What does your log say?

 

How many records are you expecting for that single ID?

 

I assume that the data set options are faster than including them in an SQL query but you can always test that as well. 

ChrisNZ
Tourmaline | Level 20

This  yields the same result and is a lot easier to read and manage if you only have one record for that subjectID.

If you have many records, then yes you'll get a cartesian product.

How many records per subject do you have?

 


data WANT;
  merge SERETIDE.AE (rename =(VARGROUP1ROW = AE_VG1ROW)
        SERETIDE.AG 
        SERETIDE.CM (rename =(VARGROUP1ROW = CM_VG1ROW)
        SERETIDE.DI_RI 
        SERETIDE.DI_TR 
        SERETIDE.DM 
        SERETIDE.DS 
        SERETIDE.DS_IC 
        SERETIDE.DS_T 
        SERETIDE.DV
        SERETIDE.EG 
        SERETIDE.EN
        SERETIDE.EX 
        SERETIDE.FA 
        SERETIDE.IE 
        SERETIDE.LB 
        SERETIDE.LB_C
        SERETIDE.LB_U 
        SERETIDE.MH 
        SERETIDE.PE 
        SERETIDE.RE 
        SERETIDE.SV
        SERETIDE.VS ;
  where SUBJECTID='0001';
  drop  SUBJECTVISITFORMID
        FORM
        FORMENTRYDATE
        SUBJECTSTATUS
        VISIT;
run;
krisraa
Quartz | Level 8

Thanks Chris for optimizing.

 

Yes, subjectid has multiple records.

 

on average each data set contains 60 observations.

 

Now I need to find a better way to join and pull data for each subjectid.

ChrisNZ
Tourmaline | Level 20

60 records in 23 tables with no join key?  

 

How big is 60^23?  😉

 

You need to think about what output you want. Do you know?

Reeza
Super User

How many columns are you bringing in from each of the joined tables? Do all tables have multiple records for the ID?

krisraa
Quartz | Level 8

205 columns in total.

Reeza
Super User

If this is a fact table rebuild - ie the tables were split and you're recombining them sometimes you can use a FORMAT instead.

 

With the amount of columns and a many to many merge I'm not sure there's any optimization. And as indicated, 60^23 is a big number...to big to type out and likely you're computer won't process it. You need to find a different way to restrict your query, maybe dates, or some other key to join to some tables? I can't believe a table of that size would be useful either. 

 

 

mkeintz
PROC Star

@ChrisNZ:

 

I don't see how your program would produce a cartesian cross, except when each incoming dataset has exactly one observation with SUBJECTID='0001'.  And that would be the trivial cartesian cross of one output record.

 

But if no dataset has dupllicate SUBJECTIDs, then yes, your program is a lot simpler in construction,

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
krisraa
Quartz | Level 8

The subject id has multiple observations in different data sets.

mkeintz
PROC Star

You said an average of 60 records per id in each table?'

 

Joining 8 tables on subjectid with 60 qualifying records per file would generate 167,961,600,000,000 records.  23 tables would be that number almost cubed.  Not going to happen.

 

Do you really need a cartesian product of 23 files with 60 qualifying records in each?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1642 views
  • 3 likes
  • 5 in conversation