BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xliu1
Quartz | Level 8

I need to join multiple tables from different libraries in sas. here is the code I use to connect three tables. SID and Prod are two different libraries in SAS. The problem is when I click on run, the software stops working. The software works fine when I run programs that have tables from the same library. I want to make sure the way I join tables are correct. I use different login credentials for these two libraries.


from SID.SID_Person_Demo P inner join SID.SID_Enrollments_All_Vw E on P.UID=E.UID left join Prod.Spraddr A on E.OASIS_PIDM=A.SPRADDR_PIDM

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
50,000 rows wouldn't be considered large in SAS tables at all. If you were doing tens of million is when I'd get concerned. If its only 50,000 rows you should be fine.

View solution in original post

12 REPLIES 12
DavePrinsloo
Pyrite | Level 9
Are these non SAS tables, I.e in Oracle or some other database? If so, you are doing what is called a cross-sever join. You may consider using pass-through to join the tables in the background database server. In that case, you would need that one of the Oracle schemes be granted access to the other.
Reeza
Super User
It's hard to say without seeing the rest of your query. Are you using SQL Pass-through, RSUBMIT, Regular SQL, FEDSQL????

Either way, assuming you've set up the libnames with two libname statements and are using regular PROC SQL, my guess is these are large tables. When SAS is joining these tables behind the scenes, it actually has to go download all the data first. This can be quite cumbersome from a timing perspective so often I'll filter my data sets to just the minimum set of variables and rows, extract them first and then join within SAS. Ultimately it depends on how often your'e doing this and what type of performance you need.
xliu1
Quartz | Level 8

Thanks for your prompt response. I use regular sql procedure in sas. I get your points. These are very large tables that have 50000 rows. I will try your way of connecting those datasets. Thanks.

Reeza
Super User
50,000 rows wouldn't be considered large in SAS tables at all. If you were doing tens of million is when I'd get concerned. If its only 50,000 rows you should be fine.
xliu1
Quartz | Level 8
proc sql outobs=500;
    create table NPSAS20_Enrollment as
	    select distinct P.UID label='Student ID',
         P.NAME_FIRST label='First Name',
		 P.NAME_MI label='Middle Name',
		 P.NAME_LAST label='Last Name',
		 P.SUFFIX label='Name suffix',
		 P.GENDER label='Sex',
         case when E.Military_Status in ('A','N') then 2
		      when E.Military_Status in ('E','R','V','Y') then 1
			  when E.Military_Status = 'D' then 0
			  when E.Military_Status is null then 0
		      else -1 end label='Veteran or Active Duty Military status',
		 case when E.DUAL_ENROLL_IND='N' then 0
              when E.DUAL_ENROLL_IND in ('O','P','S','V') then 1
              else -1 end as Dual label='Dual enrollment indicator',
		 substr(P.BIRTH_YYYYMM_DT, 5, 2) as MonthofBirth label='Date of birth month',
		 substr(P.BIRTH_YYYYMM_DT, 7, 2) as DateofBirth label='Date of birth day',
		 substr(P.BIRTH_YYYYMM_DT, 1, 4) as YearofBirth label='Date of birth year',
		 case when P.HISPANIC_FLG = 'Y' then 1
		      when P.HISPANIC_FLG = 'N' then 0
			  else -1 end as Ethnicty label='Ethnicity',
         case when P.WHITE_FLG = 'Y' then 1
              when P.WHITE_FLG = 'N' then 0
			  else -1 end as White label='Race:White',
		 case when P.BLACK_AFRICAN_AM_FLG = 'Y' then 1
              when P.BLACK_AFRICAN_AM_FLG = 'N' then 0
			  else -1 end as Black label='Race:Black or African American',
         case when P.ASIAN_FLG = 'Y' then 1
              when P.ASIAN_FLG = 'N' then 0
			  else -1 end as Asian label='Race:Asian',
         case when P.AM_IND_ALASKAN_FLG = 'Y' then 1
              when P.AM_IND_ALASKAN_FLG = 'N' then 0
			  else -1 end as AmericanIndian label='Race:American Indian or Alaska Native',
         case when P.NAT_HAWAII_PAC_IS_FLG = 'Y' then 1
              when P.NAT_HAWAII_PAC_IS_FLG = 'N' then 0
			  else -1 end as NativeHawaiian label='Race:Native Hawaiian or Pacific Islander',
	     case when A.SPRADDR_ATYP_CODE = 'PM' then A.SPRADDR_STREET_LINE1 else '' end as Addressline1 label='Permanent address 1' 

        from SID.SID_Person_Demo P
        inner join SID.SID_Enrollments_All_Vw E on P.UID=E.UID
        left join Prod.Spraddr A on E.OASIS_PIDM=A.SPRADDR_PIDM

Where E.TERMID IN ('201905','201908','202001')
          and E.BENCH='B'; 
               
		
	select * from NPSAS20_Enrollment;
QUIT;

Here are my codes. Three tables are used. The program works fine until I joined Prod.Spraddr into the program.

Reeza
Super User
case when E.Military_Status in ('A','N') then 2
		      when E.Military_Status in ('E','R','V','Y') then 1
			  when E.Military_Status = 'D' then 0
			  when E.Military_Status is null then 0
		      else -1 end label='Veteran or Active Duty Military status',

Missing an as there?

No variable name?

 

Also, if you're recoding a bunch of Y/N to a consistent code, ie 0/1/-1 then I would suggest using an informat instead. It's cleaner and you only have to change it in one place. I'd also recommend leaving the -1 as missing, that way you can do summary stats on the column without any issues. If you have multiple missing values, use a special missing.

 

For testing purposes, I'd split the join and data extract from the new variable creation to ensure that I knew which was taking time and worth making more efficient. My suggestions above are faster for you as a programmer but likely won't affect your run time that significantly except the informat portion. 

 

 

xliu1
Quartz | Level 8

Thanks for your response. I am not very familiar with informat coding. Would you please send me an example?

 

Also I want to clarify the recommendation you mentioned in your previous response about "extract them first and then join within SAS". Do you mean I create tables using the same library, and then use "data merge" function to combine them later? Thanks for your help!

 

Reeza
Super User

Here's a paper that covers informats and their usages:

https://www.lexjansen.com/pharmasug/2005/posters/po06.pdf

 

For data extracts, I mean I'd pull down parts of each table manually via a sql or data step, filtering out rows and columns as much as possible.

 

And then do a SQL merge after, you can use SQL on your SAS data sets or a data step merge, they're usually equivalent approaches. 

xliu1
Quartz | Level 8

Thanks for shared information!

DavePrinsloo
Pyrite | Level 9
On the concept of "extract them first": The simplest way is to use proc copy to copy the entire tables.
You would want to do that with SID.SID_Person_Demo and Prod.Spraddr.
Then use a data step or proc sql to copy the data from SID.SID_Enrollments_All_Vw
with Where TERMID IN ('201905','201908','202001') and BENCH='B';

Since all the data is now in SAS, the join should be fast.
xliu1
Quartz | Level 8

Thanks for your response!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1637 views
  • 1 like
  • 3 in conversation