BookmarkSubscribeRSS Feed
Dsquared
Fluorite | Level 6

I am using proc sql to combine datasets.  What is the theoretical difference (if any) in using these two codes with respect to the patient IDs kept?  Do they both only keep the patient IDs from the first dataset and match to the identical patient IDs in the second?  The goal of both is to use the a dataset and then only obtain the files with the same patient ID showing up in b.

 

proc sql;

create table mart_TS.missmedical as

select a.patid, a.index_dt, a.preg_dt, b.*

from mart_TS.outcomemiss a, dbname.clinical(keep=patid eventdate medcode) b

where a.patid=b.patid

order by a.patid, b.medcode;

quit;

 

 

 

proc sql;

create table mart_TS.outcomemiss as

select *

from mart_tmp.ptlist3bdedup where patid not in (select distinct patid from mart_tmp.ptlist4bdedup);

quit;

 

Thanks!!!

 

4 REPLIES 4
Kurt_Bremser
Super User

Try it and compare the results. See Maxim 4.

If the results are equal, check real and cpu times in the log to determine which method is more efficient.

Jeremy_Browne
Fluorite | Level 6
proc sql;
	Select a.id, a.var2,  b.*
		From tablea a, tableb b
			Where a.id = b.id;

Above performs an ‘Inner join’ of all records in a and b, matched on id, with columns in select list from both tables and potentially multiple rows for a single id from tablea if multiple matches in tableb (i.e. one to many relationship)

proc sql;
	Select * 
		From tablea 
			where id in 
				(select distinct id from tableb);

Whereas, latter returns all records from tablea with an id that exists in tableb (or the complement, i.e. returns if doesn’t exist, if NOT IN used).  WHERE IN (sub-query) does not join the tables and you cannot select other variables from tableb in your select list.  Also, if you have a one to many relationship between the two tables, using a sub-query will only return one row per record from tablea.

PGStats
Opal | Level 21

The IDs from the first query are a subset of the IDs kept by the second query since the first query selects only IDs from the second query that are also present in table dbname.clinical. 

PG
PGStats
Opal | Level 21

Note, the first query might generate more records than you expect if there are multiple records per ID in dbname.clinical.

 

The second query doesn't have that problem (even without the distinct keyword) but it doesn't allow you to bring in fields from the second table.

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 889 views
  • 5 likes
  • 4 in conversation