BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

Hi,

Can I have inner join and left join in one step in PROC SQL, or I've to do in 2 steps or do with subquery?

Here is an example.

1. I want ID in both tables A and B (so just ID=1 and 2), so it's inner join (in data step it's in A and B)

2. I then want to link the cases to table C to get their breakfast category. Since it's likely the category in C is not complete, I'd need to use left join to keep records (i.e. mushrrom) from step 1 (in data step it's in A).

data a; input id x1;

datalines;

1 11

2 22

;

data b; input id x2 breakfast $;

datalines;

1 1 apple

2 1 carrot

2 1 mushroom

3 1 pepper

;

data c; input breakfast $ category $;

datalines;

apple  fruit

orange fruit

carrot  vegetable

pepper vegetable

;

Data wanted:

ID  breakfast    category

1    apple           fruit

2    carrot          vegetable

2    mushroom    .

SAS code in 2 steps:

proc sql; create table d as

    select  a.id, b.breakfast   

    from a as a, b as b

    where a.id=b.id;

    select  d.id, d.breakfast, c.category   

    from d as d left join c as c

    on d.breakfast=c.breakfast;

quit;

proc print data=want noobs; run;

A related question is can I do inner join (in A and B) and in A but not C in one step? I read that EXCEPT can do the job, but when I copied the example code, it didn't seem right. For the fruit example above, the data set should look like

  ID  breakfast    category

   2   mushroom  .

Many thanks in advance.

10 REPLIES 10
LinusH
Tourmaline | Level 20

It seems that you confuse left join with except logic, which do you want?

And yes, you can nestle joins in the same SQL step, but I recommend that you you explicit join syntax (i.e. a inner join b on...) to avoid unexpected results.

Data never sleeps
Florent
Quartz | Level 8

Hi,

Would the following piece of code do what you are expecting ?

proc sql;

create table d as

    select  a.id

    ,          b.breakfast

    ,          c.category

    from a

    inner join b

         on a.id = b.id

    left join c

         on b.breakfast = c.breakfast;

quit;

data e;

     set d (where=(missing(category)));

run;

Kind regards,

Florent

NagendraKumarK
Calcite | Level 5

proc sql; create table res as ( select x.id,x.breakfast,c.category from (select a.id,  b.breakfast from a join b on a.id=b.id) x left join c on x.breakfast=c.breakfast ); run;

Solph
Pyrite | Level 9

Thanks NagendraKumarK for the subquery code. And thanks especially Florent for the code, which is exactly what I want, for my first question.

Just a question though, if it possible to do If A and B on ID, and If A but not C on breakfast in one code? I see Florent's code in two steps. Possibly in one step.

LinusH, sorry my questions weren't phrased clearly. It's two part questions. First how do I do inner join and left join in one code (I've never seen sample code as Florent's, so it really helped). Second, how do I do inner join (If A and B) and If A but not C in one code. Florent's two-step code is great. But I'm wondering it it can be done in one step.

Florent
Quartz | Level 8

The first step (table d) contains everything which is in A and B, and potentially in C. When there is no match with the table C then the content of the category field will be empty.

If you want to have table e created at the same time as table d (so in one step), then you should use make a dastep using the merge statement and play with the inA, inB, inC variable you can create at that time. The disadvantage of making a datastep instead of a proc sql is that it requires you firstly sort tables a, b and c in the same order... so in the end you would have 3 sort statements + 1 datastep. There is no way to create two datasets at a time with a "create table" statement of the proc sql.

I hope it answers to your questions.

LinusH
Tourmaline | Level 20

Second question, just combine an inner join with a sub-query (not in() ).

Data never sleeps
Solph
Pyrite | Level 9

Thanks for the replies. I've a related question still about doing various merges, hoping to find NO NEED to do sub-queries. The example above is a simpler one as it is always using A to inner join with B, and to left join with C. But the example below doesn't always start with A. I get the feeling that sub-queres are the only way to go in SQL, or use data step. Just want to confirm.

data a; input patient_id hospital_id los;

datalines;

1 100 1

1 102 2

2 100 4

3 101 5

3 103 10

;

data b; input patient_id hospital_id wtcases;

datalines;

1 100 1.5

1 102 2.2

2 100 1.1

3 103 4.1

;

data c; input hospital_id postcode $;

datalines;

100 58823

101 45000

102 45001

;

data d; input postcode $ area;

datalines;

58823  51

45000  52

99999  99

;

*The code below actualy wouldn't run uless I take out what's after SELECT, and use just SELECT *. Then I see it's outputting all combinations and realized the FROM statement isn't doing what I thought SQL could do.

Proc sql; create table want as

    select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

    from a inner join b on a.patient_id=b.patient_id and a.hospital_id=b.hospital_id,

         b left join c on b.hospital_id=c.hospital_id,

         c left join d on c.postcode=d.postcode;

quit;

proc print data=a noobs; run;

proc print data=b noobs; run;

proc print data=c noobs; run;

proc print data=d noobs; run;

proc print data=want; run;.

Florent
Quartz | Level 8

When using the "inner join"/"left join"/"right join" syntax, you cannot put a comma after the join condition(s). Furthermore, you don't have to repeat the table names which have already been used in a join. With your code, it gives:

Proc sql;

    create table want as

    select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

    from a

    inner join b

           on a.patient_id = b.patient_id

         and a.hospital_id = b.hospital_id

    left join c

           on b.hospital_id = c.hospital_id

    left join d

           on c.postcode=d.postcode;

quit;

proc print data=a noobs; run;

proc print data=b noobs; run;

proc print data=c noobs; run;

proc print data=d noobs; run;

proc print data=want; run;

Solph
Pyrite | Level 9

Thanks a million. SQL is indeed very powerful. The code worked for a large part. What it didn't work is the following. If you have answers for it, let me know. Otherwise I'll change to answered.

1. I need to change from

     SELECT a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

     to

     SELECT *

     otherwise it's giving an error message "ERROR: Column id could not be found in the table/view identified with the correlation name A" even though there is nothing wrong.

2. It's outputting the following cases, I thought inner join A and B by patient ID and Hospital ID would get rid of patient ID=3  all together. But it got rid of the one from hospital 101 and kept the one from hospital 103.

patient_idhospital_idloswtcasespostcodearea
3103104.1.
110222.245001.
210041.15882351
110011.55882351

Anyway thanks very much for the reply. I didn't know I could use SQL this way. Let me know if there are answers to the above questions. Otherwise I'll close this discussion.

Florent
Quartz | Level 8

Hi,


Concerning the first point, it is clear that the "ID" is not available in table A. It should be replaced by "patient_id".

For the second bullet point, I do not agree with the fact that the line concerning hospital 103 and patient 3 should be excluded from the table. If you look at tables A and B, you will find this patient-hospital combination in both of them. Thus this is logical to find it back in your final table when using inner joins.

I hope it helps.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 6633 views
  • 1 like
  • 4 in conversation