BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

Hi,

Can I have inner join and left join in one step in PROC SQL, or I've to do in 2 steps or do with subquery?

Here is an example.

1. I want ID in both tables A and B (so just ID=1 and 2), so it's inner join (in data step it's in A and B)

2. I then want to link the cases to table C to get their breakfast category. Since it's likely the category in C is not complete, I'd need to use left join to keep records (i.e. mushrrom) from step 1 (in data step it's in A).

data a; input id x1;

datalines;

1 11

2 22

;

data b; input id x2 breakfast $;

datalines;

1 1 apple

2 1 carrot

2 1 mushroom

3 1 pepper

;

data c; input breakfast $ category $;

datalines;

apple  fruit

orange fruit

carrot  vegetable

pepper vegetable

;

Data wanted:

ID  breakfast    category

1    apple           fruit

2    carrot          vegetable

2    mushroom    .

SAS code in 2 steps:

proc sql; create table d as

    select  a.id, b.breakfast   

    from a as a, b as b

    where a.id=b.id;

    select  d.id, d.breakfast, c.category   

    from d as d left join c as c

    on d.breakfast=c.breakfast;

quit;

proc print data=want noobs; run;

A related question is can I do inner join (in A and B) and in A but not C in one step? I read that EXCEPT can do the job, but when I copied the example code, it didn't seem right. For the fruit example above, the data set should look like

  ID  breakfast    category

   2   mushroom  .

Many thanks in advance.

10 REPLIES 10
LinusH
Tourmaline | Level 20

It seems that you confuse left join with except logic, which do you want?

And yes, you can nestle joins in the same SQL step, but I recommend that you you explicit join syntax (i.e. a inner join b on...) to avoid unexpected results.

Data never sleeps
Florent
Quartz | Level 8

Hi,

Would the following piece of code do what you are expecting ?

proc sql;

create table d as

    select  a.id

    ,          b.breakfast

    ,          c.category

    from a

    inner join b

         on a.id = b.id

    left join c

         on b.breakfast = c.breakfast;

quit;

data e;

     set d (where=(missing(category)));

run;

Kind regards,

Florent

NagendraKumarK
Calcite | Level 5

proc sql; create table res as ( select x.id,x.breakfast,c.category from (select a.id,  b.breakfast from a join b on a.id=b.id) x left join c on x.breakfast=c.breakfast ); run;

Solph
Pyrite | Level 9

Thanks NagendraKumarK for the subquery code. And thanks especially Florent for the code, which is exactly what I want, for my first question.

Just a question though, if it possible to do If A and B on ID, and If A but not C on breakfast in one code? I see Florent's code in two steps. Possibly in one step.

LinusH, sorry my questions weren't phrased clearly. It's two part questions. First how do I do inner join and left join in one code (I've never seen sample code as Florent's, so it really helped). Second, how do I do inner join (If A and B) and If A but not C in one code. Florent's two-step code is great. But I'm wondering it it can be done in one step.

Florent
Quartz | Level 8

The first step (table d) contains everything which is in A and B, and potentially in C. When there is no match with the table C then the content of the category field will be empty.

If you want to have table e created at the same time as table d (so in one step), then you should use make a dastep using the merge statement and play with the inA, inB, inC variable you can create at that time. The disadvantage of making a datastep instead of a proc sql is that it requires you firstly sort tables a, b and c in the same order... so in the end you would have 3 sort statements + 1 datastep. There is no way to create two datasets at a time with a "create table" statement of the proc sql.

I hope it answers to your questions.

LinusH
Tourmaline | Level 20

Second question, just combine an inner join with a sub-query (not in() ).

Data never sleeps
Solph
Pyrite | Level 9

Thanks for the replies. I've a related question still about doing various merges, hoping to find NO NEED to do sub-queries. The example above is a simpler one as it is always using A to inner join with B, and to left join with C. But the example below doesn't always start with A. I get the feeling that sub-queres are the only way to go in SQL, or use data step. Just want to confirm.

data a; input patient_id hospital_id los;

datalines;

1 100 1

1 102 2

2 100 4

3 101 5

3 103 10

;

data b; input patient_id hospital_id wtcases;

datalines;

1 100 1.5

1 102 2.2

2 100 1.1

3 103 4.1

;

data c; input hospital_id postcode $;

datalines;

100 58823

101 45000

102 45001

;

data d; input postcode $ area;

datalines;

58823  51

45000  52

99999  99

;

*The code below actualy wouldn't run uless I take out what's after SELECT, and use just SELECT *. Then I see it's outputting all combinations and realized the FROM statement isn't doing what I thought SQL could do.

Proc sql; create table want as

    select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

    from a inner join b on a.patient_id=b.patient_id and a.hospital_id=b.hospital_id,

         b left join c on b.hospital_id=c.hospital_id,

         c left join d on c.postcode=d.postcode;

quit;

proc print data=a noobs; run;

proc print data=b noobs; run;

proc print data=c noobs; run;

proc print data=d noobs; run;

proc print data=want; run;.

Florent
Quartz | Level 8

When using the "inner join"/"left join"/"right join" syntax, you cannot put a comma after the join condition(s). Furthermore, you don't have to repeat the table names which have already been used in a join. With your code, it gives:

Proc sql;

    create table want as

    select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

    from a

    inner join b

           on a.patient_id = b.patient_id

         and a.hospital_id = b.hospital_id

    left join c

           on b.hospital_id = c.hospital_id

    left join d

           on c.postcode=d.postcode;

quit;

proc print data=a noobs; run;

proc print data=b noobs; run;

proc print data=c noobs; run;

proc print data=d noobs; run;

proc print data=want; run;

Solph
Pyrite | Level 9

Thanks a million. SQL is indeed very powerful. The code worked for a large part. What it didn't work is the following. If you have answers for it, let me know. Otherwise I'll change to answered.

1. I need to change from

     SELECT a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area

     to

     SELECT *

     otherwise it's giving an error message "ERROR: Column id could not be found in the table/view identified with the correlation name A" even though there is nothing wrong.

2. It's outputting the following cases, I thought inner join A and B by patient ID and Hospital ID would get rid of patient ID=3  all together. But it got rid of the one from hospital 101 and kept the one from hospital 103.

patient_idhospital_idloswtcasespostcodearea
3103104.1.
110222.245001.
210041.15882351
110011.55882351

Anyway thanks very much for the reply. I didn't know I could use SQL this way. Let me know if there are answers to the above questions. Otherwise I'll close this discussion.

Florent
Quartz | Level 8

Hi,


Concerning the first point, it is clear that the "ID" is not available in table A. It should be replaced by "patient_id".

For the second bullet point, I do not agree with the fact that the line concerning hospital 103 and patient 3 should be excluded from the table. If you look at tables A and B, you will find this patient-hospital combination in both of them. Thus this is logical to find it back in your final table when using inner joins.

I hope it helps.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 5689 views
  • 1 like
  • 4 in conversation