- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Can I have inner join and left join in one step in PROC SQL, or I've to do in 2 steps or do with subquery?
Here is an example.
1. I want ID in both tables A and B (so just ID=1 and 2), so it's inner join (in data step it's in A and B)
2. I then want to link the cases to table C to get their breakfast category. Since it's likely the category in C is not complete, I'd need to use left join to keep records (i.e. mushrrom) from step 1 (in data step it's in A).
data a; input id x1;
datalines;
1 11
2 22
;
data b; input id x2 breakfast $;
datalines;
1 1 apple
2 1 carrot
2 1 mushroom
3 1 pepper
;
data c; input breakfast $ category $;
datalines;
apple fruit
orange fruit
carrot vegetable
pepper vegetable
;
Data wanted:
ID breakfast category
1 apple fruit
2 carrot vegetable
2 mushroom .
SAS code in 2 steps:
proc sql; create table d as
select a.id, b.breakfast
from a as a, b as b
where a.id=b.id;
select d.id, d.breakfast, c.category
from d as d left join c as c
on d.breakfast=c.breakfast;
quit;
proc print data=want noobs; run;
A related question is can I do inner join (in A and B) and in A but not C in one step? I read that EXCEPT can do the job, but when I copied the example code, it didn't seem right. For the fruit example above, the data set should look like
ID breakfast category
2 mushroom .
Many thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems that you confuse left join with except logic, which do you want?
And yes, you can nestle joins in the same SQL step, but I recommend that you you explicit join syntax (i.e. a inner join b on...) to avoid unexpected results.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Would the following piece of code do what you are expecting ?
proc sql;
create table d as
select a.id
, b.breakfast
, c.category
from a
inner join b
on a.id = b.id
left join c
on b.breakfast = c.breakfast;
quit;
data e;
set d (where=(missing(category)));
run;
Kind regards,
Florent
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql; create table res as ( select x.id,x.breakfast,c.category from (select a.id, b.breakfast from a join b on a.id=b.id) x left join c on x.breakfast=c.breakfast ); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks NagendraKumarK for the subquery code. And thanks especially Florent for the code, which is exactly what I want, for my first question.
Just a question though, if it possible to do If A and B on ID, and If A but not C on breakfast in one code? I see Florent's code in two steps. Possibly in one step.
LinusH, sorry my questions weren't phrased clearly. It's two part questions. First how do I do inner join and left join in one code (I've never seen sample code as Florent's, so it really helped). Second, how do I do inner join (If A and B) and If A but not C in one code. Florent's two-step code is great. But I'm wondering it it can be done in one step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The first step (table d) contains everything which is in A and B, and potentially in C. When there is no match with the table C then the content of the category field will be empty.
If you want to have table e created at the same time as table d (so in one step), then you should use make a dastep using the merge statement and play with the inA, inB, inC variable you can create at that time. The disadvantage of making a datastep instead of a proc sql is that it requires you firstly sort tables a, b and c in the same order... so in the end you would have 3 sort statements + 1 datastep. There is no way to create two datasets at a time with a "create table" statement of the proc sql.
I hope it answers to your questions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Second question, just combine an inner join with a sub-query (not in() ).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the replies. I've a related question still about doing various merges, hoping to find NO NEED to do sub-queries. The example above is a simpler one as it is always using A to inner join with B, and to left join with C. But the example below doesn't always start with A. I get the feeling that sub-queres are the only way to go in SQL, or use data step. Just want to confirm.
data a; input patient_id hospital_id los;
datalines;
1 100 1
1 102 2
2 100 4
3 101 5
3 103 10
;
data b; input patient_id hospital_id wtcases;
datalines;
1 100 1.5
1 102 2.2
2 100 1.1
3 103 4.1
;
data c; input hospital_id postcode $;
datalines;
100 58823
101 45000
102 45001
;
data d; input postcode $ area;
datalines;
58823 51
45000 52
99999 99
;
*The code below actualy wouldn't run uless I take out what's after SELECT, and use just SELECT *. Then I see it's outputting all combinations and realized the FROM statement isn't doing what I thought SQL could do.
Proc sql; create table want as
select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area
from a inner join b on a.patient_id=b.patient_id and a.hospital_id=b.hospital_id,
b left join c on b.hospital_id=c.hospital_id,
c left join d on c.postcode=d.postcode;
quit;
proc print data=a noobs; run;
proc print data=b noobs; run;
proc print data=c noobs; run;
proc print data=d noobs; run;
proc print data=want; run;.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When using the "inner join"/"left join"/"right join" syntax, you cannot put a comma after the join condition(s). Furthermore, you don't have to repeat the table names which have already been used in a join. With your code, it gives:
Proc sql;
create table want as
select a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area
from a
inner join b
on a.patient_id = b.patient_id
and a.hospital_id = b.hospital_id
left join c
on b.hospital_id = c.hospital_id
left join d
on c.postcode=d.postcode;
quit;
proc print data=a noobs; run;
proc print data=b noobs; run;
proc print data=c noobs; run;
proc print data=d noobs; run;
proc print data=want; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a million. SQL is indeed very powerful. The code worked for a large part. What it didn't work is the following. If you have answers for it, let me know. Otherwise I'll change to answered.
1. I need to change from
SELECT a.id, a.los, b.hospital_id, b.wtcases, c.postcode, d.area
to
SELECT *
otherwise it's giving an error message "ERROR: Column id could not be found in the table/view identified with the correlation name A" even though there is nothing wrong.
2. It's outputting the following cases, I thought inner join A and B by patient ID and Hospital ID would get rid of patient ID=3 all together. But it got rid of the one from hospital 101 and kept the one from hospital 103.
patient_id | hospital_id | los | wtcases | postcode | area |
3 | 103 | 10 | 4.1 | . | |
1 | 102 | 2 | 2.2 | 45001 | . |
2 | 100 | 4 | 1.1 | 58823 | 51 |
1 | 100 | 1 | 1.5 | 58823 | 51 |
Anyway thanks very much for the reply. I didn't know I could use SQL this way. Let me know if there are answers to the above questions. Otherwise I'll close this discussion.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Concerning the first point, it is clear that the "ID" is not available in table A. It should be replaced by "patient_id".
For the second bullet point, I do not agree with the fact that the line concerning hospital 103 and patient 3 should be excluded from the table. If you look at tables A and B, you will find this patient-hospital combination in both of them. Thus this is logical to find it back in your final table when using inner joins.
I hope it helps.