- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all! I'm wondering if it is possible to tell SAS to include more than one WHERE clause in a PROC SQL, and if so, how? Here is what I attempted to do. I have 2 datasets, NOC_TRY4 and NOC_75. NOC_75 is a list of OCCGROUPS and CODES that I want to be included in the original dataset NOC_TRY4 - that is, any CODE within an OCCGROUP that is not included in the list NOC_75 should be deleted.
PROCSQL;
CREATETABLE GOODDATA AS
SELECT * FROM NOC_TRY4, NOC_75
WHERE NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP
WHERE NOC_75.CODE=NOC_TRY4.CODE;
QUIT;
SAS returns this error in the logs:
1413 PROC SQL;
1414 CREATE TABLE NICETRY AS
1415 SELECT * FROM NOC_TRY4, NOC_75
1416 WHERE NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP
1417 WHERE NOC_75.CODE=NOC_TRY4.CODE;
-----
22
76
ERROR
22-322: Syntax error, expecting one of the following: ;, !, !!, &, *, **,
+, -, /, <, <=,
<>, =, >, >=, AND,
EQ, EQT, EXCEPT, GE, GET, GROUP, GT, GTT, HAVING, INTERSECT, LE,
LET, LT, LTT, NE, NET, NOT, OR,
ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
ERROR
76-322: Syntax error, statement will be ignored.
1418 QUIT;
NOTE:
The SAS System stopped processing this step because of errors.
NOTE:
PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Any help or tips would be greatly appreciated! Thank you!
(Please see my response to Reeza below for an update of the problem I keep running into)
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
grande44 wrote:
But shouldn't I be left with only 1051 observations if there are only 1051 Codes in NOC_75?
Not if there are duplicate keys in your data set, where the keys in your case are occgroup and code.
Try the following code to extract duplicates and unique records from NOC_75 and you can run the same thing for your other data set.
proc sort
data = NOC_75
out = duplicates_75
uniqueout = singles_75
nouniquekey;
by occgroup code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not multiple WHERE statements, but multiple conditions in a single WHERE statement. It looks like SAS is already suggesting the answer. Replace the second "WHERE" with "AND". See if that does what you need.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use AND to join the WHERE conditions
I would recommend explicitly listing your join type as well as using ON instead of where in this particular instance - its faster processing.
PROC SQL;
CREATETABLE GOODDATA AS
SELECT *
FROM NOC_TRY4
full join NOC_75
on NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP
AND NOC_75.CODE=NOC_TRY4.CODE;
QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your help, everyone. No matter what variation of PROC SQL I use, I run into the same problem.
NOC_75 is a list of 1051 observations of Codes and their corresponding OccGroup. However, every time I use PROC SQL to create a new dataset using NOC_75 and NOC_TRY4, the result has more than 1051 observations. For example, the PROC SQL step suggested by Reeza yields a table with 1693 observations.
But shouldn't I be left with only 1051 observations if there are only 1051 Codes in NOC_75?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In a ProcSQL step do:
select count(1) from <your table name here>;
It will show you how many observations are truly in the table. You will not have a correct solution if you have more observations than available.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thats down to the logic of your join, has nothing to do with what syntax SAS uses. I would suggest you read up on how joins work and what they return. Here is an annotated display of each type of join and a visual of what they return.
I would suggest from what you have given then you want the second on the left.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
grande44 wrote:
But shouldn't I be left with only 1051 observations if there are only 1051 Codes in NOC_75?
Not if there are duplicate keys in your data set, where the keys in your case are occgroup and code.
Try the following code to extract duplicates and unique records from NOC_75 and you can run the same thing for your other data set.
proc sort
data = NOC_75
out = duplicates_75
uniqueout = singles_75
nouniquekey;
by occgroup code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Wow, that seems to be the key issue. There are many duplicates in NOC_75 and I had not realized that this would be the case. It turns out I may need to go back to previous steps in my program to account for this possibility. Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sounds more like a left join
PROCSQL;
CREATETABLE GOODDATA AS
SELECT * FROM NOC_TRY4 left join NOC_75
On NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP
and NOC_75.CODE=NOC_TRY4.CODE;
QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To extract NOC_TRY4 records which are also present in NOC_75, you need a simple inner join between the two tables
PROC SQL;
CREATE TABLE GOODDATA AS
SELECT NOC_TRY4.*
FROM
NOC_TRY4 inner join
NOC_75
on NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP and
NOC_75.CODE=NOC_TRY4.CODE;
QUIT;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you are trying to get the NOC_TRY4 for either condition, then you use an "OR"
CREATETABLE GOODDATA AS
SELECT * FROM NOC_TRY4, NOC_75
WHERE NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP OR NOC_75.CODE=NOC_TRY4.CODE;
But if you are using the condition as exclusionary clauses and only want the results when both condition are true, then you use an "AND"
CREATETABLE GOODDATA AS
SELECT * FROM NOC_TRY4, NOC_75
WHERE NOC_75.OCCGROUP=NOC_TRY4.OCCGROUP AND NOC_75.CODE=NOC_TRY4.CODE;