Hi,
I am creating a table using proc sql and was wondering if someone could help me to select all of the variables from dataset B in a left join, except for two variables: "RECORD_NUMBER" (I would need to keep RECORD_NUMBER as I will left join on a.RECORD_NUMBER=b.RECORD_NUMBER, but only dataset A's RECORD_NUMBER would be selected to be shown in the table) and "MONTH". The preferred option would be to just exclude these two variables from dataset B, as there are a fair amount of variables that would need to be listed.
proc sql;
create table Final_Information as
select A.*
,B.* /*How to keep all of B's variables, except "B.RECORD_NUMBER" and "B.MONTH"?*/
from AC.TEST as a
left join INITIAL_DATA as b
on a.RECORD_NUMBER=b.RECORD_NUMBER
;
quit;
No need to drop b.RECORD_NUMBER because you already have a.RECORD_NUMBER and you cannot have two variables with the same name. To drop MONTH use DROP= dataset option.
INITIAL_DATA(drop=month) as b
If you want to eliminate messages about duplicate variable names then rename RECORD_NUMBER and then add DROP= dataset option to the output dataset.
create table Final_Information(drop=b_record_number) as
select A.*
,B.*
from AC.TEST as a
left join INITIAL_DATA (drop=month rename=(record_number=b_record_number)) as b
on a.RECORD_NUMBER=b.b_RECORD_NUMBER
;
Formal syntax would be instead of ,b.* to list the variables
,b.variable1, b.variable2, b.variable3 <etc>
The quick and dirty easy data set option Drop on B won't work because one of the variables you want to drop is used in an expression. Do you also have a MONTH variable in the A data set?
Thanks for your reply! I would have typed out the usual b.[variable], b.[variable] etc. but there are a lot of variables in dataset B that I want to join on, so was wondering if there was any code to not select only the "RECORD_NUMBER" and "MONTH" (without dropping it from the INITIAL_DATA dataset). When I write A.*, there is the "MONTH" variable in dataset A already (which is the only one I want to show in the final table) and a warning appears on my log saying that it's already in the table, so that's why the preference would be to simply not select the two variables ("RECORD_NUMBER" and "MONTH"), but not have to list out every variable I want in dataset B (as there are quite a lot).
proc sql;
create table Final_Information as
select A.*
,B.* /*How to keep all of B's variables, except "B.RECORD_NUMBER" and "B.MONTH"?*/
from AC.TEST as a
left join INITIAL_DATA (drop=month) as b
on a.RECORD_NUMBER=b.RECORD_NUMBER
;
quit;
I think this would work but may still give you a note in the log.
No need to drop b.RECORD_NUMBER because you already have a.RECORD_NUMBER and you cannot have two variables with the same name. To drop MONTH use DROP= dataset option.
INITIAL_DATA(drop=month) as b
If you want to eliminate messages about duplicate variable names then rename RECORD_NUMBER and then add DROP= dataset option to the output dataset.
create table Final_Information(drop=b_record_number) as
select A.*
,B.*
from AC.TEST as a
left join INITIAL_DATA (drop=month rename=(record_number=b_record_number)) as b
on a.RECORD_NUMBER=b.b_RECORD_NUMBER
;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.