Hi,
I am creating a table using proc sql and was wondering if someone could help me to select all of the variables from dataset B in a left join, except for two variables: "RECORD_NUMBER" (I would need to keep RECORD_NUMBER as I will left join on a.RECORD_NUMBER=b.RECORD_NUMBER, but only dataset A's RECORD_NUMBER would be selected to be shown in the table) and "MONTH". The preferred option would be to just exclude these two variables from dataset B, as there are a fair amount of variables that would need to be listed.
proc sql;
create table Final_Information as
select A.*
,B.* /*How to keep all of B's variables, except "B.RECORD_NUMBER" and "B.MONTH"?*/
from AC.TEST as a
left join INITIAL_DATA as b
on a.RECORD_NUMBER=b.RECORD_NUMBER
;
quit;
No need to drop b.RECORD_NUMBER because you already have a.RECORD_NUMBER and you cannot have two variables with the same name. To drop MONTH use DROP= dataset option.
INITIAL_DATA(drop=month) as b
If you want to eliminate messages about duplicate variable names then rename RECORD_NUMBER and then add DROP= dataset option to the output dataset.
create table Final_Information(drop=b_record_number) as
select A.*
,B.*
from AC.TEST as a
left join INITIAL_DATA (drop=month rename=(record_number=b_record_number)) as b
on a.RECORD_NUMBER=b.b_RECORD_NUMBER
;
Formal syntax would be instead of ,b.* to list the variables
,b.variable1, b.variable2, b.variable3 <etc>
The quick and dirty easy data set option Drop on B won't work because one of the variables you want to drop is used in an expression. Do you also have a MONTH variable in the A data set?
Thanks for your reply! I would have typed out the usual b.[variable], b.[variable] etc. but there are a lot of variables in dataset B that I want to join on, so was wondering if there was any code to not select only the "RECORD_NUMBER" and "MONTH" (without dropping it from the INITIAL_DATA dataset). When I write A.*, there is the "MONTH" variable in dataset A already (which is the only one I want to show in the final table) and a warning appears on my log saying that it's already in the table, so that's why the preference would be to simply not select the two variables ("RECORD_NUMBER" and "MONTH"), but not have to list out every variable I want in dataset B (as there are quite a lot).
proc sql;
create table Final_Information as
select A.*
,B.* /*How to keep all of B's variables, except "B.RECORD_NUMBER" and "B.MONTH"?*/
from AC.TEST as a
left join INITIAL_DATA (drop=month) as b
on a.RECORD_NUMBER=b.RECORD_NUMBER
;
quit;
I think this would work but may still give you a note in the log.
No need to drop b.RECORD_NUMBER because you already have a.RECORD_NUMBER and you cannot have two variables with the same name. To drop MONTH use DROP= dataset option.
INITIAL_DATA(drop=month) as b
If you want to eliminate messages about duplicate variable names then rename RECORD_NUMBER and then add DROP= dataset option to the output dataset.
create table Final_Information(drop=b_record_number) as
select A.*
,B.*
from AC.TEST as a
left join INITIAL_DATA (drop=month rename=(record_number=b_record_number)) as b
on a.RECORD_NUMBER=b.b_RECORD_NUMBER
;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.