This is probably an easy question for people accustomed to SQL programming, but, alas, I am not.
Have
Want
Linked: Control, Status87, Status89, weight no other variables
In other words, I want to link AHS87 and AHS89 by matching on Control. I keep the Status variable from each dataset and drop everything else except the weight from AHS89.
I could do this in a DATA step easily enough, but I'm trying to understand how to do it using PROC SQL.
Thanks.
--Dav
Here is an example similar to your specs:
data ahs87; input control status var1-var3; cards; 1 1 2 3 4 2 2 4 5 6 3 3 5 6 7 ; data ahs89; input control status weight var5-var7; cards; 1 9 6 2 3 4 2 8 5 4 5 6 3 7 4 5 6 7 ; proc sql noprint; create table want as select a.control,a.status as status87, b.status as status89, b.weight from ahs87 a join ahs89 b on a.control eq b.control ; quit;
Art, CEO, AnalystFinder.com
Here is an example similar to your specs:
data ahs87; input control status var1-var3; cards; 1 1 2 3 4 2 2 4 5 6 3 3 5 6 7 ; data ahs89; input control status weight var5-var7; cards; 1 9 6 2 3 4 2 8 5 4 5 6 3 7 4 5 6 7 ; proc sql noprint; create table want as select a.control,a.status as status87, b.status as status89, b.weight from ahs87 a join ahs89 b on a.control eq b.control ; quit;
Art, CEO, AnalystFinder.com
The type of join to specify depends on how you want non-matched records to be treated.
A simple example
proc sql;
create table want as
select ahs87.control, ahs87.status as ahs87status, ahs89.status as ahs89status, ahs89.weight
from ahs87 left join ahs89
on ahs87.control = ahs89.control
;
quit;
Keeps all of the records from ahs87 and only brings in the matches from ahs89. I renamed the status variables with a new name to indicate which data set contributed which value. Notice the notation of dataset.variable, that . is IMPORTANT as are the , between variables in on select clause.
Unlike a data step merge with BY the data sets involved do not need to be sorted by the matching varialbes and the variables used in the in ON clause do not have to have the same name. If the value in ahs89 that should correspond to Control in Ahs87 was name Con_89 you could use
on ahs87.control = ah89.con_89
for the match.
Unlike the data step the matching variables
Thank you for your help. As it happened, I wanted to keep non-matching cases, as the fact that they don't match tells me something. I used a full join to accomplish that.
--Dav
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.