Hi
I am trying to merge two datasets- I posted a similar question a few days ago; however there have been some changes so I am posting it again. I'd like to combine set A and set B. Set B is a subset of set A, and I'd like to have my master dataset to have everyone who is in set B, with all of the variables from set B and some of the variables from set A. Can someone tell me how to do this?
Thank you!
And you will get exactly the same answer you got on the other post, of which I note you haven't responded or marked any answer correct. If you want help then post a good question - post example test data in the form of a datastep, and what you want the output to look like. Make sure your problem is fully explained and the test data contains enough information.
So from your text:
data want; merge a (in=a) b (in=b); by <id_variables>; if b; run;
Note I don't knowwhat your data looks like so cant put in the id variables, and will assume its sorted etc.
One way would look something like
proc sql; create table want as select b.*, a.var1, a.var2 from b left join a on b.id = a.id; quit;
list each variable from a that you want by name with the table name prefacing the variable. IF any of the A variables you want exist in B you'll have to decide what to do as you can only have one variable by that name in the resulting set.
the on clause would list pairs of variable names used with AND between each pair of there are more than one to identify the match.
NOTE: if the matching variables are duplicated in either A or B you'll get on record for each match. So if ID=1 appears twice in B and 3 times in A you would get 6 records with ID=1 in the result.
And you will get exactly the same answer you got on the other post, of which I note you haven't responded or marked any answer correct. If you want help then post a good question - post example test data in the form of a datastep, and what you want the output to look like. Make sure your problem is fully explained and the test data contains enough information.
So from your text:
data want; merge a (in=a) b (in=b); by <id_variables>; if b; run;
Note I don't knowwhat your data looks like so cant put in the id variables, and will assume its sorted etc.
Thanks RW9! Thank you for your comment. I tried what you suggested below and it worked fine. Actually, the tedious part was to create a UID (I used dob first last name etc) and find matching records...Anyway, thank you!
Multiple SAS data sets can be merged based on a specific common variable to give a single data set. This is done using the MERGE statement and BY statement. The total number of observations in the merged data set is often less than the sum of the number of observations in the original data sets. It is because the variables form both data sets get merged as one record based when there is a match in the value of the common variable.
DATA All_details;
MERGE SALARY(IN=a) DEPT(IN=b);
BY (empid);
IF a=1 and b=1;
RUN;
PROC PRINT DATA=All_details;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.