Hi,
I was given 2 data files containing variables of similar participants (slightly different due to participants dropping out over time).
However, the same participants have different ID in each file, eg. ID 1784636 in one file is ID 1000015 in the other file, but they represent the same participant.
There are over 500,000 participants in my file so can't match them individually. I do have an Excel file showing which ID in file 1 matches with which ID in file 2.
Therefore, I was wondering is there any code that I could use to merge my 2 files using this information?
Thank you!
Best wishes
Gewei
Two generic approaches: Sort and merge or Join with Proc sql.
Proc sort data=one; by idvarname; run; proc sort data=two; by idvarname; run; data combined; merge one two ; by idvarname; run;
Will create a combined data with the variables in two that do not have matching names those in data one. Data two in this example I sort of think of as the Excel you mentioned after reading into SAS. The variable name for the value that matches in set one should be the same and the other variable name should be the same as in the other data set you want bring together. Make sure that the Excel when read has the same type and length as your existing ID variables. If you rely on Proc import you may get mixes of numeric and character that will not allow this to work.
Then the same kind of sort and merge to bring the other data together. WARNING: if you have repeat values of the variable you will match on in both sets a Data step Merge is not the correct tool.
Or sql;
Proc sql; create table combined as select a.*, b.secondidvarname from one as a left join two as b on a.idvariable=b.idvariable ; quit;
The variable names need not be the same in this approach but the types (character or numeric) really should be the same.
This would be the way to match data sets with multiple values of id variable in each but may need to be adjusted. The ON conditions can include more than one condition.
Thank you for your reply.
But how do I let SAS recognise ID in the existing file and add the corresponding ID from the Excel file as a new ID?
The IDs in the datasets and excel file are not in the same order.
Thank you.
Two generic approaches: Sort and merge or Join with Proc sql.
Proc sort data=one; by idvarname; run; proc sort data=two; by idvarname; run; data combined; merge one two ; by idvarname; run;
Will create a combined data with the variables in two that do not have matching names those in data one. Data two in this example I sort of think of as the Excel you mentioned after reading into SAS. The variable name for the value that matches in set one should be the same and the other variable name should be the same as in the other data set you want bring together. Make sure that the Excel when read has the same type and length as your existing ID variables. If you rely on Proc import you may get mixes of numeric and character that will not allow this to work.
Then the same kind of sort and merge to bring the other data together. WARNING: if you have repeat values of the variable you will match on in both sets a Data step Merge is not the correct tool.
Or sql;
Proc sql; create table combined as select a.*, b.secondidvarname from one as a left join two as b on a.idvariable=b.idvariable ; quit;
The variable names need not be the same in this approach but the types (character or numeric) really should be the same.
This would be the way to match data sets with multiple values of id variable in each but may need to be adjusted. The ON conditions can include more than one condition.
Thank you very much for your help.
I managed to merge my datasets.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.