BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Gewei
Calcite | Level 5

Hi,

 

I was given 2 data files containing variables of similar participants (slightly different due to participants dropping out over time).

However, the same participants have different ID in each file, eg. ID 1784636 in one file is ID 1000015 in the other file, but they represent the same participant.

There are over 500,000 participants in my file so can't match them individually. I do have an Excel file showing which ID in file 1 matches with which ID in file 2.

Therefore, I was wondering is there any code that I could use to merge my 2 files using this information?

Thank you!

 

Best wishes

Gewei

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Two generic approaches: Sort and merge or Join with Proc sql.

 

Proc sort data=one;
   by idvarname;
run;
proc sort data=two;
   by idvarname;
run;

data combined;
   merge one
         two
   ;
   by idvarname;
run;

Will create a combined data with the variables in two that do not have matching names those in data one. Data two in this example I sort of think of as the Excel you mentioned after reading into SAS. The variable name for the value that matches in set one should be the same and the other variable name should be the same as in the other data set you want bring together. Make sure that the Excel when read has the same type and length as your existing ID variables. If you rely on Proc import you may get mixes of numeric and character that will not allow this to work.

 

Then the same kind of sort and merge to bring the other data together. WARNING: if you have repeat values of the variable you will match on in both sets a Data step Merge is not the correct tool.

 

Or sql;

Proc sql;
   create table combined as 
   select a.*, b.secondidvarname
   from one as a
        left join
        two as b
        on a.idvariable=b.idvariable
   ;
quit;

The variable names need not be the same in this approach but the types (character or numeric) really should be the same.

This would be the way to match data sets with multiple values of id variable in each but may need to be adjusted. The ON conditions can include more than one condition.

 

View solution in original post

4 REPLIES 4
ballardw
Super User
Read the Excel file into a SAS data set. Use that to join the "other ID" to one of your existing datasets, merge or join on that now common id.
Gewei
Calcite | Level 5

Thank you for your reply.

But how do I let SAS recognise ID in the existing file and add the corresponding ID from the Excel file as a new ID?

The IDs in the datasets and excel file are not in the same order.

Thank you.

ballardw
Super User

Two generic approaches: Sort and merge or Join with Proc sql.

 

Proc sort data=one;
   by idvarname;
run;
proc sort data=two;
   by idvarname;
run;

data combined;
   merge one
         two
   ;
   by idvarname;
run;

Will create a combined data with the variables in two that do not have matching names those in data one. Data two in this example I sort of think of as the Excel you mentioned after reading into SAS. The variable name for the value that matches in set one should be the same and the other variable name should be the same as in the other data set you want bring together. Make sure that the Excel when read has the same type and length as your existing ID variables. If you rely on Proc import you may get mixes of numeric and character that will not allow this to work.

 

Then the same kind of sort and merge to bring the other data together. WARNING: if you have repeat values of the variable you will match on in both sets a Data step Merge is not the correct tool.

 

Or sql;

Proc sql;
   create table combined as 
   select a.*, b.secondidvarname
   from one as a
        left join
        two as b
        on a.idvariable=b.idvariable
   ;
quit;

The variable names need not be the same in this approach but the types (character or numeric) really should be the same.

This would be the way to match data sets with multiple values of id variable in each but may need to be adjusted. The ON conditions can include more than one condition.

 

Gewei
Calcite | Level 5

Thank you very much for your help.

I managed to merge my datasets.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2320 views
  • 0 likes
  • 2 in conversation