BookmarkSubscribeRSS Feed
cphd
Obsidian | Level 7

Hi there - 

I am trying to merge two dataset by id, but one data has unique id with multiple observations and the second data has unique id with other information that I need. I want to merge Data2 info to Data1 for all the observations. 

 

Data 1

id name score

1  Jo       90

1  Jo       86

1  Jo       54

2  Julia   87

2  Julia   89

 

Data 2

id  Age  Sex

1    3    M

2    5    F

 

Outcome file that I want to see

id name score  Age  Sex

1  Jo       90       3      M

1  Jo       86       3      M

1  Jo       54       3      M

2  Julia   87       5       F

2  Julia   89       5       F

 

I ran the code below but it seems that it merged with the first observation only.. What am I missing? Thank you. 

data outcome;
merge Data1 Data2;
by id;
run;
9 REPLIES 9
Reeza
Super User
What did the log show? I would expect it to work just as] posted so I suspect your data wasn't sorted and the merge truncated due to an error?

cphd
Obsidian | Level 7
Hi Reeza, thank you for your response.

It was successfully merged but there were some missing info in the outcome file, which made me confused. I realized that there are additional information that didn't match to each other. For example, I have id = 4 in data 2 but not in data 1.

So... my question now will be How can I get the outcome file with matched sample only?
Reeza
Super User
data want;
merge data1 (in=d1) data2 (in=d2);
by id;

if d1 & d2;

run;

You can use the IN option which creates an indicator variable to indicate a record source.Then if it's from both you keep only those. The join type is considered an 'inner join' in SQL.

 

https://blogs.sas.com/content/sastraining/2015/05/27/life-saver-tip-for-comparing-proc-sql-join-with...

cphd
Obsidian | Level 7
I followed your code but the log said

"WARNING:Apparent symbolic reference D2 not resolved. "

And it gave me the same result of the entire merged dataset.
Tom
Super User Tom
Super User

There is huge difference between <<D1 & D2>> which means <<D1 and D2>> and <<D1&D2>> which has a reference to a macro variable named D2.

Reeza
Super User
Then you didn't use the code exactly. Post your code and log?
I suspect Tom is right, you didn't have a space after the &.
Kurt_Bremser
Super User

When we supply code, we do so in a way that allows direct copy/pasting to your SAS environment (we post the code usein g the "little running man" button). You MUST do this first, to get the tested code (code supplied by senior posters here has been tested in their environment, unless noted otherwise) as-is; after that, you can play around with it and see what consequences each change that you introduce has.

ballardw
Super User

Double check your data2 set to make sure that there are no duplicates of ID. That may also lead to unexpected results.

Tom
Super User Tom
Super User

You don't actually show what you mean, but I suspect that the issue is that the NEW variables from the ONE dataset that you are trying to add to the MANY dataset already exist.  In that case SAS will overwrite the value read from the ONE dataset for the first observation in the group with the values read from the MANY dataset for the 2,3, etc variables in the group.  You could use the DROP= dataset option.

data outcome;
merge Data1(drop=age sex) Data2;
by id;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2326 views
  • 0 likes
  • 5 in conversation