Re: Merge two tables with douplicated data

sebastianmendez · Posted 10-13-2022 06:26 AM

Hello,

I'm a novice at SAS and i would ask you how I can do a merge in these conditions:

I have Table1(IDs, xvariable), and i have a smaller one, Table2(IDs, xvariable). Both tables share some IDs, but i want to keep those in Table2 (and not those in Table1, for those that are in both tables). Sometimes the data (xvariable) is different for the same ID, but data in Table2 is more truthful.

Thank you!

PeterClemmensen · Posted 10-13-2022 06:33 AM

Can you post some example of what your data looks like and what you expect as result?

Makes it easier to provide a usable code answer.

The DATA to DATA Step Macro
Blog: SASnrd

sebastianmendez · Posted 10-13-2022 07:31 AM

Something like this

LinusH · Posted 10-13-2022 08:28 AM

To me it looks like a LEFT JOIN.

For variableX, use the COALESCE function.

Data never sleeps

Tom · Posted 10-13-2022 09:14 AM

That is a simple data step MERGE.

data want;
  merge table1 table2;
  by id;
run;

PeterClemmensen · Posted 10-13-2022 09:24 AM

data one;
input ID X $;
datalines;
1 a 
2 b 
3 a 
4 b 
5 b 
6 c 
;

data two;
input ID X $;
datalines;
1 c 
8 b 
9 b 
;

data want;
   merge one two;
   by ID;
run;

The DATA to DATA Step Macro
Blog: SASnrd

ballardw · Posted 10-13-2022 10:43 AM

@sebastianmendez wrote:

Something like this

If any of your data steps have multiple observations with the same value of ID you should provide an example with some of the repeated values.

If BOTH data sets have multiple observations with the same value of ID then it is critical that you show example data and the desired result.

sebastianmendez · Posted 10-13-2022 10:53 AM

Well, i thought i did that.
You can see the ID 1 is present in both tables. And the desired result is keep the observation (ID1) from the table 2.

Tom · Posted 10-13-2022 11:21 AM

@sebastianmendez wrote:
Well, i thought i did that.
You can see the ID 1 is present in both tables. And the desired result is keep the observation (ID1) from the table 2.

The fact that there are ID values that can appear in both datasets was already covered by the data in the photographs you posted. But that was not the question that was asked.

The question was whether or not the ID value uniquely identifies an observation (in either dataset individually). Your data did not show any cases where the same ID value appeared more than once in the same dataset. So the question is whether that is possible with your data or not. If the same ID can appear multiple times in one (or both) of the dataset will require a more complicated solution. And also more explanation from you about what output you would want in that situation.

ballardw · Posted 10-13-2022 11:32 AM

@sebastianmendez wrote:
Well, i thought i did that.
You can see the ID 1 is present in both tables. And the desired result is keep the observation (ID1) from the table 2.

I should have provided an example of what I meant. Sorry, but I couldn't edit your pictures.

Stealing from @PeterClemmensen , suppose that your data actually looked like

(the simpler case of only one data set with duplicate id values:

Case 1: data set One is the only one with duplicates of ID

data one;
input ID X $;
datalines;
1 a 
1 z
2 b 
3 a 
4 b 
5 b 
6 c 
;

data two;
input ID X $;
datalines;
1 c 
8 b 
9 b 
;

Case 2: where data set 2 has duplicates of Id

data one;
input ID X $;
datalines;
1 a 
2 b 
3 a 
4 b 
5 b 
6 c 
;

data two;
input ID X $;
datalines;
1 c 
1 w
8 b 
9 b 
;

Case 3: both sets have duplicates of Id

data one;
input ID X $;
datalines;
1 a 
1 h
1 m
1 s
2 b 
3 a 
4 b 
5 b 
6 c 
;

data two;
input ID X $;
datalines;
1 c 
1 z
8 b 
9 b 
;

If you have any of these cases you need to consider what you expect for a result and show that. The last one can be very complicated depending on exact expectation.

Registration is open

SAS Training: Just a Click Away