Merging Table A and Table B by ID: replacing blank columns in Table A with column value from Table B

mjalvarez · Posted 05-19-2022 04:09 PM

Hi Experts!

I need some help merging two tables.

Table A is a large dataset with 400+ columns and over 31,000 rows. Table B is smaller and shares some of the same columns (9) and rows (1600+) as Table A.

Both tables share a unique ID (unique_id). I need to impute values from Table B into the corresponding missing fields in Table A, matching on unique_id.

Here’s an example of what I want to do:

Table A

Unique_ID Var1 Var2 Var3 Var4 Var5 Var6……. Var400

MA_345 3 1 19 37.9 60 77

JM_909 . . . 40.2 55 67

TV_647 1 . . 37.7 62 83

ED_331 7 5 . 38.0 65 88

Table 2

Unique_ID Var1 Var2 Var3 Var4 Var5 Var6

JM_909 3 1 10 40.2 55 67

TV_647 1 0 15 37.7 62 83

Result

Unique_ID Var1 Var2 Var3 Var4 Var5 Var6……. Var400

MA_345 3 1 19 37.9 60 77

JM_909 3 1 10 40.2 55 67

TV_647 1 0 15 37.7 62 83

ED_331 7 5 . 38.0 65 88

Any help or guidance on this?

Thank you!

svh · Posted 05-23-2022 03:17 PM

There is a simple way to do this, as long as the non-missing values for a particular Unique_ID in Table 2 ALWAYS match the non-missing values in Table 1, and each table has only one row per Unique_ID. This appears to be the case from the data you have shared. E.g., subject JM_909 has the same non-missing values for Var4, Var5, and Var6 in both tables.
You might remember that in a DATA step MERGE, the values in the right-hand table overwrite the values in the left-hand table for matched observations, where the matching is defined by the BY statement. You would first need to sort each data set by Unique_ID and then run:

data Want;
  /* For each value of Unique_ID where the same variable exists in both tables,
     the value from Table2 will overwrite the value in Table1. */
  merge Table1 Table2;
  by Unique_ID;
run;
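
The sort step mentioned above could look like the following minimal sketch. It assumes the data sets are named Table1 and Table2, as in the MERGE step. One caveat worth checking before relying on the merge: a missing value in Table2 also overwrites a non-missing value in Table1 for a matched Unique_ID, so this approach is only safe if Table2 has no missing values where Table1 is populated.

```sas
/* Sort both data sets by the key so the BY statement in the MERGE works. */
/* Table1 and Table2 are the names used in the MERGE step above.          */
proc sort data=Table1;
  by Unique_ID;
run;

proc sort data=Table2;
  by Unique_ID;
run;
```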

Tom · Posted 05-23-2022 04:29 PM

You might want to do an UPDATE, but you seem to want the reverse of the normal case it was designed to support. The UPDATE statement is designed to apply transactions to an existing dataset. Any missing value in the transaction dataset is ignored, so the existing value is unchanged. In other words, the non-missing values in the transaction dataset "win". You appear to want the reverse, where the non-missing values in the original dataset "win". So just treat the transactions as the original dataset and the original dataset as the transactions.

data want;
  update table2 tableA;
  by unique_id;
run;
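
To try this out, here is a small self-contained sketch built from the sample rows in the question. It covers only Var1 to Var6, and the sort steps are an addition, since UPDATE with a BY statement expects both data sets in unique_id order. The data set names tableA and table2 follow the step above.

```sas
/* Sample rows copied from the question (first six variables only). */
data tableA;
  input unique_id $ var1-var6;
  datalines;
MA_345 3 1 19 37.9 60 77
JM_909 . . . 40.2 55 67
TV_647 1 . . 37.7 62 83
ED_331 7 5 . 38.0 65 88
;

data table2;
  input unique_id $ var1-var6;
  datalines;
JM_909 3 1 10 40.2 55 67
TV_647 1 0 15 37.7 62 83
;

/* UPDATE with BY needs both inputs sorted by the key. */
proc sort data=tableA; by unique_id; run;
proc sort data=table2; by unique_id; run;

/* table2 acts as the master, tableA as the transactions:           */
/* non-missing values in tableA win, missing ones leave table2's    */
/* values in place, and IDs found only in tableA are added as rows. */
data want;
  update table2 tableA;
  by unique_id;
run;

proc print data=want noobs;
run;
```

Two things to keep in mind with the real data: the output is ordered by unique_id rather than by the original row order of Table A, and the variables shared with table2 come first in the column order because table2 is listed first. UPDATE also expects the master data set (here table2) to have only one row per unique_id.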
