Merge datasets with common variables and matching IDs, only overwrite NULLS

Accepted Solution Solved
Reply
Contributor kbk
Contributor
Posts: 29
Accepted Solution

Merge datasets with common variables and matching IDs, only overwrite NULLS

Hi All,

I am trying to merge/combine datasets that have the same structure (i.e., same ID and variables) but differ in their values. I want to keep all non-null values. When using a DATA MERGE approach, I end up overwriting non-null values. The PROC SQL approach works but is too cumbersome for the dataset size. There is likely a RENAME approach I could use but that will entail reading the variable names into a macro and looping; I'm hoping to find a more elegant solution. Any other suggestions?

Thanks,  kbk

data have1;

input RID $ var1 ;

datalines;                    

1 3

2 .

3 .

4 3

5 .

6 3

;

run;

proc sort; by RID; run;


data have2;

input RID $ var1 ;

datalines;                    

1 .

2 1

3 .

4 .

5 1

6 .

;

run;

proc sort; by RID; run;

*this overwrites nonMissing data;

data want;

     merge have1 have2;

     by RID;

run;

*this works but is inefficient for a 40000 row, 1000 variable dataset due to the subselect statement;

proc sql;

     create table want2 as

     select a.RID, case

                        when a.var1 = . then (select b.var1 from have2 b where a.rid = b.rid)

                        else a.var1

                        end as var1

     from have1 a, have2 b

     where a.rid = b.rid;

quit;


*this also works and is likely more efficient but a macro is still needed to apply to a lot of variables;

data want3 (drop=var11);

merge have1 (in=in1) have2(in=in2 rename=(var1=var11));

if in1 then do;

if  in2 then do;

if var1 = . then var1 = var11;

end;

end;

else if in2 then var1 = var11;

run;


Accepted Solutions
Solution
‎03-20-2014 10:44 AM
Super User
Posts: 5,504

Re: Merge datasets with common variables and matching IDs, only overwrite NULLS

If HAVE2 is limited to a single observation per RID, this would work:

data want;

   update have2 have1;

   by rid;

run;

If HAVE2 might contain multiple records per RID, a preliminary DATA step would be needed.  Let's cross that bridge only if necessary.

Good luck.

View solution in original post


All Replies
Super Contributor
Posts: 578

Re: Merge datasets with common variables and matching IDs, only overwrite NULLS

not sure why you would need a subselect..assuming have1 is the base table and you want to use data from have2 when have1.var1 is null...

proc sql;

create table want2 as

select

     t1.rid,

     coalesce(t1.var1,t2.var1) as var1

from

     have1 t1

     inner join have2 t2

          on t1.rid=t2.rid;

quit;

I often use Excel to create the sql if I have a large # of variables.  Copy the column names to excel and then use excel functions to create the sql statement and then copy it back to sas.

Solution
‎03-20-2014 10:44 AM
Super User
Posts: 5,504

Re: Merge datasets with common variables and matching IDs, only overwrite NULLS

If HAVE2 is limited to a single observation per RID, this would work:

data want;

   update have2 have1;

   by rid;

run;

If HAVE2 might contain multiple records per RID, a preliminary DATA step would be needed.  Let's cross that bridge only if necessary.

Good luck.

Contributor kbk
Contributor
Posts: 29

Re: Merge datasets with common variables and matching IDs, only overwrite NULLS

Posted in reply to Astounding

RID is the primary key. Your solution is exactly what I was looking for. I also implemented a macro around the DATA step in my OP. It's a mess to read but outputs the same dataset. Thanks!!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 351 views
  • 3 likes
  • 3 in conversation