Solved: Re: Merge

brophymj · Posted 08-18-2014 05:48 AM

Hi I have two data sets that i want to merge. The first is called Common and the second is Link and they both have a common key "Index". Common is sorted by index but there are multiple rows in Common with the same index whereas the each row in Link has a unique index. I only want the information in Link to merge onto the first unique index and blank for other rows in Common where the index is repeated. Example below:

Common

Index	Field2	Field3
1
1
1

Link

Index	Field3
1	100
2	200
3	100

Merged File I Want

Index	Field1	Field2	Field3
1			100
1			0
1			0

RW9 · Posted 08-18-2014 06:47 AM

Yep, just pop a case around the assign:

proc sql;

create table want as

select A.*,

case when B.FIELD3=. then 0 else B.FIELD end as FIELD3

from COMMON A

left join (select 1 as FLAG,* from LINK) B

on A.INDEX=B.INDEX

and A.FLAG=B.FLAG;

quit

View solution in original post

RW9 · Posted 08-18-2014 06:01 AM

That's pretty odd. Anyways:

sort common / link; by index;

data want;

merge common link;

by index;

run;

data want;

set want;

by index;

if not first.index then field3=0;

run;

Or you could assign a flag:

data common;

by index;

if first.index then flag=1;

run;

then merge based on that:

proc sql;

create table want as

select A.*,

B.FIELD3

from COMMON A

left join (select 1 as FLAG,* from LINK) B

on A.INDEX=B.INDEX

and A.FLAG=B.FLAG;

quit

brophymj · Posted 08-18-2014 06:34 AM

Thanks that works perfectly. There are about 6 numeric felds on the link file which merges on. Is there a way that you can make the cells 0 when the flag is not 1. At the moment, I have values where the flag is 1 and blank cells where the flag is 0 for all the fields merging on from link.

Many thanks

RW9 · Posted 08-18-2014 06:47 AM

Yep, just pop a case around the assign:

proc sql;

create table want as

select A.*,

case when B.FIELD3=. then 0 else B.FIELD end as FIELD3

from COMMON A

left join (select 1 as FLAG,* from LINK) B

on A.INDEX=B.INDEX

and A.FLAG=B.FLAG;

quit

brophymj · Posted 08-18-2014 06:53 AM

Thanks but if I want this to apply to all fields merging on (not just field3) do I use

case when B.*=. then 0 else B.*...

RW9 · Posted 08-18-2014 08:08 AM

Well, your probably better off using the datastep method if you want to find shortcuts. With SQL you should really know what you want out at the end, where from etc. rather than using shortcuts or wildcards. The problem with the above is that you are not operating on a group, just an ad-hoc selection - for example if the order of common changes then it wouldn't be the same value there. Makes it quite difficult logically to link the two.

So yes, you would need one case for each variable, the principal being that you should be specifying what should come out anyways in the select:

proc sql;

create table WANT as

select A.VAR1,

A.VAR2,

A.VAR3,

case when B.VAR1 is null then 0 else B.VAR1 end as VAR1,

case when B.VAR2 is null then 0 else B.VAR2 end as VAR2,

...

There are plusses and minuses for datastep syntax vs SQL. You could also generate the required code using call execute.

stat_sas · Posted 08-18-2014 06:30 AM

data common;
infile datalines missover;
input index field1 field2 field3;
datalines;
1 . . .
1 . . .
1 . . .
2 . . .
2 . . .
3 . . .
;

data link;
input index field3;
datalines;
1 100
2 200
3 100
;

data want;
update common link;
by index;
if field3=. then field3=0;
run;

Tom · Posted 08-18-2014 08:28 AM

So basically you want to defeat the normal behavior of SAS where the values from the datasets with fewer observations are effectively retained.

One simple way is to OUTPUT the record and then reset the values to missing (or zero if you want). Then this will be the value that is retained onto all of the extra rows. When you read a new record from the table that contributed that field (or fields) then the zero will be overwritten with the real value.

data want ;

merge common link ;

by index ;

output;

field3=0;

run;

Astounding · Posted 08-18-2014 08:50 AM

IMO, Tom's solution is easy and expandable to more variables. However, note that your original post shows that FIELD3 is also part of COMMON. If that is the case, you will need to get rid of it. In its shortest form, the DATA step could be:

data want;

field3=0;

merge common (drop=field3) link;

by index;

run;

There are further complications if any of the additional fields in LINK are actually character instead of numeric.

Finally, the result you are seeking is unusual. If you post some idea of why you need it, there may be other ways to get you where you need to go.

Good luck.

SAS Training: Just a Click Away