BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brophymj
Quartz | Level 8

Hi I have two data sets that i want to merge. The first is called Common and the second is Link and they both have a common key "Index". Common is sorted by index but there are multiple rows in Common with the same index whereas the each row in Link has a unique index. I only want the information in Link to merge onto the first unique index and blank for other rows in Common where the index is repeated. Example below:

Common

IndexField2Field3
1
1
1

Link

IndexField3
1

100

2200
3100

Merged File I Want

IndexField1Field2Field3
1100
10
10
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, just pop a case around the assign:

proc sql;

     create table want as

     select     A.*,

                   case     when B.FIELD3=. then 0 else B.FIELD end as FIELD3

     from      COMMON A

     left join  (select 1 as FLAG,* from LINK) B

     on          A.INDEX=B.INDEX

     and        A.FLAG=B.FLAG;

quit

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

That's pretty odd.  Anyways:

sort common / link; by index;

data want;

     merge common link;

     by index;

run;

data want;

     set want;

     by index;

     if not first.index then field3=0;

run;

Or you could assign a flag:

data common;

     by index;

     if first.index then flag=1;

run;

then merge based on that:

proc sql;

     create table want as

     select     A.*,

                   B.FIELD3

     from      COMMON A

     left join  (select 1 as FLAG,* from LINK) B

     on          A.INDEX=B.INDEX

     and        A.FLAG=B.FLAG;

quit

brophymj
Quartz | Level 8

Thanks that works perfectly. There are about 6 numeric felds on the link file which merges on. Is there a way that you can make the cells 0 when the flag is not 1. At the moment, I have values where the flag is 1 and blank cells where the flag is 0 for all the fields merging on from link.

Many thanks

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, just pop a case around the assign:

proc sql;

     create table want as

     select     A.*,

                   case     when B.FIELD3=. then 0 else B.FIELD end as FIELD3

     from      COMMON A

     left join  (select 1 as FLAG,* from LINK) B

     on          A.INDEX=B.INDEX

     and        A.FLAG=B.FLAG;

quit

brophymj
Quartz | Level 8

Thanks but if I want this to apply to all fields merging on (not just field3) do I use

case when B.*=. then 0 else B.*...

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, your probably better off using the datastep method if you want to find shortcuts.  With SQL you should really know what you want out at the end, where from etc. rather than using shortcuts or wildcards.  The problem with the above is that you are not operating on a group, just an ad-hoc selection - for example if the order of common changes then it wouldn't be the same value there.  Makes it quite difficult logically to link the two.  

So yes, you would need one case for each variable, the principal being that you should be specifying what should come out anyways in the select:

proc sql;

     create table WANT as

     select     A.VAR1,

                   A.VAR2,

                   A.VAR3,

                   case when B.VAR1 is null then 0 else B.VAR1 end as VAR1,

                   case when B.VAR2 is null then 0 else B.VAR2 end as VAR2,

...

There are plusses and minuses for datastep syntax vs SQL.  You could also generate the required code using call execute.

stat_sas
Ammonite | Level 13

data common;
infile datalines missover;
input index field1 field2 field3;
datalines;
1 . . .
1 . . .
1 . . .
2 . . .
2 . . .
3 . . .
;

data link;
input index field3;
datalines;
1 100
2 200
3 100
;


data want;
update common link;
by index;
if field3=. then field3=0;
run;

Tom
Super User Tom
Super User

So basically you want to defeat the normal behavior of SAS where the values from the datasets with fewer observations are effectively retained.

One simple way is to OUTPUT the record and then reset the values to missing (or zero if you want).  Then this will be the value that is retained onto all of the extra rows.  When you read a new record from the table that contributed that field (or fields) then the zero will be overwritten with the real value.

data want ;

  merge common link ;

  by index ;

  output;

  field3=0;

run;

Astounding
PROC Star

IMO, Tom's solution is easy and expandable to more variables.  However, note that your original post shows that FIELD3 is also part of COMMON.  If that is the case, you will need to get rid of it.  In its shortest form, the DATA step could be:

data want;

  field3=0;

  merge common (drop=field3) link;

  by index;

run;

There are further complications if any of the additional fields in LINK are actually character instead of numeric.

Finally, the result you are seeking is unusual.  If you post some idea of why you need it, there may be other ways to get you where you need to go.

Good luck.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1618 views
  • 0 likes
  • 5 in conversation