Hi I have two data sets that i want to merge. The first is called Common and the second is Link and they both have a common key "Index". Common is sorted by index but there are multiple rows in Common with the same index whereas the each row in Link has a unique index. I only want the information in Link to merge onto the first unique index and blank for other rows in Common where the index is repeated. Example below:
Common
Index | Field2 | Field3 |
---|---|---|
1 | ||
1 | ||
1 |
Link
Index | Field3 | ||
---|---|---|---|
1 | 100 | ||
2 | 200 | ||
3 | 100 |
Merged File I Want
Index | Field1 | Field2 | Field3 |
---|---|---|---|
1 | 100 | ||
1 | 0 | ||
1 | 0 | ||
Yep, just pop a case around the assign:
proc sql;
create table want as
select A.*,
case when B.FIELD3=. then 0 else B.FIELD end as FIELD3
from COMMON A
left join (select 1 as FLAG,* from LINK) B
on A.INDEX=B.INDEX
and A.FLAG=B.FLAG;
quit
That's pretty odd. Anyways:
sort common / link; by index;
data want;
merge common link;
by index;
run;
data want;
set want;
by index;
if not first.index then field3=0;
run;
Or you could assign a flag:
data common;
by index;
if first.index then flag=1;
run;
then merge based on that:
proc sql;
create table want as
select A.*,
B.FIELD3
from COMMON A
left join (select 1 as FLAG,* from LINK) B
on A.INDEX=B.INDEX
and A.FLAG=B.FLAG;
quit
Thanks that works perfectly. There are about 6 numeric felds on the link file which merges on. Is there a way that you can make the cells 0 when the flag is not 1. At the moment, I have values where the flag is 1 and blank cells where the flag is 0 for all the fields merging on from link.
Many thanks
Yep, just pop a case around the assign:
proc sql;
create table want as
select A.*,
case when B.FIELD3=. then 0 else B.FIELD end as FIELD3
from COMMON A
left join (select 1 as FLAG,* from LINK) B
on A.INDEX=B.INDEX
and A.FLAG=B.FLAG;
quit
Thanks but if I want this to apply to all fields merging on (not just field3) do I use
case when B.*=. then 0 else B.*...
Well, your probably better off using the datastep method if you want to find shortcuts. With SQL you should really know what you want out at the end, where from etc. rather than using shortcuts or wildcards. The problem with the above is that you are not operating on a group, just an ad-hoc selection - for example if the order of common changes then it wouldn't be the same value there. Makes it quite difficult logically to link the two.
So yes, you would need one case for each variable, the principal being that you should be specifying what should come out anyways in the select:
proc sql;
create table WANT as
select A.VAR1,
A.VAR2,
A.VAR3,
case when B.VAR1 is null then 0 else B.VAR1 end as VAR1,
case when B.VAR2 is null then 0 else B.VAR2 end as VAR2,
...
There are plusses and minuses for datastep syntax vs SQL. You could also generate the required code using call execute.
data common;
infile datalines missover;
input index field1 field2 field3;
datalines;
1 . . .
1 . . .
1 . . .
2 . . .
2 . . .
3 . . .
;
data link;
input index field3;
datalines;
1 100
2 200
3 100
;
data want;
update common link;
by index;
if field3=. then field3=0;
run;
So basically you want to defeat the normal behavior of SAS where the values from the datasets with fewer observations are effectively retained.
One simple way is to OUTPUT the record and then reset the values to missing (or zero if you want). Then this will be the value that is retained onto all of the extra rows. When you read a new record from the table that contributed that field (or fields) then the zero will be overwritten with the real value.
data want ;
merge common link ;
by index ;
output;
field3=0;
run;
IMO, Tom's solution is easy and expandable to more variables. However, note that your original post shows that FIELD3 is also part of COMMON. If that is the case, you will need to get rid of it. In its shortest form, the DATA step could be:
data want;
field3=0;
merge common (drop=field3) link;
by index;
run;
There are further complications if any of the additional fields in LINK are actually character instead of numeric.
Finally, the result you are seeking is unusual. If you post some idea of why you need it, there may be other ways to get you where you need to go.
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.