Hi, I'm trying to merge 2 data sets but need a criteria that if one of the merges don't match (or it doesn't have a match) then merge another field.. this is what I mean :
Have:
SET1
AREA Prov
Tor Ont
Osh Ont
Mon Que
Van BC
Win Man
Ont Ont
Que Que
Man Man
BC BC
SET2
AREA VAL
Ont 2.0
Tor 3.5
Mon 4.4
Van 2.1
BC 3.2
Man 2.8
Que 1.9
Ont 2.5
Want: (merge SET1 and SET2)
SET3
AREA Prov VAL
Tor Ont 3.5
Osh Ont 2.5 <- here becasue there's no match for Osh in SET2 it uses the value for the PROV
Mon Que 4.4
Van BC 2.1
Win Man 2.8 <- here becasue there's no match for Win in SET2 it uses the value for the PROV
Ont Ont 2.5
Que Que 1.9
Man Man 2.8
BC BC 3.2
Thanks
Ontario has two records in set 2. Was that an error or is that the data you actually have? If it is the latter, what do you want to do in such situations?
You haven't responded to my questions, thus I'm answering based on the situation where the duplicate entry was an error. I think that the following accomplishes that which you want to do:
data SET1;
input (AREA Prov) ($);
cards;
Tor Ont
Osh Ont
Mon Que
Van BC
Win Man
Ont Ont
Que Que
Man Man
BC BC
;
data SET2;
input AREA $ VAL;
cards;
Tor 3.5
Mon 4.4
Van 2.1
BC 3.2
Man 2.8
Que 1.9
Ont 2.5
;
proc sort data=SET1;
by area;
run;
proc sort data=SET2;
by area;
run;
data found
notfound (drop=val
rename=(area=prov prov=area));
merge set1 (in=ina) set2 (in=inb);
by area;
if ina and inb then output found;
else if ina then output notfound;
run;
proc sort data=notfound;
by area;
run;
data need (rename=(area=prov prov=area));
merge notfound (in=ina) set2 (in=inb);
by area;
if ina and inb;
run;
data want;
set found need;
run;
Hi art.. is rename=(area=prov prov=area)); neccessary ?, since I have another field that I must match by (Date) as well. Thanks
If you use Tom's proc sql approach you avoid that need. There may be a way around it with a datastep approach, but I couldn't think of it. Using a datastep approach one has to do something to merge on the switched variables. My approach was simply to rename the variables.
As far as adding another variable, like date, you would have to again provide example data.
Sorry for the delay. One was an error....
Then either Tom's or my code should do what you want.
Perhaps we should let Art answer this one as those values look like Canadian provinces.
SQL is useful for doing complex joins.
create table SET3 as
select distinct a.area,a.prov,coalesce(b.val,c.val) as val
from set1 a
left join set2 b
on a.area = b.area
left join set2 c
on a.prov = c.area
;
Message was edited by: Tom Abernathy to fix typo in last ON condition
Tom,
They are definitely Canadian cities and provinces. Your proc sql code would need something more, though (probably a case statement), as it doesn't get the values for the two records where the city is missing from the original match.
I think my convoluted datastep approach provides the right result, but I'd be interested in seeing the proc sql equivalent that achieves the same result.
Art
I helps if I join on the right condition.
I see you got some nice suggestions. It seems to me hash() could be one of possible approaches, since it does not require sorting and is capable of searching multiple variables at the same time.
data SET1;
infile cards;
input AREA $ Prov $;
cards;
Tor Ont
Osh Ont
Mon Que
Van BC
Win Man
Ont Ont
Que Que
Man Man
BC BC
;
data SET2;
infile cards;
input AREA $ val;
cards;
Tor 3.5
Mon 4.4
Van 2.1
BC 3.2
Man 2.8
Que 1.9
Ont 2.5
;
data merged;
if _n_=1 then do;
if 0 then set set2;
declare hash merg (dataset: "set2");
merg.definekey("area");
merg.definedata("val");
merg.definedone();
end;
set set1;
if merg.find(key:area)=0 then output;
else if merg.find(key:prov)=0 then output;
else delete;
run;
proc print;run;
Kindly Regards,
Haikuo
Thanks 2m2l, what if I also have different dates that I need to match for each of the obs in Set1 and Set2..
eg.
SET1
AREA Prov Date
Tor Ont 2009.01
Tor Ont 2009.02
Tor Ont 2009.03
Osh Ont 2009.01
Osh Ont 2009.02
Osh Ont 2009.03
Mon Que 2009.01
Mon Que 2009.02
Mon Que 20009.03
SET2
AREA VAL Date
Ont 2.0 2009.01
Ont 2.1 2009.02
Ont 2.5 2009.03
Tor 3.5 2009.01
Tor 3.6 2009.02
Tor 3.5 2009.03
Mon 4.4 2009.01
Mon 4.2 2009.02
Mon 4.3 2009.03
Que 1.9 2009.01
Que 0.6 2009.02
Que 0.9 2009.03
Ont 2.5 2009.01
Ont 2.8 2009.02
Ont 3.4 2009.03
Thanks
Using Tom's proc sql method, which I happen to prefer of the three methods that have been offered, I think that the following accomplishes what you need:
proc sql;
create table SET3 as
select distinct a.area,a.prov,a.date,
coalesce(b.val,c.val) as val
from set1 a
left join set2 b
on a.area = b.area and a.date = b.date
left join set2 c
on a.prov = c.area and a.date = c.date
;
quit;
Here is Hash method. if you like it.
data SET1 ; input AREA $ Prov $ Date $ ; datalines; Tor Ont 2009.01 Tor Ont 2009.02 Tor Ont 2009.03 Osh Ont 2009.01 Osh Ont 2009.02 Osh Ont 2009.03 Mon Que 2009.01 Mon Que 2009.02 Mon Que 2009.03 ; run; data SET2; input AREA $ VAL Date $ ; datalines; Ont 2.0 2009.01 Ont 2.1 2009.02 Ont 2.5 2009.03 Tor 3.5 2009.01 Tor 3.6 2009.02 Tor 3.5 2009.03 Mon 4.4 2009.01 Mon 4.2 2009.02 Mon 4.3 2009.03 Que 1.9 2009.01 Que 0.6 2009.02 Que 0.9 2009.03 Ont 2.5 2009.01 Ont 2.8 2009.02 Ont 3.4 2009.03 ; run; data want(drop=rc rx); if _n_ eq 1 then do; if 0 then set set2; declare hash ha(dataset:'set2',hashexp:10); ha.definekey('area','date'); ha.definedata('val'); ha.definedone(); end; set set1; call missing(val); rc=ha.find(); if rc ne 0 then rx=ha.find(key:prov,key:date); run;
Ksharp
I can see that you have got a few replies already.
I've been quite happy using PROC SQL with a full outer join and the COALESCE() function on the common fields.
Suspect performance is less than ideal, but others will probably comment on that...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.