BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

Hi, I'm trying to merge 2 data sets but need a criteria that if one of the merges don't match (or it doesn't have a match) then merge another field.. this is what I mean :

Have:

SET1

AREA    Prov   

Tor         Ont      

Osh       Ont      

Mon       Que     

Van        BC      

Win        Man    

Ont        Ont      

Que       Que     

Man      Man     

BC         BC      

SET2

AREA    VAL

Ont         2.0

Tor          3.5

Mon        4.4

Van        2.1

BC         3.2

Man        2.8

Que        1.9

Ont         2.5

Want: (merge SET1 and SET2)

SET3

AREA    Prov     VAL

Tor         Ont       3.5

Osh        Ont       2.5  <- here becasue there's no match for Osh in SET2 it uses the value for the PROV

Mon       Que       4.4

Van         BC       2.1

Win         Man     2.8  <- here becasue there's no match for Win in SET2 it uses the value for the PROV

Ont         Ont       2.5

Que        Que       1.9

Man        Man      2.8

BC          BC        3.2

Thanks

14 REPLIES 14
art297
Opal | Level 21

Ontario has two records in set 2.  Was that an error or is that the data you actually have?  If it is the latter, what do you want to do in such situations?

art297
Opal | Level 21

You haven't responded to my questions, thus I'm answering based on the situation where the duplicate entry was an error.  I think that the following accomplishes that which you want to do:

data SET1;

  input (AREA Prov) ($);

  cards;

Tor       Ont     

Osh       Ont     

Mon       Que    

Van       BC     

Win       Man   

Ont       Ont     

Que       Que    

Man       Man    

BC        BC     

;

data SET2;

  input AREA $ VAL;

  cards;

Tor        3.5

Mon        4.4

Van        2.1

BC         3.2

Man        2.8

Que        1.9

Ont        2.5

;

proc sort data=SET1;

  by area;

run;

proc sort data=SET2;

  by area;

run;

data found

     notfound (drop=val

               rename=(area=prov prov=area));

  merge set1 (in=ina) set2 (in=inb);

  by area;

  if ina and inb then output found;

  else if ina then output notfound;

run;

proc sort data=notfound;

  by area;

run;

data need (rename=(area=prov prov=area));

  merge notfound (in=ina) set2 (in=inb);

  by area;

  if ina and inb;

run;

data want;

  set found need;

run;

podarum
Quartz | Level 8

Hi art.. is rename=(area=prov prov=area)); neccessary ?, since I have another field that I must match by (Date) as well.  Thanks

art297
Opal | Level 21

If you use Tom's proc sql approach you avoid that need.  There may be a way around it with a datastep approach, but I couldn't think of it.  Using a datastep approach one has to do something to merge on the switched variables.  My approach was simply to rename the variables.

As far as adding another variable, like date, you would have to again provide example data.

podarum
Quartz | Level 8

Sorry for the delay.  One was an error....

art297
Opal | Level 21

Then either Tom's or my code should do what you want.

Tom
Super User Tom
Super User

Perhaps we should let Art answer this one as those values look like Canadian provinces. Smiley Wink

SQL is useful for doing complex joins.

create table SET3 as

  select distinct a.area,a.prov,coalesce(b.val,c.val) as val

  from set1 a

   left join set2 b

   on a.area = b.area

   left join set2 c

   on a.prov = c.area

;

Message was edited by: Tom Abernathy to fix typo in last ON condition

art297
Opal | Level 21

Tom,

They are definitely Canadian cities and provinces.  Your proc sql code would need something more, though (probably a case statement), as it doesn't get the values for the two records where the city is missing from the original match.

I think my convoluted datastep approach provides the right result, but I'd be interested in seeing the proc sql equivalent that achieves the same result.

Art

Tom
Super User Tom
Super User

I helps if I join on the right condition.

2much2learn
Calcite | Level 5

I see you got some nice suggestions. It seems to me hash() could be one of possible approaches, since it does not require sorting and is capable of searching multiple variables at the same time.

data SET1;

infile cards;

input AREA $  Prov $;

cards;

Tor         Ont

Osh       Ont

Mon       Que

Van        BC

Win        Man

Ont        Ont

Que       Que

Man      Man

BC         BC

;

data SET2;

infile cards;

input AREA $  val;

cards;

Tor          3.5

Mon        4.4

Van        2.1

BC         3.2

Man        2.8

Que        1.9

Ont         2.5

;

data merged;

   if _n_=1 then do;

     if 0 then set set2;

declare hash merg (dataset: "set2");

merg.definekey("area");

merg.definedata("val");

merg.definedone();

   end;

set set1;

if merg.find(key:area)=0 then output;

else if merg.find(key:prov)=0 then output;

else delete;

run;

proc print;run;

Kindly Regards,

Haikuo

podarum
Quartz | Level 8

Thanks 2m2l, what if I also have different dates that I need to match for each of the obs in Set1 and Set2..

eg.

SET1

AREA    Prov   Date   

Tor         Ont      2009.01

Tor         Ont      2009.02

Tor         Ont      2009.03

Osh       Ont       2009.01

Osh       Ont       2009.02

Osh       Ont       2009.03

Mon       Que      2009.01

Mon       Que      2009.02

Mon       Que      20009.03

 

SET2

AREA    VAL     Date

Ont         2.0      2009.01

Ont         2.1      2009.02

Ont         2.5      2009.03

Tor          3.5     2009.01

Tor          3.6      2009.02

Tor          3.5      2009.03

Mon        4.4      2009.01

Mon        4.2      2009.02

Mon        4.3      2009.03

Que        1.9      2009.01

Que        0.6      2009.02

Que        0.9      2009.03

Ont         2.5      2009.01

Ont         2.8      2009.02

Ont         3.4      2009.03

Thanks

art297
Opal | Level 21

Using Tom's proc sql method, which I happen to prefer of the three methods that have been offered, I think that the following accomplishes what you need:

proc sql;

  create table SET3 as

    select distinct a.area,a.prov,a.date,

     coalesce(b.val,c.val) as val

      from set1 a

        left join set2 b

          on a.area = b.area and a.date = b.date

            left join set2 c

              on a.prov = c.area and a.date = c.date

;

quit;

Ksharp
Super User

Here is Hash method. if you like it.

data SET1 ;
input AREA  $  Prov  $ Date $ ;
datalines;
Tor         Ont      2009.01
Tor         Ont      2009.02
Tor         Ont      2009.03
Osh       Ont       2009.01
Osh       Ont       2009.02
Osh       Ont       2009.03
Mon       Que      2009.01
Mon       Que      2009.02
Mon       Que      2009.03
;
run;

data SET2;
input AREA $   VAL     Date $ ;
datalines;
Ont         2.0      2009.01
Ont         2.1      2009.02
Ont         2.5      2009.03
Tor          3.5     2009.01
Tor          3.6      2009.02
Tor          3.5      2009.03
Mon        4.4      2009.01
Mon        4.2      2009.02
Mon        4.3      2009.03
Que        1.9      2009.01
Que        0.6      2009.02
Que        0.9      2009.03
Ont         2.5      2009.01
Ont         2.8      2009.02
Ont         3.4      2009.03
;
run;
data want(drop=rc rx); 
 if _n_ eq 1 then do;
  if 0 then set set2;
  declare hash ha(dataset:'set2',hashexp:10);
   ha.definekey('area','date');
   ha.definedata('val');
   ha.definedone();
 end;
set set1;
call missing(val);               
rc=ha.find();
if rc ne 0 then rx=ha.find(key:prov,key:date);
run;


Ksharp

f_rederik
Obsidian | Level 7

I can see that you have got a few replies already.

I've been quite happy using PROC SQL with a full outer join and the COALESCE() function on the common fields.

Suspect performance is less than ideal, but others will probably comment on that...

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1502 views
  • 0 likes
  • 6 in conversation