Solved: Re: How to merge two datasets?

aaaaa34 · Posted 11-11-2019 08:07 PM

Hi,

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains two variables (without city variable)- one of them was used to choose right values of city variable:

find(city, variable1)>0

How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

That's my input data:

place	rate
london	15
paris	18
cracow	10

city

cool_london

notknown

paris

That's what I want to obtain:

city	rate
cool_london	15
paris	18

Reeza · Posted 11-12-2019 11:23 AM

Untested, you may need to tweak the FIND() function but it should work.

proc sql;
create table want as
select t1.place, t1.rate, t2.city
from input_data as t1
cross join city_terms as t2
where find(t1.place, t2.city, 'i')>0;
quit;

@aaaaa34 wrote:

Hi,

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains three variables (without city variable)- one of them was used to choose right values of city variable:
find(city, variable1)>0
How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

That's my input data:

place rate

london 15

paris 18

cracow 10

city

cool_london

notknown

paris

That's what I want to obtain:

city rate

cool_london 15

paris 18

View solution in original post

mkeintz · Posted 11-11-2019 08:37 PM

I do not comprehend your question. Please provide sample data of the two data sets that you have, and sample data of what you want to produce.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

aaaaa34 · Posted 11-11-2019 08:59 PM

I corrrect my post. Can you take a look?

sustagens · Posted 11-11-2019 09:35 PM

"Second dataset contains three variables (without city variable)" Both input datasets you posted have less than three variables. Please post the source tables as is.

aaaaa34 · Posted 11-12-2019 12:01 PM

I made a mistake, I edited my post.

mkeintz · Posted 11-11-2019 11:05 PM

If you know that you will never have more than one place found for any city, then this program would satisfy your objective.

data one;
  input place :$10. rate;
datalines;
london 15
paris 18
cracow 10
run;

data two;
  input city :$15. ;
datalines;
cool_london
notknown
paris
run;

data want (drop=rc);
  if _n_=1 then do;
    if 0 then set one;
    declare hash d1 (dataset:'one');
     d1.definekey('place');
     d1.definedata('place','rate');
     d1.definedone();
    declare hiter i ('d1');
  end;
  set two;
  rc=i.first(); 
  do while (find(city,trim(place))=0 and rc=0);
    rc=i.next();      
  end;
  if rc=0;
run;

After establishing a lookup table base on ONE in the hash object D1, it reads each record in dataset TWO. Then it iterates through D1 until a match is found, or the hash object is exhausted. If the hash object is exhausted the return code variable RC is not a zero, and therefore you don't want an output record. This can be relatively efficient because you tell the loop to stop with a successful FIND function (i.e. the FIND function returns a position number representing the character at which a successful match for PLACE has been found in CITY).

So if you don't have many instances where an observations in TWO is never found in ONE, the above will be more efficient than the far simpler program below. The below does a cartesian comparison of every obs in TWO against every obs in ONE, even after a match has been found. Of course, this will discover instances in which a given PLACE is found in more than one CITY:

proc sql noprint;
  create table want2 as 
  select * from one inner join two
    on two.city like cats('%',one.place,'%') ;
quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

JJP1 · Posted 11-12-2019 02:18 AM

Hi @aaaaa34 ,

Please have a look at below code and check whether this is what you are expecting.

data cityrate;
input place $6. rate;
datalines;
london	15
paris	18
cracow	10
;
run;
 
data city;
input place1 $11.;
datalines;
cool_london
notknown
paris
;
run;

 

data mergercity(keep=place1 rate) ;
merge cityrate(in=a) city(in=b)  ;


if a  and   b;
run;

aaaaa34 · Posted 11-12-2019 05:10 AM

There should be two variables

JJP1 · Posted 11-12-2019 05:21 AM

Hi @aaaaa34 ,

i ran the whole code and final merge dataset is showing 2 variables only-got output as below.please clearly suggest what you are expecting

rate place1
15 cool_london
18 notknown
10 paris

aaaaa34 · Posted 11-12-2019 12:03 PM

Thank you @JJP1 but I want to have only place1 with values of place (notknown doesn't contain them)

Reeza · Posted 11-12-2019 11:23 AM

Untested, you may need to tweak the FIND() function but it should work.

proc sql;
create table want as
select t1.place, t1.rate, t2.city
from input_data as t1
cross join city_terms as t2
where find(t1.place, t2.city, 'i')>0;
quit;

@aaaaa34 wrote:

Hi,

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains three variables (without city variable)- one of them was used to choose right values of city variable:
find(city, variable1)>0
How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

That's my input data:

place rate

london 15

paris 18

cracow 10

city

cool_london

notknown

paris

That's what I want to obtain:

city rate

cool_london 15

paris 18

aaaaa34 · Posted 11-12-2019 12:07 PM

Yes it worked. But is it possible to avoid find in proc sql (like using this function earlier)? And then merge city variable (right values) with rate? The problem is that I dont know how to merge this variables?

Reeza · Posted 11-12-2019 12:37 PM

No, you cannot avoid using FIND() because it's not an exact match so you need a condition on the join. What's the issue with FIND()?

SAS Innovate 2025: Call for Content

Classroom Training Available!