BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaaaa34
Calcite | Level 5

Hi,

 

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains two variables (without city variable)- one of them was used to choose right values of city variable:

find(city, variable1)>0

How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

 

That's my input data:

placerate
london15
paris18
cracow10

 

city
cool_london
notknown
paris

 

That's what I want to obtain:

cityrate
cool_london15
paris18
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Untested, you may need to tweak the FIND() function but it should work. 

 

proc sql;
create table want as
select t1.place, t1.rate, t2.city
from input_data as t1
cross join city_terms as t2
where find(t1.place, t2.city, 'i')>0;
quit;

@aaaaa34 wrote:

Hi,

 

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains three variables (without city variable)- one of them was used to choose right values of city variable:

find(city, variable1)>0

How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

 

That's my input data:

place rate
london 15
paris 18
cracow 10

 

city
cool_london
notknown
paris

 

That's what I want to obtain:

city rate
cool_london 15
paris 18

 

View solution in original post

12 REPLIES 12
mkeintz
PROC Star

I do not comprehend your question.  Please provide sample data of the two data sets that you have, and sample data of what you want to produce.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
aaaaa34
Calcite | Level 5
I corrrect my post. Can you take a look?
sustagens
Pyrite | Level 9
"Second dataset contains three variables (without city variable)" Both input datasets you posted have less than three variables. Please post the source tables as is.
aaaaa34
Calcite | Level 5
I made a mistake, I edited my post.
mkeintz
PROC Star

If you know that you will never have more than one place found for any city, then this program would satisfy your objective.

 

 

data one;
  input place :$10. rate;
datalines;
london 15
paris 18
cracow 10
run;

data two;
  input city :$15. ;
datalines;
cool_london
notknown
paris
run;

data want (drop=rc);
  if _n_=1 then do;
    if 0 then set one;
    declare hash d1 (dataset:'one');
     d1.definekey('place');
     d1.definedata('place','rate');
     d1.definedone();
    declare hiter i ('d1');
  end;
  set two;
  rc=i.first(); 
  do while (find(city,trim(place))=0 and rc=0);
    rc=i.next();      
  end;
  if rc=0;
run;

 

 

After establishing a lookup table base on ONE in the hash object D1, it reads each record in dataset TWO.  Then it iterates through D1 until a match is found, or the hash object is exhausted.  If the hash object is exhausted the return code variable RC is not a zero, and therefore you don't want an output record.  This can be relatively efficient because you tell the loop to stop with a successful FIND function (i.e. the FIND function returns a position number representing the character at which a successful match for PLACE has been found in CITY).

 

So if you don't have many instances where an observations in TWO is never found in ONE, the above will be more efficient than the far simpler program below.  The below does a cartesian comparison of every obs in TWO against every obs in ONE, even after a match has been found.  Of course, this will discover instances in which a given PLACE is found in more than one CITY:

 

proc sql noprint;
  create table want2 as 
  select * from one inner join two
    on two.city like cats('%',one.place,'%') ;
quit;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JJP1
Pyrite | Level 9

Hi @aaaaa34 ,

Please have a look at below code and check whether this is what you are expecting.

data cityrate;
input place $6. rate;
datalines;
london	15
paris	18
cracow	10
;
run;
 
data city;
input place1 $11.;
datalines;
cool_london
notknown
paris
;
run;

 

data mergercity(keep=place1 rate) ;
merge cityrate(in=a) city(in=b)  ;


if a  and   b;
run;
aaaaa34
Calcite | Level 5
There should be two variables
JJP1
Pyrite | Level 9

Hi @aaaaa34 ,

i ran the whole code and final merge dataset is showing 2 variables only-got output  as below.please clearly suggest what you are expecting

 

rate place1
15 cool_london
18 notknown
10 paris

aaaaa34
Calcite | Level 5
Thank you @JJP1 but I want to have only place1 with values of place (notknown doesn't contain them)
Reeza
Super User

Untested, you may need to tweak the FIND() function but it should work. 

 

proc sql;
create table want as
select t1.place, t1.rate, t2.city
from input_data as t1
cross join city_terms as t2
where find(t1.place, t2.city, 'i')>0;
quit;

@aaaaa34 wrote:

Hi,

 

How can I merge two datasets where in first dataset I have only one variable city. Second dataset contains three variables (without city variable)- one of them was used to choose right values of city variable:

find(city, variable1)>0

How can I merge these datasets? By which variable and by which kind of join (inner/left/right)?

 

That's my input data:

place rate
london 15
paris 18
cracow 10

 

city
cool_london
notknown
paris

 

That's what I want to obtain:

city rate
cool_london 15
paris 18

 

aaaaa34
Calcite | Level 5

Yes it worked. But is it possible to avoid find in proc sql (like using this function earlier)? And then merge city variable (right values) with rate? The problem is that I dont know how to merge this variables?

Reeza
Super User
No, you cannot avoid using FIND() because it's not an exact match so you need a condition on the join. What's the issue with FIND()?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1581 views
  • 1 like
  • 5 in conversation