Hi, I'm hoping someone can help showing me how to merge 2 tables baed on different fields on a priority.. Here's what I mean. I havea tabel with a large dataset (Table1) and 2 fields Call them City1 and City2.. and another reference table that has City_A which could fall within either City1 or City2 or netiher in Table1..
Have (Table1):
City1 City2
Tor Tor
Mon Sch
Ott Bra
Tor Sca
Tor Ajx
Tor Sel
Van Win
ROC Tor
ROC Chr
ROC ROC
Reference Table:
City_A
Tor
Mon
Bra
Sel
Chr
Ott
Want a table where it takes the City2 priority if it's matched in Reference Table, OR if not, take City1 if it's matched, and if not leave ROC:
Looks like this:
City1 City2 City_A
Tor Tor Tor
Mon Sch Mon
Ott Bra Bra
Tor Sca Tor
Tor Ajx Tor
Tor Sel Sel
Van Win ROC
ROC Tor Tor
ROC Chr Chr
ROC ROC ROC
Thanks much....
One approach would create a format based on the reference table:
data ref;
set reference_table;
fmtname='$city';
label='found';
start=City_A;
run;
proc format cntlin=ref;
run;
Then a DATA step could use the format to look up matches:
data want;
set have;
if put(city2, $city.)='found' then City_A = city2;
else if put(city1, $city.)='found' then City_A=city1;
else city_A='ROC';
run;
Good luck.
Astounding's approach is simple and effective.
You could also do something like:
proc sql;
select distinct city_a into :refer separated by ' '
from reference;
quit ;
data want ;
set have ;
refer_list = "&refer";
if indexw ( refer_list , city2 ) > 0 then city_a=city2;
else if indexw ( refer_list , city1 ) > 0 then city_a=city1;
else city_a = "ROC";
run;
If your reference table doesn't have millions of records, in can all be done in a single sql query as follow
proc sql;
create table want as
select a.city1, a.city2, (case
when a.city2 in (select distinct city_a from reftable) then city2
when (a.city2 not in (select distinct city_a from reftable) and a.city1 in (select distinct city_a from reftable)) then city1
else "ROC"
end) as city_A
from have;
quit;
Above would be the code with the best readability but if the (select distinct city_a from reftable) returns hundred of thousands of records, then the "in" operator lags significantly behind other processing strategies.
Typically, when I want to do some funky merge between a large DS and a smaller one, as you've described above, I tend to use hash tables. Here's an alternative solution, again in a single step with significantly faster processing but using the less natural hash object syntax
data want(rename=(city_c=city_a));
if _N_=1 then do;
length city_a $40.; /* placeholder length, preferably use the same as the length of your city_a variable in your reference table */
declare hash h(datasets: 'reftable');
h.defineKey('city_a');
h.defineData(); /* only necessary if you ought to retain additional columns from reference table in the merge */
h.defineDone(); /* I have a blank this may or may not require the () for appropriate syntax */
end;
set have;
if h.find(key: city2)=0 then city_c=city2;
else if h.find(key:city1)=0 then city_c=city1;
else city_c="ROC";
run;
Basically, hash table is a lookup table. RC is the return code of the find method on the hash table. It is 0 if the key was found, some code NE 0 otherwise. Hash table is all in memory which means great gains of efficiency for medium-large datasets. However, it is capped the RAM allocation of your computer to SAS. With a single key variable and no data you should have no problem get a hash table of 1M records though.
Vince
Just as a variation to Vince's code:
data want;
if _N_=1 then
do;
if 0 then set reftable(keep=city_a);
declare hash h(dataset: 'reftable');
h.defineKey('city_a');
h.defineData('city_a');
h.defineDone();
end;
set have;
if h.find(key: city2) ne 0 then if h.find(key:city1) ne 0 then city_a="ROC";
run;
Thank you all.. Great help..
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.