BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

Hi, I'm hoping someone can help showing me how to merge 2 tables baed on different fields on a priority.. Here's what I mean.  I havea tabel with a large dataset (Table1) and 2 fields  Call them City1 and City2.. and another reference table that has City_A which could fall within either City1 or City2 or netiher in Table1..

Have (Table1):

City1      City2
Tor          Tor

Mon        Sch

Ott          Bra

Tor          Sca

Tor          Ajx

Tor          Sel

Van         Win

ROC       Tor

ROC       Chr

ROC       ROC

Reference Table:

City_A

Tor

Mon

Bra

Sel

Chr

Ott

Want a table where it takes the City2 priority if it's matched in Reference Table, OR if not, take City1 if it's matched, and if not leave ROC:

Looks like this:

City1      City2       City_A

Tor          Tor           Tor

Mon        Sch          Mon

Ott          Bra           Bra

Tor          Sca          Tor

Tor          Ajx           Tor

Tor          Sel           Sel

Van         Win         ROC

ROC       Tor           Tor

ROC       Chr           Chr

ROC       ROC         ROC

Thanks much....

5 REPLIES 5
Astounding
PROC Star

One approach would create a format based on the reference table:

data ref;

  set reference_table;

  fmtname='$city';

  label='found';

  start=City_A;

run;

proc format cntlin=ref;

run;

Then a DATA step could use the format to look up matches:

data want;

set have;

if put(city2, $city.)='found' then City_A = city2;

else if put(city1, $city.)='found' then City_A=city1;

else city_A='ROC';

run;

Good luck.

Fugue
Quartz | Level 8

Astounding's approach is simple and effective.

You could also do something like:


proc sql;
select distinct city_a into :refer separated by ' '
from reference;
quit ;

data want ;
set have ;
refer_list = "&refer";
if indexw ( refer_list , city2 ) > 0 then city_a=city2;
else if indexw ( refer_list , city1 ) > 0 then city_a=city1;
else city_a = "ROC";
run;

Vince28_Statcan
Quartz | Level 8


If your reference table doesn't have millions of records, in can all be done in a single sql query as follow

proc sql;

     create table want as

     select a.city1, a.city2, (case

                                        when a.city2 in (select distinct city_a from reftable) then city2

                                        when (a.city2 not in (select distinct city_a from reftable) and a.city1 in (select distinct city_a from reftable)) then city1

                                        else "ROC"

                                        end) as city_A

     from have;

quit;

Above would be the code with the best readability but if the (select distinct city_a from reftable) returns hundred of thousands of records, then the "in" operator lags significantly behind other processing strategies.

Typically, when I want to do some funky merge between a large DS and a smaller one, as you've described above, I tend to use hash tables. Here's an alternative solution, again in a single step with significantly faster processing but using the less natural hash object syntax

data want(rename=(city_c=city_a));

     if _N_=1 then do;

     length city_a $40.; /* placeholder length, preferably use the same as the length of your city_a variable in your reference table */

     declare hash h(datasets: 'reftable');

     h.defineKey('city_a');

     h.defineData(); /* only necessary if you ought to retain additional columns from reference table in the merge */

     h.defineDone(); /* I have a blank this may or may not require the () for appropriate syntax */

     end;

     set have;

     if h.find(key: city2)=0 then city_c=city2;

     else if h.find(key:city1)=0 then city_c=city1;

     else city_c="ROC";

run;

Basically, hash table is a lookup table. RC is the return code of the find method on the hash table. It is 0 if the key was found, some code NE 0 otherwise. Hash table is all in memory which means great gains of efficiency for medium-large datasets. However, it is capped the RAM allocation of your computer to SAS. With a single key variable and no data you should have no problem get a hash table of 1M records though.

Vince

Patrick
Opal | Level 21

Just as a variation to Vince's code:

data want;
  if _N_=1 then
    do;
      if 0 then set reftable(keep=city_a);
      declare hash h(dataset: 'reftable');
      h.defineKey('city_a');
      h.defineData('city_a');
      h.defineDone();
    end;

  set have;

  if h.find(key: city2) ne 0 then if h.find(key:city1) ne 0 then city_a="ROC";
run;

podarum
Quartz | Level 8

Thank you all.. Great help..

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1792 views
  • 3 likes
  • 5 in conversation