Help using Base SAS procedures

Merge by different fields on a priority

Reply
Super Contributor
Posts: 401

Merge by different fields on a priority

Hi, I'm hoping someone can help showing me how to merge 2 tables baed on different fields on a priority.. Here's what I mean.  I havea tabel with a large dataset (Table1) and 2 fields  Call them City1 and City2.. and another reference table that has City_A which could fall within either City1 or City2 or netiher in Table1..

Have (Table1):

City1      City2
Tor          Tor

Mon        Sch

Ott          Bra

Tor          Sca

Tor          Ajx

Tor          Sel

Van         Win

ROC       Tor

ROC       Chr

ROC       ROC

Reference Table:

City_A

Tor

Mon

Bra

Sel

Chr

Ott

Want a table where it takes the City2 priority if it's matched in Reference Table, OR if not, take City1 if it's matched, and if not leave ROC:

Looks like this:

City1      City2       City_A

Tor          Tor           Tor

Mon        Sch          Mon

Ott          Bra           Bra

Tor          Sca          Tor

Tor          Ajx           Tor

Tor          Sel           Sel

Van         Win         ROC

ROC       Tor           Tor

ROC       Chr           Chr

ROC       ROC         ROC

Thanks much....

Super User
Posts: 5,511

Re: Merge by different fields on a priority

One approach would create a format based on the reference table:

data ref;

  set reference_table;

  fmtname='$city';

  label='found';

  start=City_A;

run;

proc format cntlin=ref;

run;

Then a DATA step could use the format to look up matches:

data want;

set have;

if put(city2, $city.)='found' then City_A = city2;

else if put(city1, $city.)='found' then City_A=city1;

else city_A='ROC';

run;

Good luck.

Super Contributor
Posts: 307

Re: Merge by different fields on a priority

Posted in reply to Astounding

Astounding's approach is simple and effective.

You could also do something like:


proc sql;
select distinct city_a into :refer separated by ' '
from reference;
quit ;

data want ;
set have ;
refer_list = "&refer";
if indexw ( refer_list , city2 ) > 0 then city_a=city2;
else if indexw ( refer_list , city1 ) > 0 then city_a=city1;
else city_a = "ROC";
run;

Super Contributor
Posts: 339

Re: Merge by different fields on a priority


If your reference table doesn't have millions of records, in can all be done in a single sql query as follow

proc sql;

     create table want as

     select a.city1, a.city2, (case

                                        when a.city2 in (select distinct city_a from reftable) then city2

                                        when (a.city2 not in (select distinct city_a from reftable) and a.city1 in (select distinct city_a from reftable)) then city1

                                        else "ROC"

                                        end) as city_A

     from have;

quit;

Above would be the code with the best readability but if the (select distinct city_a from reftable) returns hundred of thousands of records, then the "in" operator lags significantly behind other processing strategies.

Typically, when I want to do some funky merge between a large DS and a smaller one, as you've described above, I tend to use hash tables. Here's an alternative solution, again in a single step with significantly faster processing but using the less natural hash object syntax

data want(rename=(city_c=city_a));

     if _N_=1 then do;

     length city_a $40.; /* placeholder length, preferably use the same as the length of your city_a variable in your reference table */

     declare hash h(datasets: 'reftable');

     h.defineKey('city_a');

     h.defineData(); /* only necessary if you ought to retain additional columns from reference table in the merge */

     h.defineDone(); /* I have a blank this may or may not require the () for appropriate syntax */

     end;

     set have;

     if h.find(key: city2)=0 then city_c=city2;

     else if h.find(key:city1)=0 then city_c=city1;

     else city_c="ROC";

run;

Basically, hash table is a lookup table. RC is the return code of the find method on the hash table. It is 0 if the key was found, some code NE 0 otherwise. Hash table is all in memory which means great gains of efficiency for medium-large datasets. However, it is capped the RAM allocation of your computer to SAS. With a single key variable and no data you should have no problem get a hash table of 1M records though.

Vince

Respected Advisor
Posts: 4,173

Re: Merge by different fields on a priority

Posted in reply to Vince28_Statcan

Just as a variation to Vince's code:

data want;
  if _N_=1 then
    do;
      if 0 then set reftable(keep=city_a);
      declare hash h(dataset: 'reftable');
      h.defineKey('city_a');
      h.defineData('city_a');
      h.defineDone();
    end;

  set have;

  if h.find(key: city2) ne 0 then if h.find(key:city1) ne 0 then city_a="ROC";
run;

Super Contributor
Posts: 401

Re: Merge by different fields on a priority

Thank you all.. Great help..

Ask a Question
Discussion stats
  • 5 replies
  • 319 views
  • 3 likes
  • 5 in conversation