Learning SAS? Welcome to the exclusive online community for all SAS learners.

Merging two tables based on columns range matching

Reply
New Contributor
Posts: 4

Merging two tables based on columns range matching

Hello everyone,

I want to merge two tables, having IP address from Table A matching based on a range from two IP address columns in Table B. 

For example,

Table A:

IP_Address         Domain                    Date

12345678             $                              $

52346875             $                              $

21356874             $                              $

Table B:

IP_Address_Upperboud                              IP_Address_Lowerboud                               City                      State

00000000                                                       20354897                                             Orlando                     FL

21054678                                                       35480956                                             Miami                        FL

4650996                                                          658645413                                          Atlanta                      GA

I need to take each row of  IP_Address in Table A and determine if it falls in the range of IP_Address_Upperboud  and IP_Address_Lowerboud. If it does then I need to join the City and State to Table A.

In the example above, we will take IP_Address from Table A, 12345678 and determine if it fall between IP_Address_Upperbound and IP_Address_Lowerboud. Since it does, the Orlando and FL will be merged to Table A. If a match is not found, a missing value will be given.

Thank you for your help!

Super User
Posts: 17,836

Re: Merging two tables based on columns range matching

You have a couple of methods, depending on the size of your table one may be better than another.

A SQL LEFT JOIN with a BETWEEN or Proc FormatX2 would be useful.

proc sql;

create table want as

select a.IP, b.City, b.State

from table1 as a

left join table2 as b

on a.ip between b.ip_address_upper_bound and b.ip_address_lower_bound

order by a.ip;

quit;

New Contributor
Posts: 4

Re: Merging two tables based on columns range matching

I receive the following error message

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

NOTE: Table WORK.SAS_TEST created, with 644662 rows and 24 columns.

Respected Advisor
Posts: 3,124

Re: Merging two tables based on columns range matching

Where are the ERROR messages? All I can see are NOTE.

New Contributor
Posts: 4

Re: Merging two tables based on columns range matching

Sorry, I only received the NOTE. Will this have an impact on the results?

Respected Advisor
Posts: 3,124

Re: Merging two tables based on columns range matching

I don't think so. Everything seems to be the way they should. Inner join is a sort of Cartesian product join, and SAS was just telling you that.

Super User
Posts: 10,500

Re: Merging two tables based on columns range matching

This might have some bearing on your situation: SAS/GRAPH(R) 9.3: Reference, Third Edition

Super Contributor
Posts: 254

Re: Merging two tables based on columns range matching

I want to make sure whether your IP_Address is a String or Number?

New Contributor
Posts: 4

Re: Merging two tables based on columns range matching

It was originally character but I converted it to numeric

Super Contributor
Posts: 254

Re: Merging two tables based on columns range matching

Here is a Data Step Approach.

IP_Address is taken as Number.

data havea;

input IP_Address Domain $1. @13 Date $1.;

datalines;

12345678 A  B

52346875 C  D

21356874 D  E

9999999999 H  W

;

run;

data haveb;

input IP_U  @12 IP_L  @23 City $8. @33 State $2.;

datalines;

0          20354897   Orlando   FL

21054678   35480956   Miami     FL

4650996    658645413  Atlanta   GA

;

run;

data want;

   if _n_ = 1 then do;

      if 0 then set havea(keep = IP_Address);

      declare hash h(dataset:'havea', ordered:'a');

      h.definekey('IP_Address');

      h.definedone();

      declare hiter hi('h');

   end;

   do while(hi.next() = 0);

      found = 0;

      do i = 1 to num;

         set haveb nobs = num point = i;

         if IP_U <= IP_Address <= IP_L then do; output; found = 1; end;

      end;

      if found = 0 then do; call missing(City, State); output; end;

   end;

   stop;

   keep IP_Address City State;

run;

Respected Advisor
Posts: 3,124

Re: Merging two tables based on columns range matching

Before everyone goes even further to reinvent the wheel unnecessarily,  it is highly recommended that you should take 's suggestion seriously. IP GEOCODING is supported by PROC GEOCODE, and it is the specialized tool designed for this purpose.

Haikuo

Super User
Posts: 17,836

Re: Merging two tables based on columns range matching

As long as you have SAS/GRAPH licensed Smiley Happy

There's an example of Proc GEOCODE for IP Addresses on this paper, page 9:

http://support.sas.com/resources/papers/proceedings09/087-2009.pdf

Super Contributor
Posts: 254

Re: Merging two tables based on columns range matching

I have provided a data step solution earlier.

On further reflection my solution has to lookup every record of the second data set(HAVEB) for each record of first data set(HAVEA). It will take enormous time if the number records runs to millions.

The Range (IP_U, IP_L) in the second data set is overlapping and hence each record of Second data set has to be looked up. Considerable run-time can be saved if the RANGE in the real data is non-overlapping by using other ways.

It is possible to save time with current data set(HAVEB) if the FIRST MATCH is sufficient.

You may post some additional information about your data set, particularly on the RANGE to save processing time.

Ask a Question
Discussion stats
  • 12 replies
  • 623 views
  • 2 likes
  • 5 in conversation