Help using Base SAS procedures

Matching payments to people

Reply
Contributor
Posts: 20

Matching payments to people

Does anyone have a good idea on how to match an identifying field from one table to another table where the number may match any of 3 - 5 different identifying fields?  I tried using Proc SQL with several OR filters, but it is running forever.  I am already having to filter on a location field and date field as well and the table is very large Smiley Sad  They may also have multiple payments from that second table Smiley Sad

PROC Star
Posts: 7,366

Matching payments to people

Your question will be easier to answer if you indicate:

1. How large (# records) is each file

2. Are there multiple records, in each file, for given identifying fields?

3. Sample data sets for both files

4. The file you want as a result of merging the two files and

5. The code you've already tried

Contributor
Posts: 20

Re: Matching payments to people


I am using a pass thru SQL query. The one file has 5 different identifying fields for individual, spouse, children, etc.  The payment file I am trying to match to has only the one identifying field but it can match to any of the 5 identifying fields in the other table.  I don't have the luxury of trying to get any of the data tables changed.

The payment file has at least 20 million records and the other file has about 500k records...

Valued Guide
Posts: 2,175

Re: Matching payments to people

try an SQL join with a where clause like

where find( '/'!! catx('/', b.name1, b.name2, b.name3, b.name4, b.name5)!!'/', '/'!! trim(a.name)!!'/', 'i' )

the syntax seeks to find NAME in file A anywhere in the concatenation of the 5 NAMEs in file B

The / delimiter is designed to protect against A.NAME matching a combination or substring of the B.names

The 'i' modifier is intended to make the search insensitive to case

I haven't tested this so be cautious

peterC

Super User
Posts: 5,260

Matching payments to people

In what format is your data stored?

You could consider moving it to SPDE, since it can better optimize OR filters together with indexes (as an addition to Arts wish list, how/are your tables indexed?).

It seems likely that your data needs restructuring.

/Linus

Data never sleeps
Valued Guide
Posts: 2,175

Matching payments to people

weight each "join" and use the total of these to indicate the likelihood of the match

Super User
Posts: 10,550

Matching payments to people

Without example code it is hard to optimize.

First thing is too make sure you are using things like LEFT JOIN instead of simple JOIN.

Super User
Posts: 9,691

Re: Matching payments to people

Without sample data and the output you want, it is hard to give you some advice.

Frequent Contributor
Posts: 95

Re: Matching payments to people

You may try collecting different id variables under one variable. 500K obs would go up to 2500K obs (500 x 5).

proc sql;

    connect to odbc(dsn=<yourdsnname>);

    create table x as

    select *

    from connection to odbc(

        select a.*

        from largetable a

        inner join

        (select id1 id from smalltable

         UNION ALL

         select id2 id from smalltable

         UNION ALL

         select id3 id from smalltable

         UNION ALL

         select id4 id from smalltable

         UNION ALL

         select id5 id from smalltable

        ) b on a.id = b.id

    );

    disconnect from odbc;

quit;

Ask a Question
Discussion stats
  • 8 replies
  • 251 views
  • 0 likes
  • 7 in conversation