11-15-2011 05:50 PM
Does anyone have a good idea on how to match an identifying field from one table to another table where the number may match any of 3 - 5 different identifying fields? I tried using Proc SQL with several OR filters, but it is running forever. I am already having to filter on a location field and date field as well and the table is very large They may also have multiple payments from that second table
11-15-2011 05:54 PM
Your question will be easier to answer if you indicate:
1. How large (# records) is each file
2. Are there multiple records, in each file, for given identifying fields?
3. Sample data sets for both files
4. The file you want as a result of merging the two files and
5. The code you've already tried
06-08-2012 06:40 PM
I am using a pass thru SQL query. The one file has 5 different identifying fields for individual, spouse, children, etc. The payment file I am trying to match to has only the one identifying field but it can match to any of the 5 identifying fields in the other table. I don't have the luxury of trying to get any of the data tables changed.
The payment file has at least 20 million records and the other file has about 500k records...
06-08-2012 06:59 PM
try an SQL join with a where clause like
where find( '/'!! catx('/', b.name1, b.name2, b.name3, b.name4, b.name5)!!'/', '/'!! trim(a.name)!!'/', 'i' )
the syntax seeks to find NAME in file A anywhere in the concatenation of the 5 NAMEs in file B
The / delimiter is designed to protect against A.NAME matching a combination or substring of the B.names
The 'i' modifier is intended to make the search insensitive to case
I haven't tested this so be cautious
11-16-2011 05:05 AM
In what format is your data stored?
You could consider moving it to SPDE, since it can better optimize OR filters together with indexes (as an addition to Arts wish list, how/are your tables indexed?).
It seems likely that your data needs restructuring.
06-11-2012 03:52 AM
You may try collecting different id variables under one variable. 500K obs would go up to 2500K obs (500 x 5).
connect to odbc(dsn=<yourdsnname>);
create table x as
from connection to odbc(
from largetable a
(select id1 id from smalltable
select id2 id from smalltable
select id3 id from smalltable
select id4 id from smalltable
select id5 id from smalltable
) b on a.id = b.id
disconnect from odbc;