I have two tables: TABLE_A and TABLE_B
Both tables TABLE_A and TABLE_B are updated daily. I’d like to determine which observations in TABLE_A are “affected” by updates in TABLE_B. This is very specific business requirement. What makes it challenging is that there is no key that can be used to join the tables directly. The PROC SQL I have takes more than half hour to complete. See the code below. Wondering if anyone has a better solution.
Note on the columns. Let’s say TABLE_B has a column called code_1, while TABLE_A has a corresponding column code_1_list. Values in code_1 column are alpha-numerical, while code_1_list values are quoted, and comma separated. For example:
TABLE_B
code_1 |
code1_A |
TABLE_A:
code_1_list |
‘code1_A’, ‘code1_B’, ‘code1_C’ |
I need to determine: if *any* observation in TABLE_B is in the list of corresponding column in TABLE_A, I need to include it (or keep/identify it).
By the way, both tables TABLE_A and TABLE_B have 25 of these corresponding columns. Both tables TABLE_A and TABLE_B have ~15k observations on any given day.
Here is a limited example (added notes for clarification):
TABLE_B
ID | code_1 | code_2 | code_3 | code_4 | Note (corresponding to TABLE_A) |
123 | code1_A | code2_B | code3_W | code4_D | In ID: A1, code_1_list and A3, code_2_list |
124 | code1_E | code2_B | code3_P | code4_R | In ID: A3, code_2_list |
125 | code1_H | code2_J | code3_K | code4_E |
|
126 | code1_A | code2_P | code3_A | code4_K | In ID: A1, code_1_list |
TABLE_A
ID | code_1_list | code_2_list | code_3_list | code_4_list | note |
A1 | 'code1_A', 'code1_C' | 'code2_A', 'code2_K', 'code2_G' | 'code3_F' | 'code4_A', 'code4_L' | keep: ID 123, code_1 in TABLE_B |
A2 | 'code1_B', 'code1_C', 'code1_J' | 'code2_A', 'code2_G' | 'code3_L' | 'code4_I', 'code4_R' |
|
A3 | 'code1_A' | 'code2_B', 'code2_P' | 'code3_F' | 'code4_A', 'code4_M' | keep: ID 123, code_1 and |
From the example above, only observations A1 and A3 are kept in TABLE_A.
Your thoughts? It seems there might be a much simpler solution that is escaping me...
SQL:
proc sql;
create table
work.table_ab as
select distinct
A.*
from
work.TABLE_A A,
work.TABLE_B B
where
(
A.code_1_list like (cats("'%",B.code_1,"%'"))
or
A.code_2_list like (cats("'%",B.code_2,"%'"))
or
A.code_3_list like (cats("'%",B.code_3,"%'"))
or
A.code_4_list like (cats("'%",B.code_4,"%'"))
)
;
quit;
This could be faster:
where index(A.code_1_list ,B.code_1)
or index(A.code_2_list ,B.code_2)
or index(A.code_3_list ,B.code_3)
or index(A.code_4_list ,B.code_4)
You might need to trim B values if the field is too long for its value
A data step solution using keyed lookup may be faster.
If your data looks something like this:
data b;
input id (code_1-code_4) ($);
cards;
123 code1_A code2_B code3_W code4_D
124 code1_E code2_B code3_P code4_R
125 code1_H code2_J code3_K code4_E
126 code1_A code2_P code3_A code4_K
;run;
data a;
infile cards dsd delimiter=';' truncover;
length ID $2 code_1_list code_2_list code_3_list $20;
input ID--code_3_list;
cards4;
A1;'code1_A', 'code1_C';'code2_A', 'code2_K', 'code2_G';'code3_F'
A2;'code1_B', 'code1_C', 'code1_J';code2_A', 'code2_G';'code3_L'
A3;'code1_A';'code2_B', 'code2_P';'code3_F';
;;;;run;
You first transpose and index the codes in B:
data codes(index=(idx=(num code)) keep=num code);
set b;
array codes(*) code_:;
do num=1 to dim(codes);
code=codes(num);
output;
end;
run;
Then you use keyed lookup to check the A dataset:
data want;
set a;
array lists(*) code_:;
do num=1 to dim(lists);
do _N_=1 to countw(lists(num));
code=scan(lists(num),_N_,"', ");
set codes key=idx/unique;
if not _iorc_ then do;
output;
_error_=0;
delete;
end;
end;
end;
_error_=0;
drop num code;
run;
The "_error_=0" statement comes in because whenever a code is not found, it generates an "error". Which you probably do not want to see in the log.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.