I am validating some datasets and I wish to look up postal codes in the postal_history dataset. The postal history dataset consists of postal codes for each observation for 10 years. I am trying to see if I can validate by matching postal codes. There are millions of records in the postal_history dataset so I do not want to do a join on
(Postal eq post1 or postal eq post2 or postal eq post3... ) for example. Is there a more efficient way to do this.
Table: postal_history
ID | Post1 | Post2 | Post3 | Post4 | Post5 | Postal1 | Postal2 | Postal3 | Postal4 | Postal5 |
1 | P2e3 | P2e4 | P2e5 | P2e6 | P2e7 | P2e8 | P2e9 | P2e0 | P2e4 | P2e8 |
2 | P2e7 | P2e8 | P2e4 | P2e4 | P2f6 | P2e4 | P2e8 | P2e8 | P2e8 | P2f1 |
Table: invalid
id | postal | Sex | dob | Address1 | phone |
1 | P2e6 | m | 09/06/04 | f | 123456 |
2 | P2e9 | m | 09/06/05 | g | 654321 |
3 | P2e8 | f | 09/06/06 | h | 789012 |
Table: want
id | postal | Sex | dob | Address1 | phone | match |
1 | P2e6 | m | 09/06/04 | f | 123456 | match |
2 | P2e9 | m | 09/06/05 | g | 654321 | unmatch |
3 | P2e8 | f | 09/06/06 | h | 789012 | match |
have => postal_history and invalid tables
want=>want;
data history;
input ID $ (Post1 Post2 Post3 Post4 Post5 Postal1 Postal2 Postal3 Postal4 Postal5) ($);
cards;
1 P2e3 P2e4 P2e5 P2e6 P2e7 P2e8 P2e9 P2e0 P2e4 P2e8
2 P2e7 P2e8 P2e4 P2e4 P2f6 P2e4 P2e8 P2e8 P2e8 P2f1
;
data invalid;
input (id postal Sex dob Address1 phone) ($);
cards;
1 P2e6 m 9/6/2004 f 123456
2 P2e9 m 9/6/2005 g 654321
3 P2e8 f 9/6/2006 h 789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
h.definekey ("id","post") ;
h.definedone () ;
length match $10 post $8;
call missing(post);
do until(lr);
set history end=lr;
array t(*) Post1--Postal5;
do i= 1 to dim(t);
post=t(i);
rc=h.add();
end;
end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else Match='No Match';
drop Post1--Postal5 post i rc;
run;
@OLUGBOJI Sounds like you want that in hash
Not clear with the WANT there. Can you clarify your WANT output and the HAVEs plz?
Sorry, not clear what you want. And please post data in usable form.
untested code, assuming that both datasets are sorted by id and id is unique in both datasets:
data want;
merge invalid(in=needsProcessing) postal_history;
by id;
length match $ 7;
if needsProcessing;
if findw(catx(' ', Post1, Post2, ...), Postal) then match ='match';
else match = 'unmatch';
drop Post1 Post 2 ....;
run;
data history;
input ID $ (Post1 Post2 Post3 Post4 Post5 Postal1 Postal2 Postal3 Postal4 Postal5) ($);
cards;
1 P2e3 P2e4 P2e5 P2e6 P2e7 P2e8 P2e9 P2e0 P2e4 P2e8
2 P2e7 P2e8 P2e4 P2e4 P2f6 P2e4 P2e8 P2e8 P2e8 P2f1
;
data invalid;
input (id postal Sex dob Address1 phone) ($);
cards;
1 P2e6 m 9/6/2004 f 123456
2 P2e9 m 9/6/2005 g 654321
3 P2e8 f 9/6/2006 h 789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
h.definekey ("id","post") ;
h.definedone () ;
length match $10 post $8;
call missing(post);
do until(lr);
set history end=lr;
array t(*) Post1--Postal5;
do i= 1 to dim(t);
post=t(i);
rc=h.add();
end;
end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else Match='No Match';
drop Post1--Postal5 post i rc;
run;
@OLUGBOJI Sounds like you want that in hash
@OLUGBOJI This is better i think
data history;
input ID $ (Post1 Post2 Post3 Post4 Post5 Postal1 Postal2 Postal3 Postal4 Postal5) ($);
cards;
1 P2e3 P2e4 P2e5 P2e6 P2e7 P2e8 P2e9 P2e0 P2e4 P2e8
2 P2e7 P2e8 P2e4 P2e4 P2f6 P2e4 P2e8 P2e8 P2e8 P2f1
;
data invalid;
input (id postal Sex dob Address1 phone) ($);
cards;
1 P2e6 m 9/6/2004 f 123456
2 P2e9 m 9/6/2005 g 654321
3 P2e8 f 9/6/2006 h 789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H (dataset:'invalid') ;
h.definekey ("id") ;
h.definedata (all:'y') ;
h.definedone();
end;
set history;
array t(*) Post1--Postal5;
length match $10;
rc=h.find();
if postal in t then match='Match';
else Match='No Match';
drop rc Post1--Postal5;
run;
data history;
input ID $ (Post1 Post2 Post3 Post4 Post5 Postal1 Postal2 Postal3 Postal4 Postal5) ($);
cards;
1 P2e3 P2e4 P2e5 P2e6 P2e7 P2e8 P2e9 P2e0 P2e4 P2e8
2 P2e7 P2e8 P2e4 P2e4 P2f6 P2e4 P2e8 P2e8 P2e8 P2f1
;
data invalid;
input (id postal Sex dob Address1 phone) ($);
cards;
1 P2e6 m 9/6/2004 f 123456
2 P2e9 m 9/6/2005 g 654321
3 P2e8 f 9/6/2006 h 789012
;
data want;
merge invalid history;
by id;
array t(*) Post1--Postal5;
length match $10;
if postal in t then match='Match';
else Match='No Match';
run;
Note: ID3 is missing in history table
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.