BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
OLUGBOJI
Obsidian | Level 7

I am validating some datasets and I wish to look up postal codes in the postal_history dataset. The postal history dataset consists of postal codes for each observation for 10 years. I am trying to see if I can validate by matching postal codes. There are millions of records in the postal_history dataset so I do not want to do a join on

(Postal eq post1 or postal eq post2 or postal eq post3... ) for example. Is there a more efficient way to do this.

 

Table: postal_history

ID

Post1

Post2

Post3

Post4

Post5

Postal1

Postal2

Postal3

Postal4

Postal5

1

P2e3

P2e4

P2e5

P2e6

P2e7

P2e8

P2e9

P2e0

P2e4

P2e8

2

P2e7

P2e8

P2e4

P2e4

P2f6

P2e4

P2e8

P2e8

P2e8

P2f1

 

Table: invalid

id

postal

Sex

dob

Address1

phone

1

P2e6

m

09/06/04

f

123456

2

P2e9

m

09/06/05

g

654321

3

P2e8

f

09/06/06

h

789012

 

Table: want

id

postal

Sex

dob

Address1

phone

match

1

P2e6

m

09/06/04

f

123456

match

2

P2e9

m

09/06/05

g

654321

unmatch

3

P2e8

f

09/06/06

h

789012

match

 

have => postal_history and invalid tables

want=>want;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
	h.definekey  ("id","post") ;
   h.definedone () ;
   length match $10 post $8;
   call missing(post);
   do until(lr);
   set history end=lr;
   array t(*) Post1--Postal5;
   do i= 1 to dim(t);
   post=t(i);
   rc=h.add();
   end;
   end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else  Match='No Match';
drop Post1--Postal5 post i rc;
run; 

@OLUGBOJI   Sounds like you want that in hash

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Not clear with the WANT there. Can you clarify your WANT output and the HAVEs plz?

andreas_lds
Jade | Level 19

Sorry, not clear what you want. And please post data in usable form.

andreas_lds
Jade | Level 19

untested code, assuming that both datasets are sorted by id and id is unique in both datasets:

 

data want;
  merge invalid(in=needsProcessing) postal_history;
  by id;

  length match $ 7;

  if needsProcessing;
  
  if findw(catx(' ', Post1, Post2, ...), Postal) then match ='match';
  else match = 'unmatch';

  drop Post1 Post 2 ....;
run;

 

OLUGBOJI
Obsidian | Level 7
Thanks. id is unique. Data was not sorted initially and using Hash object would not need sorting.
novinosrin
Tourmaline | Level 20
data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
	h.definekey  ("id","post") ;
   h.definedone () ;
   length match $10 post $8;
   call missing(post);
   do until(lr);
   set history end=lr;
   array t(*) Post1--Postal5;
   do i= 1 to dim(t);
   post=t(i);
   rc=h.add();
   end;
   end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else  Match='No Match';
drop Post1--Postal5 post i rc;
run; 

@OLUGBOJI   Sounds like you want that in hash

novinosrin
Tourmaline | Level 20

@OLUGBOJI   This is better i think

 


data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;

data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H (dataset:'invalid') ;
	h.definekey  ("id") ;
   h.definedata (all:'y') ;
   h.definedone();
   end;
set history;
array t(*) Post1--Postal5;
length match $10;
rc=h.find();
if postal in t then match='Match';
else Match='No Match';
drop rc Post1--Postal5;
run;
novinosrin
Tourmaline | Level 20

data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
merge invalid history;
by id;
array t(*) Post1--Postal5;
length match $10;
if postal in t then match='Match';
else Match='No Match';
run;

Note: ID3 is missing in history table

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1004 views
  • 2 likes
  • 3 in conversation