BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
OLUGBOJI
Obsidian | Level 7

I am validating some datasets and I wish to look up postal codes in the postal_history dataset. The postal history dataset consists of postal codes for each observation for 10 years. I am trying to see if I can validate by matching postal codes. There are millions of records in the postal_history dataset so I do not want to do a join on

(Postal eq post1 or postal eq post2 or postal eq post3... ) for example. Is there a more efficient way to do this.

 

Table: postal_history

ID

Post1

Post2

Post3

Post4

Post5

Postal1

Postal2

Postal3

Postal4

Postal5

1

P2e3

P2e4

P2e5

P2e6

P2e7

P2e8

P2e9

P2e0

P2e4

P2e8

2

P2e7

P2e8

P2e4

P2e4

P2f6

P2e4

P2e8

P2e8

P2e8

P2f1

 

Table: invalid

id

postal

Sex

dob

Address1

phone

1

P2e6

m

09/06/04

f

123456

2

P2e9

m

09/06/05

g

654321

3

P2e8

f

09/06/06

h

789012

 

Table: want

id

postal

Sex

dob

Address1

phone

match

1

P2e6

m

09/06/04

f

123456

match

2

P2e9

m

09/06/05

g

654321

unmatch

3

P2e8

f

09/06/06

h

789012

match

 

have => postal_history and invalid tables

want=>want;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
	h.definekey  ("id","post") ;
   h.definedone () ;
   length match $10 post $8;
   call missing(post);
   do until(lr);
   set history end=lr;
   array t(*) Post1--Postal5;
   do i= 1 to dim(t);
   post=t(i);
   rc=h.add();
   end;
   end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else  Match='No Match';
drop Post1--Postal5 post i rc;
run; 

@OLUGBOJI   Sounds like you want that in hash

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Not clear with the WANT there. Can you clarify your WANT output and the HAVEs plz?

andreas_lds
Jade | Level 19

Sorry, not clear what you want. And please post data in usable form.

andreas_lds
Jade | Level 19

untested code, assuming that both datasets are sorted by id and id is unique in both datasets:

 

data want;
  merge invalid(in=needsProcessing) postal_history;
  by id;

  length match $ 7;

  if needsProcessing;
  
  if findw(catx(' ', Post1, Post2, ...), Postal) then match ='match';
  else match = 'unmatch';

  drop Post1 Post 2 ....;
run;

 

OLUGBOJI
Obsidian | Level 7
Thanks. id is unique. Data was not sorted initially and using Hash object would not need sorting.
novinosrin
Tourmaline | Level 20
data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H () ;
	h.definekey  ("id","post") ;
   h.definedone () ;
   length match $10 post $8;
   call missing(post);
   do until(lr);
   set history end=lr;
   array t(*) Post1--Postal5;
   do i= 1 to dim(t);
   post=t(i);
   rc=h.add();
   end;
   end;
end;
set invalid;
if h.check(key:id,key:postal)=0 then match='Match';
else  Match='No Match';
drop Post1--Postal5 post i rc;
run; 

@OLUGBOJI   Sounds like you want that in hash

novinosrin
Tourmaline | Level 20

@OLUGBOJI   This is better i think

 


data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;

data want;
if _n_=1 then do;
if 0 then set invalid;
dcl hash H (dataset:'invalid') ;
	h.definekey  ("id") ;
   h.definedata (all:'y') ;
   h.definedone();
   end;
set history;
array t(*) Post1--Postal5;
length match $10;
rc=h.find();
if postal in t then match='Match';
else Match='No Match';
drop rc Post1--Postal5;
run;
novinosrin
Tourmaline | Level 20

data history;
input ID $	(Post1	Post2	Post3	Post4	Post5	Postal1	Postal2	Postal3	Postal4	Postal5) ($);
cards;
1	P2e3	P2e4	P2e5	P2e6	P2e7	P2e8	P2e9	P2e0	P2e4	P2e8
2	P2e7	P2e8	P2e4	P2e4	P2f6	P2e4	P2e8	P2e8	P2e8	P2f1
;
data invalid;
input (id	postal	Sex	dob	Address1	phone) ($);
cards;
1	P2e6	m	9/6/2004	f	123456
2	P2e9	m	9/6/2005	g	654321
3	P2e8	f	9/6/2006	h	789012
;
data want;
merge invalid history;
by id;
array t(*) Post1--Postal5;
length match $10;
if postal in t then match='Match';
else Match='No Match';
run;

Note: ID3 is missing in history table