@MART1
Your "real world" explanation really helps to better understand the challenge. Some representative sample data provided via a SAS data step would likely get us even closer to your reality.
Below code based on what I found on the Internet how UK postcodes look like: https://ideal-postcodes.co.uk/guides/uk-postcode-format
The code syntax to extract the codes from a string based on SAS code sample from here.
Below code may not be fully suitable for your data but it should give you the necessary pointers ...and depending on your hardware the code should execute within minutes (or less).
...and I just realized that I've done things only for the Outward portion of the UK postcode. To change things for the full postcode shouldn't be that hard though. If you want SAS code for it then please provide representative sample data in the form of working data step code creating this data (the "users" and "postcode" tables).
data users;
infile datalines truncover dlm='|';
input user $ addr $40.;
datalines;
aa|aaad bbbd 123 xxx ab3x blah
aa|aaad bbbd 2 xxx c9 blah
aa|aaad bbbd xc9 xxx ab3x blah
aa|aaad bbbd c9 xxx ab3x blah
aa|aaad bbbd c999 xxx ab3xy blah
;
data postcodes;
input postcode $;
datalines;
ab3x
c9
;
data want;
if _n_=1 then
do;
if 0 then
set postcodes(keep=postcode);
dcl hash h1(dataset:'postcodes');
h1.defineKey('postcode');
h1.defineDone();
end;
call missing(of _all_);
set users;
array postcode_candidate_ {4} $4;
length _found $4;
_ExpressionID = prxparse('/\b[[:alpha:]]{2}\d[[:alpha:]]\b|\b[[:alpha:]]\d[[:alpha:]]\b|\b[[:alpha:]]\d\b|\b[[:alpha:]]\d\d\b|\b[[:alpha:]]{2}\d\b|\b[[:alpha:]]{2}\d\d\b/');
_start = 1;
_stop = length(addr);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the _start parameter so that searching */
/* begins again after the last match. */
call prxnext(_ExpressionID, _start, _stop, addr, _position, _length);
do while (_position > 0);
_found = substr(addr, _position, _length);
if h1.check(key:_found)=0 then
do;
_i=sum(_i,1);
postcode_candidate_[_i]=_found;
end;
call prxnext(_ExpressionID, _start, _stop, addr, _position, _length);
end;
drop _:;
run;
... View more