I have a dataset which has account IDs (acct_id) and Account Names (acct_name). I have a pattern to figure out if the account exactly belongs to a customer. There are suffixes like JR or SR or I or V or MD which I don't need to take it to account for account name. I made up the data that almost matches my original data.
For example:
MICHAEL D KUELKAR JR Account Name has account id of KUELK2M6D.
The account ID is calculated like below
first 5 letters of last name (KUELK)+how many more letters are left in the last name (2 more letters, AR in this case)+
first letter of the first name(M as in Michael)+how many letters are left after the first letter of the first name (ICHAEL, 6 Letters)+first letter of middle name (D in this case)
If there are any last name are lower than 5 bytes (length less than 5), there will be periods (like .). Since the length of the last name is less than 5 letter, there will not be any letters left so we have 0 after the last name and periods.
If the middle name is missing then you would have . as well.
Like acct_name (PING-PONG LI/2121/) acct_id is LI...0P3P
I need to find the mismatch acct_name and acct_id.
Here is the code I tried, Please suggest me how to go forward from here. Thanks in advance.
PROC SQL;
create table WORK.HAVE ( bufsize=65536 )
(
acct_id char(27),
acct_name char(240)
);
QUIT;
PROC SQL;
INSERT INTO have
VALUES ("KING.0R6K","RICHARD K KING JR/1111/")
VALUES ("KO...0C4.","CAMEY JOE/2282/")
VALUES ("KUELK2M6D","MICHAEL D KUELKAR JR/1110/")
VALUES ("FUSS.0J4.","JACOB JOHN FUSS VI/8889/")
VALUES ("LI...0P3P", "PING-PONG LI/2121/")
VALUES ("LO...0T6Z","TSARA YEU ZEE LO/6666/")
VALUES ("LOYD.0W5.","WESLEY BRUCE LOYD SR/1181/")
VALUES ("LUCER7M4.","MARIA K LUCERA SR/3333/")
VALUES ("BUCKE2W0K","WK BUCKETT JR/1151/")
VALUES ("LUM..0R5H","ROBERTS H S/1141/")
VALUES ("M....0V4.","VUING DO/5595/")
VALUES ("MILLS0S4C","STEVE MILLS IV/9898/")
VALUES ("SUTTE1D3.","DARK N SUTTER M D/3331/")
VALUES ("ATKIN1R5D","ROBERT ATKINS SR/5656/")
VALUES ("ATKIN1R5D","ROBERT D ATKINS SR/1212/")
VALUES ("ANGUI9G7.","GAMELIEL S ANGUIAND SR/1897/")
VALUES ("ARNIT1J5M","JESSIE M ARNITT JR/4564/")
VALUES ("DAXTE1F4L","FRANK LEVIS DAXTER V/6781/")
VALUES ("BEASO1R2S","RON SHARE BEASON II/9872/")
VALUES ("CASTI6A8Z","ALEXA Z CASTILLEMOS SR/9861/")
VALUES ("CESTR1G6C","GERARDI CAMACHO-CESTRO IV/5671/")
VALUES ("BROWN0J5.","JACKIE BROWN M D/5671/")
VALUES ("CRAPS2D5J","CRAPSEN DIANIA JOE/0981/")
VALUES ("DEBAR6M4H","MERCY H DEE/0001/")
VALUES ("DEDIO1F4R","FELEX V DEDIOS JR/0001/")
VALUES ("KRIEK6S3T","SEAN THOMAS KREIKEMAIAR/0001/")
VALUES ("CASTR1G6C","GERARDI CAMACHI-CASTRO IV/0001/")
;
QUIT;
DATA HAVE;
SET HAVE;
NEW_VAR=SCAN(acct_name,1,'/');
RUN;
data want ;
set have;
do i=1 to countw(new_var,' ');
var1=scan(new_var,i,' ');
output;
end;
drop i new_var ;
run;
proc sql;
create table want1 as
select *
from want
where acct_id not in
(select distinct acct_id
from want
where strip(compress(substr(acct_id,1,5),,'p')) in (select var1 from want) )
;
quit;
Thanks
Thank you for providing usable data, even if it seems imperfect.
This should get you started:
data WANT ;
set HAVE;
STR=scan(ACCT_NAME,1,'/');
STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR);
FN =scan(STR,1);
N2 =scan(STR,2);
N3 =scan(STR,3);
if N3 ne ' ' then do;
LN=N3;
MN=N2;
end;
else LN=N2;
STR2=cats(substr(LN,1,5)
,substrn('....',1,5-length(LN))
,max(length(LN)-5,0)
,first(FN)
,length(FN)-1
,first(MN||'.')
);
if STR2 ne ACCT_ID;
run;
Obs | acct_id | acct_name | STR2 |
---|---|---|---|
1 | KO...0C4. | CAMEY JOE/2282/ | JOE..0C4 |
2 | FUSS.0J4. | JACOB JOHN FUSS VI/8889/ | FUSS.0J4J |
3 | LO...0T6Z | TSARA YEU ZEE LO/6666/ | ZEE..0T4Y |
4 | LOYD.0W5. | WESLEY BRUCE LOYD SR/1181/ | LOYD.0W5B |
5 | LUCER7M4. | MARIA K LUCERA SR/3333/ | LUCER1M4K |
6 | BUCKE2W0K | WK BUCKETT JR/1151/ | BUCKE2W1 |
7 | LUM..0R5H | ROBERTS H S/1141/ | S....0R6H |
8 | M....0V4. | VUING DO/5595/ | DO...0V4 |
9 | MILLS0S4C | STEVE MILLS IV/9898/ | IV...0S4M |
10 | SUTTE1D3. | DARK N SUTTER M D/3331/ | SUTTE1D3N |
11 | ATKIN1R5D | ROBERT ATKINS SR/5656/ | ATKIN1R5 |
12 | ANGUI9G7. | GAMELIEL S ANGUIAND SR/1897/ | ANGUI3G7S |
13 | ARNIT1J5M | JESSIE M ARNITT JR/4564/ | ARNIT1J5 |
14 | CASTI6A8Z | ALEXA Z CASTILLEMOS SR/9861/ | CASTI6A4Z |
15 | BROWN0J5. | JACKIE BROWN M D/5671/ | D....0J5B |
16 | CRAPS2D5J | CRAPSEN DIANIA JOE/0981/ | JOE..0C6D |
17 | DEBAR6M4H | MERCY H DEE/0001/ | DEE..0M4H |
18 | DEDIO1F4R | FELEX V DEDIOS JR/0001/ | DEDIO1F4 |
19 | KRIEK6S3T | SEAN THOMAS KREIKEMAIAR/0001/ | KREIK6S3T |
The output should have the below records only.
acct_id acct_name
KO...0C4. CAMEY JOE/2282/
LO...0T6Z TSARA YEU ZEE LO/6666/
LOYD.0W5. WESLEY BRUCE LOYD SR/1181/
LUCER7M4. MARIA K LUCERA SR/3333/
LUM..0R5H ROBERTS H S/1141/
M....0V4. VUING DO/5595/
MILLS0S4C STEVE MILLS IV/9898/
SUTTE1D3. DARK N SUTTER M D/3331/
ATKIN1R5D ROBERT ATKINS SR/5656/
ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/
CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/
DEBAR6M4H MERCY H DEE/0001/
DEDIO1F4R FELEX V DEDIOS JR/0001/
Thank you for providing usable data, even if it seems imperfect.
This should get you started:
data WANT ;
set HAVE;
STR=scan(ACCT_NAME,1,'/');
STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR);
FN =scan(STR,1);
N2 =scan(STR,2);
N3 =scan(STR,3);
if N3 ne ' ' then do;
LN=N3;
MN=N2;
end;
else LN=N2;
STR2=cats(substr(LN,1,5)
,substrn('....',1,5-length(LN))
,max(length(LN)-5,0)
,first(FN)
,length(FN)-1
,first(MN||'.')
);
if STR2 ne ACCT_ID;
run;
Obs | acct_id | acct_name | STR2 |
---|---|---|---|
1 | KO...0C4. | CAMEY JOE/2282/ | JOE..0C4 |
2 | FUSS.0J4. | JACOB JOHN FUSS VI/8889/ | FUSS.0J4J |
3 | LO...0T6Z | TSARA YEU ZEE LO/6666/ | ZEE..0T4Y |
4 | LOYD.0W5. | WESLEY BRUCE LOYD SR/1181/ | LOYD.0W5B |
5 | LUCER7M4. | MARIA K LUCERA SR/3333/ | LUCER1M4K |
6 | BUCKE2W0K | WK BUCKETT JR/1151/ | BUCKE2W1 |
7 | LUM..0R5H | ROBERTS H S/1141/ | S....0R6H |
8 | M....0V4. | VUING DO/5595/ | DO...0V4 |
9 | MILLS0S4C | STEVE MILLS IV/9898/ | IV...0S4M |
10 | SUTTE1D3. | DARK N SUTTER M D/3331/ | SUTTE1D3N |
11 | ATKIN1R5D | ROBERT ATKINS SR/5656/ | ATKIN1R5 |
12 | ANGUI9G7. | GAMELIEL S ANGUIAND SR/1897/ | ANGUI3G7S |
13 | ARNIT1J5M | JESSIE M ARNITT JR/4564/ | ARNIT1J5 |
14 | CASTI6A8Z | ALEXA Z CASTILLEMOS SR/9861/ | CASTI6A4Z |
15 | BROWN0J5. | JACKIE BROWN M D/5671/ | D....0J5B |
16 | CRAPS2D5J | CRAPSEN DIANIA JOE/0981/ | JOE..0C6D |
17 | DEBAR6M4H | MERCY H DEE/0001/ | DEE..0M4H |
18 | DEDIO1F4R | FELEX V DEDIOS JR/0001/ | DEDIO1F4 |
19 | KRIEK6S3T | SEAN THOMAS KREIKEMAIAR/0001/ | KREIK6S3T |
ChrisNZ,
It is working for the records that have middle name, but if the middle name is missing then it is not picking the period in the end. For example:
Below example doesn't have middle name. So the str2 should have DO...0V4. but it shows as DO...0V4. How can we fix it please?
Thanks,
M....0V4. | VUING DO/5595/ | DO...0V4 |
Replace
,first(MN||'.')
with
,coalecsec(first(MN),'.')
?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.