BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

I have a dataset which has account IDs (acct_id) and Account Names (acct_name). I have a pattern to figure out if the account exactly belongs to a customer. There are suffixes like JR or SR or I or V or MD which I don't need to take it to account for account name. I made up the data that almost matches my original data.
For example:
MICHAEL D KUELKAR JR Account Name has account id of KUELK2M6D.
The account ID is calculated like below
first 5 letters of last name (KUELK)+how many more letters are left in the last name (2 more letters, AR in this case)+
first letter of the first name(M as in Michael)+how many letters are left after the first letter of the first name (ICHAEL, 6 Letters)+first letter of middle name (D in this case)

If there are any last name are lower than 5 bytes (length less than 5), there will be periods (like .). Since the length of the last name is less than 5 letter, there will not be any letters left so we have 0 after the last name and periods.
If the middle name is missing then you would have . as well.
Like acct_name (PING-PONG LI/2121/) acct_id is LI...0P3P

I need to find the mismatch acct_name and acct_id.

Here is the code I tried, Please suggest me how to go forward from here. Thanks in advance. 

 

PROC SQL;
create table WORK.HAVE ( bufsize=65536 )
  (
   acct_id char(27),
   acct_name char(240)
  );
QUIT;
PROC SQL;
INSERT INTO have
VALUES ("KING.0R6K","RICHARD K KING JR/1111/")
VALUES ("KO...0C4.","CAMEY JOE/2282/")
VALUES ("KUELK2M6D","MICHAEL D KUELKAR JR/1110/")
VALUES ("FUSS.0J4.","JACOB JOHN FUSS VI/8889/")
VALUES ("LI...0P3P", "PING-PONG LI/2121/")
VALUES ("LO...0T6Z","TSARA YEU ZEE LO/6666/")
VALUES ("LOYD.0W5.","WESLEY BRUCE LOYD SR/1181/")
VALUES ("LUCER7M4.","MARIA K LUCERA SR/3333/")
VALUES ("BUCKE2W0K","WK BUCKETT JR/1151/")
VALUES ("LUM..0R5H","ROBERTS H S/1141/")
VALUES ("M....0V4.","VUING DO/5595/")
VALUES ("MILLS0S4C","STEVE MILLS IV/9898/")
VALUES ("SUTTE1D3.","DARK N SUTTER M D/3331/")
VALUES ("ATKIN1R5D","ROBERT ATKINS SR/5656/")
VALUES ("ATKIN1R5D","ROBERT D ATKINS SR/1212/")
VALUES ("ANGUI9G7.","GAMELIEL S ANGUIAND SR/1897/")
VALUES ("ARNIT1J5M","JESSIE M ARNITT JR/4564/")
VALUES ("DAXTE1F4L","FRANK LEVIS DAXTER V/6781/")
VALUES ("BEASO1R2S","RON SHARE BEASON II/9872/")
VALUES ("CASTI6A8Z","ALEXA Z CASTILLEMOS SR/9861/")
VALUES ("CESTR1G6C","GERARDI CAMACHO-CESTRO IV/5671/")
VALUES ("BROWN0J5.","JACKIE BROWN M D/5671/")
VALUES ("CRAPS2D5J","CRAPSEN DIANIA JOE/0981/")
VALUES ("DEBAR6M4H","MERCY H DEE/0001/")
VALUES ("DEDIO1F4R","FELEX V DEDIOS JR/0001/")
VALUES ("KRIEK6S3T","SEAN THOMAS KREIKEMAIAR/0001/")
VALUES ("CASTR1G6C","GERARDI CAMACHI-CASTRO IV/0001/")
;
QUIT;


	
DATA HAVE;
	SET HAVE;
	NEW_VAR=SCAN(acct_name,1,'/');
RUN;


data want ;
  set have;
  do i=1 to countw(new_var,' ');
     var1=scan(new_var,i,' ');
     output;
  end;
  drop i new_var ;
run;

proc sql;
	create table want1 as
	select *
	from want 
	where acct_id not in  
	(select distinct acct_id 
	from want 
	where strip(compress(substr(acct_id,1,5),,'p')) in (select var1 from want) )

;
quit;


Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Thank you for providing usable data, even if it seems imperfect.

This should get you started:

data WANT ;
 set HAVE;
  STR=scan(ACCT_NAME,1,'/'); 
  STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR); 
  FN =scan(STR,1);
  N2 =scan(STR,2);
  N3 =scan(STR,3);
  if N3 ne ' ' then do;
    LN=N3;
    MN=N2;
  end;
  else LN=N2;
  STR2=cats(substr(LN,1,5)
           ,substrn('....',1,5-length(LN))
           ,max(length(LN)-5,0)
           ,first(FN)
           ,length(FN)-1
           ,first(MN||'.')
           ); 
  if STR2 ne ACCT_ID;
run;    

 
Obs acct_id acct_name STR2
1 KO...0C4. CAMEY JOE/2282/ JOE..0C4
2 FUSS.0J4. JACOB JOHN FUSS VI/8889/ FUSS.0J4J
3 LO...0T6Z TSARA YEU ZEE LO/6666/ ZEE..0T4Y
4 LOYD.0W5. WESLEY BRUCE LOYD SR/1181/ LOYD.0W5B
5 LUCER7M4. MARIA K LUCERA SR/3333/ LUCER1M4K
6 BUCKE2W0K WK BUCKETT JR/1151/ BUCKE2W1
7 LUM..0R5H ROBERTS H S/1141/ S....0R6H
8 M....0V4. VUING DO/5595/ DO...0V4
9 MILLS0S4C STEVE MILLS IV/9898/ IV...0S4M
10 SUTTE1D3. DARK N SUTTER M D/3331/ SUTTE1D3N
11 ATKIN1R5D ROBERT ATKINS SR/5656/ ATKIN1R5
12 ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/ ANGUI3G7S
13 ARNIT1J5M JESSIE M ARNITT JR/4564/ ARNIT1J5
14 CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/ CASTI6A4Z
15 BROWN0J5. JACKIE BROWN M D/5671/ D....0J5B
16 CRAPS2D5J CRAPSEN DIANIA JOE/0981/ JOE..0C6D
17 DEBAR6M4H MERCY H DEE/0001/ DEE..0M4H
18 DEDIO1F4R FELEX V DEDIOS JR/0001/ DEDIO1F4
19 KRIEK6S3T SEAN THOMAS KREIKEMAIAR/0001/ KREIK6S3T

View solution in original post

6 REPLIES 6
buddha_d
Pyrite | Level 9

The output should have the below records only.

 

acct_id acct_name


KO...0C4. CAMEY JOE/2282/
LO...0T6Z TSARA YEU ZEE LO/6666/
LOYD.0W5. WESLEY BRUCE LOYD SR/1181/
LUCER7M4. MARIA K LUCERA SR/3333/
LUM..0R5H ROBERTS H S/1141/
M....0V4. VUING DO/5595/
MILLS0S4C STEVE MILLS IV/9898/
SUTTE1D3. DARK N SUTTER M D/3331/
ATKIN1R5D ROBERT ATKINS SR/5656/
ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/
CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/
DEBAR6M4H MERCY H DEE/0001/
DEDIO1F4R FELEX V DEDIOS JR/0001/

ChrisNZ
Tourmaline | Level 20

Thank you for providing usable data, even if it seems imperfect.

This should get you started:

data WANT ;
 set HAVE;
  STR=scan(ACCT_NAME,1,'/'); 
  STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR); 
  FN =scan(STR,1);
  N2 =scan(STR,2);
  N3 =scan(STR,3);
  if N3 ne ' ' then do;
    LN=N3;
    MN=N2;
  end;
  else LN=N2;
  STR2=cats(substr(LN,1,5)
           ,substrn('....',1,5-length(LN))
           ,max(length(LN)-5,0)
           ,first(FN)
           ,length(FN)-1
           ,first(MN||'.')
           ); 
  if STR2 ne ACCT_ID;
run;    

 
Obs acct_id acct_name STR2
1 KO...0C4. CAMEY JOE/2282/ JOE..0C4
2 FUSS.0J4. JACOB JOHN FUSS VI/8889/ FUSS.0J4J
3 LO...0T6Z TSARA YEU ZEE LO/6666/ ZEE..0T4Y
4 LOYD.0W5. WESLEY BRUCE LOYD SR/1181/ LOYD.0W5B
5 LUCER7M4. MARIA K LUCERA SR/3333/ LUCER1M4K
6 BUCKE2W0K WK BUCKETT JR/1151/ BUCKE2W1
7 LUM..0R5H ROBERTS H S/1141/ S....0R6H
8 M....0V4. VUING DO/5595/ DO...0V4
9 MILLS0S4C STEVE MILLS IV/9898/ IV...0S4M
10 SUTTE1D3. DARK N SUTTER M D/3331/ SUTTE1D3N
11 ATKIN1R5D ROBERT ATKINS SR/5656/ ATKIN1R5
12 ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/ ANGUI3G7S
13 ARNIT1J5M JESSIE M ARNITT JR/4564/ ARNIT1J5
14 CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/ CASTI6A4Z
15 BROWN0J5. JACKIE BROWN M D/5671/ D....0J5B
16 CRAPS2D5J CRAPSEN DIANIA JOE/0981/ JOE..0C6D
17 DEBAR6M4H MERCY H DEE/0001/ DEE..0M4H
18 DEDIO1F4R FELEX V DEDIOS JR/0001/ DEDIO1F4
19 KRIEK6S3T SEAN THOMAS KREIKEMAIAR/0001/ KREIK6S3T
buddha_d
Pyrite | Level 9
ChrisNZ,
This would definitely a good starting point for my project. Thanks for your help.
buddha_d
Pyrite | Level 9

ChrisNZ, 

            It is working for the records that have middle name, but if the middle name is missing then it is not picking the period in the end. For example:

Below example doesn't have middle name. So the str2 should have DO...0V4. but it shows as DO...0V4.  How can we fix it please?

Thanks,

M....0V4.VUING DO/5595/DO...0V4
ChrisNZ
Tourmaline | Level 20

Replace

,first(MN||'.')

with

,coalecsec(first(MN),'.')

?

buddha_d
Pyrite | Level 9

ChrisNZ,

              Thanks for the idea. It works as per the logic. I appreciate your help. 
Thanks 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2303 views
  • 0 likes
  • 2 in conversation