BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

I have a dataset which has account IDs (acct_id) and Account Names (acct_name). I have a pattern to figure out if the account exactly belongs to a customer. There are suffixes like JR or SR or I or V or MD which I don't need to take it to account for account name. I made up the data that almost matches my original data.
For example:
MICHAEL D KUELKAR JR Account Name has account id of KUELK2M6D.
The account ID is calculated like below
first 5 letters of last name (KUELK)+how many more letters are left in the last name (2 more letters, AR in this case)+
first letter of the first name(M as in Michael)+how many letters are left after the first letter of the first name (ICHAEL, 6 Letters)+first letter of middle name (D in this case)

If there are any last name are lower than 5 bytes (length less than 5), there will be periods (like .). Since the length of the last name is less than 5 letter, there will not be any letters left so we have 0 after the last name and periods.
If the middle name is missing then you would have . as well.
Like acct_name (PING-PONG LI/2121/) acct_id is LI...0P3P

I need to find the mismatch acct_name and acct_id.

Here is the code I tried, Please suggest me how to go forward from here. Thanks in advance. 

 

PROC SQL;
create table WORK.HAVE ( bufsize=65536 )
  (
   acct_id char(27),
   acct_name char(240)
  );
QUIT;
PROC SQL;
INSERT INTO have
VALUES ("KING.0R6K","RICHARD K KING JR/1111/")
VALUES ("KO...0C4.","CAMEY JOE/2282/")
VALUES ("KUELK2M6D","MICHAEL D KUELKAR JR/1110/")
VALUES ("FUSS.0J4.","JACOB JOHN FUSS VI/8889/")
VALUES ("LI...0P3P", "PING-PONG LI/2121/")
VALUES ("LO...0T6Z","TSARA YEU ZEE LO/6666/")
VALUES ("LOYD.0W5.","WESLEY BRUCE LOYD SR/1181/")
VALUES ("LUCER7M4.","MARIA K LUCERA SR/3333/")
VALUES ("BUCKE2W0K","WK BUCKETT JR/1151/")
VALUES ("LUM..0R5H","ROBERTS H S/1141/")
VALUES ("M....0V4.","VUING DO/5595/")
VALUES ("MILLS0S4C","STEVE MILLS IV/9898/")
VALUES ("SUTTE1D3.","DARK N SUTTER M D/3331/")
VALUES ("ATKIN1R5D","ROBERT ATKINS SR/5656/")
VALUES ("ATKIN1R5D","ROBERT D ATKINS SR/1212/")
VALUES ("ANGUI9G7.","GAMELIEL S ANGUIAND SR/1897/")
VALUES ("ARNIT1J5M","JESSIE M ARNITT JR/4564/")
VALUES ("DAXTE1F4L","FRANK LEVIS DAXTER V/6781/")
VALUES ("BEASO1R2S","RON SHARE BEASON II/9872/")
VALUES ("CASTI6A8Z","ALEXA Z CASTILLEMOS SR/9861/")
VALUES ("CESTR1G6C","GERARDI CAMACHO-CESTRO IV/5671/")
VALUES ("BROWN0J5.","JACKIE BROWN M D/5671/")
VALUES ("CRAPS2D5J","CRAPSEN DIANIA JOE/0981/")
VALUES ("DEBAR6M4H","MERCY H DEE/0001/")
VALUES ("DEDIO1F4R","FELEX V DEDIOS JR/0001/")
VALUES ("KRIEK6S3T","SEAN THOMAS KREIKEMAIAR/0001/")
VALUES ("CASTR1G6C","GERARDI CAMACHI-CASTRO IV/0001/")
;
QUIT;


	
DATA HAVE;
	SET HAVE;
	NEW_VAR=SCAN(acct_name,1,'/');
RUN;


data want ;
  set have;
  do i=1 to countw(new_var,' ');
     var1=scan(new_var,i,' ');
     output;
  end;
  drop i new_var ;
run;

proc sql;
	create table want1 as
	select *
	from want 
	where acct_id not in  
	(select distinct acct_id 
	from want 
	where strip(compress(substr(acct_id,1,5),,'p')) in (select var1 from want) )

;
quit;


Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Thank you for providing usable data, even if it seems imperfect.

This should get you started:

data WANT ;
 set HAVE;
  STR=scan(ACCT_NAME,1,'/'); 
  STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR); 
  FN =scan(STR,1);
  N2 =scan(STR,2);
  N3 =scan(STR,3);
  if N3 ne ' ' then do;
    LN=N3;
    MN=N2;
  end;
  else LN=N2;
  STR2=cats(substr(LN,1,5)
           ,substrn('....',1,5-length(LN))
           ,max(length(LN)-5,0)
           ,first(FN)
           ,length(FN)-1
           ,first(MN||'.')
           ); 
  if STR2 ne ACCT_ID;
run;    

 
Obs acct_id acct_name STR2
1 KO...0C4. CAMEY JOE/2282/ JOE..0C4
2 FUSS.0J4. JACOB JOHN FUSS VI/8889/ FUSS.0J4J
3 LO...0T6Z TSARA YEU ZEE LO/6666/ ZEE..0T4Y
4 LOYD.0W5. WESLEY BRUCE LOYD SR/1181/ LOYD.0W5B
5 LUCER7M4. MARIA K LUCERA SR/3333/ LUCER1M4K
6 BUCKE2W0K WK BUCKETT JR/1151/ BUCKE2W1
7 LUM..0R5H ROBERTS H S/1141/ S....0R6H
8 M....0V4. VUING DO/5595/ DO...0V4
9 MILLS0S4C STEVE MILLS IV/9898/ IV...0S4M
10 SUTTE1D3. DARK N SUTTER M D/3331/ SUTTE1D3N
11 ATKIN1R5D ROBERT ATKINS SR/5656/ ATKIN1R5
12 ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/ ANGUI3G7S
13 ARNIT1J5M JESSIE M ARNITT JR/4564/ ARNIT1J5
14 CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/ CASTI6A4Z
15 BROWN0J5. JACKIE BROWN M D/5671/ D....0J5B
16 CRAPS2D5J CRAPSEN DIANIA JOE/0981/ JOE..0C6D
17 DEBAR6M4H MERCY H DEE/0001/ DEE..0M4H
18 DEDIO1F4R FELEX V DEDIOS JR/0001/ DEDIO1F4
19 KRIEK6S3T SEAN THOMAS KREIKEMAIAR/0001/ KREIK6S3T

View solution in original post

6 REPLIES 6
buddha_d
Pyrite | Level 9

The output should have the below records only.

 

acct_id acct_name


KO...0C4. CAMEY JOE/2282/
LO...0T6Z TSARA YEU ZEE LO/6666/
LOYD.0W5. WESLEY BRUCE LOYD SR/1181/
LUCER7M4. MARIA K LUCERA SR/3333/
LUM..0R5H ROBERTS H S/1141/
M....0V4. VUING DO/5595/
MILLS0S4C STEVE MILLS IV/9898/
SUTTE1D3. DARK N SUTTER M D/3331/
ATKIN1R5D ROBERT ATKINS SR/5656/
ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/
CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/
DEBAR6M4H MERCY H DEE/0001/
DEDIO1F4R FELEX V DEDIOS JR/0001/

ChrisNZ
Tourmaline | Level 20

Thank you for providing usable data, even if it seems imperfect.

This should get you started:

data WANT ;
 set HAVE;
  STR=scan(ACCT_NAME,1,'/'); 
  STR=prxchange('s/\b(I|V|MD|JR|SR|M|II)\b//',-1,STR); 
  FN =scan(STR,1);
  N2 =scan(STR,2);
  N3 =scan(STR,3);
  if N3 ne ' ' then do;
    LN=N3;
    MN=N2;
  end;
  else LN=N2;
  STR2=cats(substr(LN,1,5)
           ,substrn('....',1,5-length(LN))
           ,max(length(LN)-5,0)
           ,first(FN)
           ,length(FN)-1
           ,first(MN||'.')
           ); 
  if STR2 ne ACCT_ID;
run;    

 
Obs acct_id acct_name STR2
1 KO...0C4. CAMEY JOE/2282/ JOE..0C4
2 FUSS.0J4. JACOB JOHN FUSS VI/8889/ FUSS.0J4J
3 LO...0T6Z TSARA YEU ZEE LO/6666/ ZEE..0T4Y
4 LOYD.0W5. WESLEY BRUCE LOYD SR/1181/ LOYD.0W5B
5 LUCER7M4. MARIA K LUCERA SR/3333/ LUCER1M4K
6 BUCKE2W0K WK BUCKETT JR/1151/ BUCKE2W1
7 LUM..0R5H ROBERTS H S/1141/ S....0R6H
8 M....0V4. VUING DO/5595/ DO...0V4
9 MILLS0S4C STEVE MILLS IV/9898/ IV...0S4M
10 SUTTE1D3. DARK N SUTTER M D/3331/ SUTTE1D3N
11 ATKIN1R5D ROBERT ATKINS SR/5656/ ATKIN1R5
12 ANGUI9G7. GAMELIEL S ANGUIAND SR/1897/ ANGUI3G7S
13 ARNIT1J5M JESSIE M ARNITT JR/4564/ ARNIT1J5
14 CASTI6A8Z ALEXA Z CASTILLEMOS SR/9861/ CASTI6A4Z
15 BROWN0J5. JACKIE BROWN M D/5671/ D....0J5B
16 CRAPS2D5J CRAPSEN DIANIA JOE/0981/ JOE..0C6D
17 DEBAR6M4H MERCY H DEE/0001/ DEE..0M4H
18 DEDIO1F4R FELEX V DEDIOS JR/0001/ DEDIO1F4
19 KRIEK6S3T SEAN THOMAS KREIKEMAIAR/0001/ KREIK6S3T
buddha_d
Pyrite | Level 9
ChrisNZ,
This would definitely a good starting point for my project. Thanks for your help.
buddha_d
Pyrite | Level 9

ChrisNZ, 

            It is working for the records that have middle name, but if the middle name is missing then it is not picking the period in the end. For example:

Below example doesn't have middle name. So the str2 should have DO...0V4. but it shows as DO...0V4.  How can we fix it please?

Thanks,

M....0V4.VUING DO/5595/DO...0V4
ChrisNZ
Tourmaline | Level 20

Replace

,first(MN||'.')

with

,coalecsec(first(MN),'.')

?

buddha_d
Pyrite | Level 9

ChrisNZ,

              Thanks for the idea. It works as per the logic. I appreciate your help. 
Thanks 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1656 views
  • 0 likes
  • 2 in conversation