BookmarkSubscribeRSS Feed
JoeJ
Calcite | Level 5

Hi I am looking to validate a customer name field. I need to only include values that contain the following information:

Acceptable customer name example formats:  

last name, first name

Doe, Jane

Doe, John

Anthony-Smith, Jim

Requirements:

 - customer name is doesn't contain numbers or special characters. There are cases where the name is hyphenated, which is acceptable.   I was trying this code but not able to get it to work - 

select upcase(strip(compbl(COMPRESS(COMPRESS(cust_name, ""~!@#$%^&*()-_=+\|[]{};:'‘’,.<>?/""),'""""')))) from my_data;

- Need to confirm that the length for the last and first name have a min length of 1 byte each. I am able to determine the length of the field, not sure how to validate the length prior to the comma and preceding the comma. 

select LENGTH(CUST_NAME) AS LEN FROM MY_DATA; 

 

thanks in advance 

15 REPLIES 15
Tom
Super User Tom
Super User

Why COMPRESS?  Are you trying to VALIDATE the value or FIX the value?

If you want to check for invalid characters you could use INDEXC() function.

If you want to check that only VALID characters appear then you could use the VERIFY() function.

 

You should count how many commas.  There should be only 1.

Once you know there is only one comma you can use SCAN() to pull out the first and last name parts so you can test their length.  length(scan(cust_name,1,',')) or length(scan(cust_name,2,','))

 

JoeJ
Calcite | Level 5

The command I posted with compress was just me experimenting. I wasn't sure that was the correct option.  

the cust name field has junk data in it like invalid characters and numbers. I am not familiar with the functions you listed, INDEXC() or VERIFY() . I will research those options. 

 

Correct, there should only be 1 comma. I will try both options to see what result I get.

 

Thank you for the quick response. 

 

Joe

ballardw
Super User

In my not so humble opinion, I would say this indicates to an incorrect reading of the data when brought into SAS. First and last names should be separate variables. These combined name fields are right up there with 1960 data processing standards.

 

I have spent way to much time "fixing" name fields that have stuff combined in one variable to do so if I can possibly avoid it.

 

Note on posting code, really a good idea to post into either a text or code box opened on the forum with the </> or "running man" icons that appear above the message window. The message window here does reformat pasted text which may lead to the appearance of your Compress function call appearing to have unbalanced quotes and parentheses when I copy it to my editor.

 

Are you sure your data does not contain any names like "Van Dyke" or "Le Blanc" where there are spaces in the last name? Will that have any impact on your processing.

 

 

 

JoeJ
Calcite | Level 5

I agree with you 100% that the the name should be broken up into 2 separate fields. Unfortunately I don't have control over how the table was built. 

 

Noted on posting code. It's possible that I messed up the syntax when coding so there could be missing values

 

Yes it is possible for the data to be formatted like "Van **bleep**" or "Le Blanc". Is there a different way to check for that?

 

 

ballardw
Super User

@JoeJ wrote:

 

Yes it is possible for the data to be formatted like "Van **bleep**" or "Le Blanc". Is there a different way to check for that?


That's why I copied your function call to my editor. One of the most likely characters to get "reformatted" is a space character. If you don't compress them away or Strip in the wrong order then the Scan that @Tom suggests should have no problem as it only used the comma to delimit values instead of the default behavior that would include spaces.

 

Unless there is a very good reason not to, i.e. the boss says not to, I would likely use this step as the time to create separate first and last name variables using scan (after defining lengths long enough to hold the longest every expected name value for the variables).

JoeJ
Calcite | Level 5

If you don't mind could you show me an example of what you are referring too? I am not clear as to what you are saying. 

Tom
Super User Tom
Super User

Try something like this:

data have;
  input cust_name $30.;
cards;
Doe,Jane
Doe, John
Anthony-Smith, Jim
James,Hen3ry
Mary Lou Jones
Jim,Bob,Tucker
;

data want;
  set have;
  length fname lname $30 ;
  valid=1;
  if count(cust_name,',') ne 1 then do;
    put 'WARNING: Wrong number of commas. ' cust_name=;
    valid=0;
  end;
  if compress(cust_name,'-,','a') ne ' ' then do;
    put 'WARNING: Invalid character. ' cust_name=;
    valid=0;
  end;
  lname=scan(cust_name,1,',');
  fname=scan(cust_name,2,',');
  if valid then cust_name=catx(', ',lname,fname);
run;

Tom_0-1702059553147.png

 

JoeJ
Calcite | Level 5

Thank you I will try that. 

JoeJ
Calcite | Level 5

@Tom  I have been working with your example. It's working well. Just have a few questions:

 

 - Where does the output from the put statement get written? 

 - I didn't specify this in the beginning but I am running this against 38mm records. Is this the most efficient way to process this data? 

- In this statement  if compress(cust_name,'-,','a') ne ' ' will this identify records that have values other than alpha characters like numeric values? 

Joe

 

Tom
Super User Tom
Super User

PUT statements write to the current output file.  If no FILE statement has been run then that is the SAS log

 

If you expect to have more than a handful of bad records then don't write the warnings to the log.  Perhaps you could change the code to put the note into a character variable that you could review later.  For example by running a reporting step that list (or perhaps just summarizes) the types of bad values found.

JoeJ
Calcite | Level 5

@Tom   Why would these records be flagged based on this code

if count(cust_nm,',') ne 1 then do;
    warning= 'WARNING: Wrong number of commas. ';
    valid=0;
  end;
CUST_NMfnamelnamewarning
MARY FLOWERS MARY FLOWERSWARNING: Wrong number of commas.
TRAVEL FIRST LLC TRAVEL FIRST LLCWARNING: Wrong number of commas.
Tom
Super User Tom
Super User

Huh?

They don't have a comma between the first and last name.

Patrick
Opal | Level 21

On top of what others already wrote: If you really want to clean-up your data then ideally use SAS' data quality functions (if licensed). Here an article for Viya but very similar (or even the same) code should already work in SAS9.

JoeJ
Calcite | Level 5

@Patrick  thank you for sharing. Unfortunately, we don't have Viya. wish I did!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3828 views
  • 0 likes
  • 5 in conversation