BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Elliott
Obsidian | Level 7

I have to scrub a name field for anything considered junk and I am using index for character junk data but I also need to look for any all numeric values, like if someone put a phone number in the name field.  how do I evaluate for all numeric values in a character field?

 

Thanks,

Elliott

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
anynum, anyalpha functions or notalpha, etc type of functions. And there's always regex as well.

View solution in original post

12 REPLIES 12
Reeza
Super User
anynum, anyalpha functions or notalpha, etc type of functions. And there's always regex as well.
Elliott
Obsidian | Level 7

Reeza,

 

I have researched all these options including the compress with d or dk options.... I need to evaluate the entire name field for all numeric values.  Not the first numeric within character data.  an example would be if someone put a phone number in the name field by mistake, I need to flag that with a 1.  Then when I just pull out all the values that were considered junk (flag=1) any all numeric value will be included.

 

are there any other options?

 

Thanks,

Elliott

PaigeMiller
Diamond | Level 26

The suggestion by @ChrisNZ to use COMPRESS is exactly what you want. Although you say

 

I need to evaluate the entire name field for all numeric values.  Not the first numeric within character data.

 

so its not clear why, if you find even one such number in the text string, you need to look for additional numbers, you can set your flag once you find the first number. In which case, ANYNUM will do the job as well.

--
Paige Miller
Elliott
Obsidian | Level 7

Paige,

 

This name field has business names.. I don't want to flag a name like A1 Cleaning as junk, that is why I need to evaluate the entire name field for all numeric values.  Not some character and some numeric.  does that make sense?  I am trying to keep this data to a workable level.  flagging names like that as junk would greatly increase the number of records I pull into the report.

 

Elliott

PaigeMiller
Diamond | Level 26

Yes, it makes sense, but then you need to develop (at least in your mind and then explain it to us) what the rules are that would allow you to determine which records get flagged and which don't. I understand now why A1 Cleaning should not be flagged; here in Buffalo, NY (area code 716) there are a few businesses with names like "716 Cigars", I suppose those shouldn't get flagged either. So what are the rules that determines whether a business name gets flagged or not?

 

Back to SAS, the ANYDIGIT function can be used iteratively within a DO loop in a DATA step so that you can find the locations of all digits in a character string.

--
Paige Miller
Elliott
Obsidian | Level 7

Paige,

 

I have reached out to the requester to explain the issue for all numeric values.  So now she wants to look for any business name where the first position is a numeric.  In this case "716 Cigars" would be pulled in, but I think that may be the best we can do.

 

currently in my code I have a long list of  if index(lowercase(name),"???") > 0 or .....index(lowercase(name),"???") > 0.. then flag=1; logic.

 

what would the compress logic look like?

 

Thanks,

Elliott

PaigeMiller
Diamond | Level 26

@Elliott wrote:

Paige,

 

what would the compress logic look like?

 


With all due respect, I choose not to answer that until the logic for determining which records are flagged is clear. I suspect that ANYDIGIT will work better than COMPRESS, but all answers are being withheld until I know the logic. (By the way, in the past, many times I have given an answer without clear logic being provided by the user, and the next message from the user is a very clear statement that it won't work in this case or some other case ...)

--
Paige Miller
ballardw
Super User

If looking for a starting digit:

data example;
   infile datalines truncover;
   input string $10.;
   y= anydigit(string);
datalines;
abc
abc123
123abc
...abc
___345
;
run;

Y = 1 is starting digit. If you values might have leading spaces then

y= anydigit( strip(string));

 

Elliott
Obsidian | Level 7

ballardw,

 

Thank you

ChrisNZ
Tourmaline | Level 20

>  I don't want to flag a name like A1 Cleaning as junk, that is why I need to evaluate the entire name field for all numeric values.  Not some character and some numeric.  does that make sense? 

No it doesn't

 

> what would the compress logic look like?

As @PaigeMiller repeatedly told you, we still don't have a clear view of your needs. 

Provide examples of input data and the desired outcome, with explanation about how the outcome was reached.

Just because you can't do it with your current code doesn't mean there isn't a way.

Elliott
Obsidian | Level 7

I used anydigit to get what I needed.

 

Thank you Reeza

ChrisNZ
Tourmaline | Level 20

Also look at the compress function, especially with the 'd' and 'dk' modifiers.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1248 views
  • 5 likes
  • 5 in conversation