PierreYvesILY
Pyrite | Level 9

Dear SAS experts,

 

I have a set of variables, among them surname and first name: NACHNAME and VORNAME. Both are character variables of length $35.

 

Some of their values are not correct. For instance, you can find:

  • cccc
  • ccc
  • ?0???
  • !-§4$
  • Test

 

Is there a method to clean them, that is, to delete the rows where NACHNAME or VORNAME has this kind of irrelevant value?

 

Thanks,

regards

PY

 

1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

What are the rules that determine if a value is not acceptable?

--
Paige Miller

PierreYvesILY
Pyrite | Level 9

We are in Germany, so the rules are as follows (the same for NACHNAME and VORNAME):

- all names must be written with Latin letters, of any European language (I guess it is not possible to select on this)

- a name can contain as many words as needed: 'Du Taxi du Pouet de la Valse Folle' is correct

- a hyphen (-) between two words is accepted, as is a space, or both together (people make typing errors too); the apostrophe (') is also accepted

- accents on vowels are OK: ` ´ ^ ~ ö ä ü

- the underscore (_) will be tolerated

- the following signs mark a name as false in all cases: ? , = ; ( ) / & % $ § ! " * + # @ < > | €

- test entries such as cccc, ccc, Test, test... will be eliminated (I noticed some)

 

My aim here is to get help with the method for achieving this goal; I would like to make progress and understand how to do it.
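The rules listed above can also be read as a whitelist: letters plus space, hyphen, apostrophe and underscore are allowed, everything else is not. A minimal sketch of that reading, using FINDC with the k modifier (search for characters that are not in the list) and the a modifier (add alphabetic characters to the list). The dataset names have and want and the flag variables are placeholders, and whether accented letters such as ö or é count as alphabetic depends on the session encoding and locale, so this should be checked on real data before any rows are deleted:

```sas
data want;
    set have;
    /* Allowed besides letters: space, hyphen, apostrophe, underscore.
       k = look for characters NOT in the list,
       a = add alphabetic characters to the list.
       Assumption: accented letters are treated as alphabetic in this session. */
    bad_nachname = findc(nachname, " -'_", 'ka') > 0;
    bad_vorname  = findc(vorname,  " -'_", 'ka') > 0;
    /* 1 if either variable contains a character outside the whitelist */
    invalid = bad_nachname or bad_vorname;
run;
```

Values such as ?0??? or !-§4$ are flagged by this check because digits and signs are not in the allowed set. Values such as cccc or Test pass it and need a separate rule.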

PaigeMiller
Diamond | Level 26
false_name=findc(name,'?,=;()/&%$§!"*+#@<>>|€')>0;

As for the other types of false names, you would need to define general rules that can be programmed, for example: if the same letter appears 4 times consecutively, the name is false. Naturally, many such general rules would have to be defined and then programmed.

--
Paige Miller
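To connect this with the original question (deleting the rows), here is a minimal sketch that applies the forbidden-sign check to both NACHNAME and VORNAME and adds a small list of known test entries. The dataset names have and want, the flag variables, and the contents of the test list are assumptions for illustration. FINDC compares byte by byte, so the sketch also assumes a single-byte session encoding such as wlatin1; in a UTF-8 session the signs § and € occupy several bytes and the check may flag legitimate accented letters:

```sas
data want;
    set have;
    /* Rule 1: any forbidden sign in either variable
       (single quotes keep & and % away from the macro processor) */
    bad_sign = findc(nachname, '?,=;()/&%$§!"*+#@<>|€') > 0
            or findc(vorname,  '?,=;()/&%$§!"*+#@<>|€') > 0;
    /* Rule 2: known test entries, compared without regard to case
       (illustrative list, extend as needed) */
    test_entry = strip(lowcase(nachname)) in ('test', 'ccc', 'cccc')
              or strip(lowcase(vorname))  in ('test', 'ccc', 'cccc');
    /* Remove the row when either rule fires */
    if bad_sign or test_entry then delete;
    drop bad_sign test_entry;
run;
```

While the rules are still being tuned, it may be safer to keep the flag variables and inspect the flagged rows instead of deleting them straight away.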
PierreYvesILY
Pyrite | Level 9
I implemented this solution with success.
Thank you.
PGStats
Opal | Level 21

Adding a rule that rejects any non-blank character repeated three or more times in a row:

 

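data want;
    set have;
    /* invalid = 1 if name contains a forbidden sign, or the same
       non-blank character three or more times in a row */
    invalid =
        findc(name, '?,=;()/&%$§!"*+#@<>|€') > 0
        or prxmatch("/(\S)\1{2,}/", name) > 0;
run;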
PG
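A small, self-contained way to try the repeated-character rule on a few sample values. The dataset names sample and check and the values themselves are made up for illustration. The i flag, added here and not part of the post above, makes the match case-insensitive so that a value like Cccc is treated the same as cccc:

```sas
data sample;
    length name $35;
    infile datalines truncover;
    input name $char35.;
    datalines;
cccc
ccc
Test
Anna
Jean-Pierre
;
run;

data check;
    set sample;
    /* (\S) captures one non-blank character, \1{2,} requires at least
       two more copies of it, i.e. three or more in a row */
    repeated = prxmatch('/(\S)\1{2,}/i', name) > 0;
run;

proc print data=check;
run;
```

With these values, only cccc and ccc are flagged. Test is not caught by this rule, and Anna passes because nn is only two characters long.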
