BookmarkSubscribeRSS Feed
PierreYvesILY
Pyrite | Level 9

hello,

 

I have to work on data that are not properly registered in the database. One of the variables is a Name (character, $35.) but in many cases it is completed by error : ' und' or ' oder' are added ton the names (empty space as first part of the character chain).

 

data Namen ;
format Name $35.;
input Name &;
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;

 

My goal is to clean the Variable Name from all ' und' and ' oder' and get a datastep where only the names are present:

 

Name1

Name2
Name3
Name4

 

How should I proceed?

 

Thanks in advance,

PY

10 REPLIES 10
Jagadishkatam
Amethyst | Level 16

Please try

 

data Namen ;
input Name &$35.;
position=prxmatch('m/und|oder/i',strip(name));
name2=substr(strip(name),1,position-1);
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Thanks,
Jag
Jagadishkatam
Amethyst | Level 16

Alternatively with scan

 

data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Thanks,
Jag
PierreYvesILY
Pyrite | Level 9

hello,

 

thanks, I tested the second solution first, and come to the problem that some of the NAME observations also contains blank spaces. For instance:

Name1 = 'Von und Zu' and the original NAME content is 'Von und Zu und'

Name2 = 'Tortilla'         and the original NAME content is 'Tortilla und'

Name3 = 'Bin dahoam' and the original NAME content is Bin dahoam und'

Name4 = 'Curry Wurst' and the original NAME content is 'Curry Wurst und'

 

Is there a way to get the list of Name1 to Name4 from the following:

 

data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und;
run;

 

Jagadishkatam
Amethyst | Level 16

Hope this code will help

 

data Namen ;
input Name &$35.;
name2=prxchange('s/und//i',-1,name);
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;
Thanks,
Jag
PierreYvesILY
Pyrite | Level 9

Hello,

 

I think the answer is practicable but I need to test with the real data tomorrow.

Anyway, this code simplifies the further treatment a lot, and then I will only have to focus on the exceptions like ‘Von und Zu’ where the Name in itself contains the ‘ und’.

Nevertheless, I don’t understand what each parameter of the function does and have to research this:

name2=prxchange('s/und//i',-1,name);

 

what is the meaning of the s, the /, the //, the i and the -1.

Thanks a lot

PierreYvesILY
Pyrite | Level 9
Hello,

I think the answer is practicable but I need to test with the real data tomorrow.
Anyway, this code simplifies the further treatment a lot, and then I will only have to focus on the exceptions like 'Von und Zu' where the Name in itself contains the ' und'.
Nevertheless, I don't understand what each parameter of the function does and have to research this:

name2=prxchange('s/und//i',-1,name);

what is the meaning of the s, the /, the //, the i and the -1.
Thanks a lot
Jagadishkatam
Amethyst | Level 16
Sure, please test and let me know.
s - represents substitution
/ / /- these are delimiters so anything inbetween these delimiters is checked and the third / represent anything between the second / and third / will be replaced with.
-1 - it tells the number of times to replace here it is multiple times
if it is just 1 then it will replace only once

hope this helps
Thanks,
Jag
PierreYvesILY
Pyrite | Level 9
It helps a lot! thank you for the explanations, I really appreciate.
Regards
Tom
Super User Tom
Super User

Probably easier to understand if you first just check if the last word is one you want to remove.  And then remove it.

data want ;
  set have ;
  last_word = scan(name,-1,' ');
  if last_word in ('und','oder') then name=substrn(name,1,length(name)-length(last_word));
  drop last_word;
run;
PierreYvesILY
Pyrite | Level 9

Thank you for your answers!

 

Regards,

PY

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 757 views
  • 6 likes
  • 3 in conversation