Hello,
I have to work with data that were not entered properly in the database. One of the variables is a Name (character, $35.), but in many cases it was filled in incorrectly: ' und' or ' oder' was appended to the name (with a blank as the first character of the added string).
data Namen ;
format Name $35.;
input Name &;
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
My goal is to strip every trailing ' und' and ' oder' from the variable Name and get a data set in which only the names remain:
Name1
Name2
Name3
Name4
How should I proceed?
Thanks in advance,
PY
Please try
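data Namen ;
input Name &$35.;
/* position of a trailing ' und' or ' oder' (0 if there is none) */
position=prxmatch('m/\s(und|oder)\s*$/i',strip(name));
/* keep the text before the match; leave the name unchanged when nothing matched */
if position>0 then name2=substr(strip(name),1,position-1);
else name2=strip(name);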
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Alternatively with scan
data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Hello,
Thanks. I tested the second solution first and ran into the problem that some of the NAME observations also contain blanks. For instance:
Name1 = 'Von und Zu' and the original NAME content is 'Von und Zu und'
Name2 = 'Tortilla' and the original NAME content is 'Tortilla und'
Name3 = 'Bin dahoam' and the original NAME content is 'Bin dahoam und'
Name4 = 'Curry Wurst' and the original NAME content is 'Curry Wurst und'
Is there a way to get the list of Name1 to Name4 from the following:
data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;
run;
Hope this code will help
data Namen ;
input Name &$35.;
name2=prxchange('s/und//i',-1,name);
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;
Hello,
I think the answer will work, but I need to test it with the real data tomorrow.
Anyway, this code simplifies the further processing a lot, and I will then only have to focus on exceptions like ‘Von und Zu’, where the name itself contains ‘ und’.
Nevertheless, I don’t understand what each parameter of the function does and will have to look this up:
name2=prxchange('s/und//i',-1,name);
What do the s, the /, the //, the i and the -1 mean?
Thanks a lot
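For reference, here is a short sketch of how the pieces of that call fit together, based on Perl regular expression syntax and the documented arguments of PRXCHANGE. The data set name demo and the variables all_removed, first_removed and tail_removed are made up for illustration, and the third pattern is an assumed variant that was not posted in this thread.

```sas
data demo;
  length name $35;
  name = 'Von und Zu und';

  /* 's/und//i' is a Perl substitution of the form s/pattern/replacement/flags */
  /*   s    substitute                                                          */
  /*   /    delimiter; there are three of them in total                         */
  /*   und  the pattern to search for (between delimiters 1 and 2)              */
  /*   //   delimiters 2 and 3 with nothing in between = empty replacement      */
  /*   i    flag: ignore case, so UND and Und match as well                     */
  /* The second argument is the number of replacements; -1 means all matches.   */
  all_removed   = prxchange('s/und//i', -1, name);
  first_removed = prxchange('s/und//i',  1, name);

  /* Assumed variant: \s+ requires blanks before the word and \s*$ ties the     */
  /* match to the end of the value, so only a trailing und or oder is removed.  */
  tail_removed  = prxchange('s/\s+(und|oder)\s*$//i', 1, name);

  put all_removed= / first_removed= / tail_removed=;
run;
```

With -1 the und inside 'Von und Zu' is deleted too, which is why names like this one remain exceptions for the posted code; the anchored variant is one possible way around that.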
It is probably easier to understand if you first check whether the last word is one you want to remove, and then remove it.
data want ;
set have ;
last_word = scan(name,-1,' ');
if last_word in ('und','oder') then name=substrn(name,1,length(name)-length(last_word));
drop last_word;
run;
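To try this on the names mentioned earlier in the thread, a self-contained sketch could look like the following. The input data set have is built here from the posted examples, and the lowcase() call is an added assumption so that spellings such as 'Und' or 'ODER' are caught as well.

```sas
/* Sample values taken from the posts above */
data have;
  input name &$35.;
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
Name4 oder
;
run;

data want;
  set have;
  /* -1 tells SCAN to count words from the right, i.e. take the last word */
  last_word = scan(name, -1, ' ');
  /* lowcase() is an assumption added for case-insensitive matching */
  if lowcase(last_word) in ('und', 'oder') then
    name = substrn(name, 1, length(name) - length(last_word));
  drop last_word;
run;

proc print data=want noobs;
run;
```

Because only the last word is inspected, the 'und' inside 'Von und Zu' is left alone.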
Thank you for your answers!
Regards,
PY