Hello,
I have to work with data that were not entered properly in the database. One of the variables is a name (character, $35.), but in many cases it was filled in incorrectly: ' und' or ' oder' has been appended to the name (with a blank as the first character of the appended string).
data Namen ;
format Name $35.;
input Name &;
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
My goal is to remove every appended ' und' and ' oder' from the variable Name and get a data set in which only the names remain:
Name1
Name2
Name3
Name4
How should I proceed?
Thanks in advance,
PY
Please try:
data Namen ;
input Name &$35.;
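/* Position of the first 'und' or 'oder' (case-insensitive), 0 if there is none */
position=prxmatch('m/und|oder/i',strip(name));
/* Keep the text before the match, or the whole name if nothing usable was found */
if position>1 then name2=substr(strip(name),1,position-1);
else name2=name;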
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Alternatively, with scan:
data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;
Hello,
Thanks. I tested the second solution first and ran into the problem that some of the NAME values themselves contain blanks. For instance:
Name1 = 'Von und Zu' and the original NAME content is 'Von und Zu und'
Name2 = 'Tortilla' and the original NAME content is 'Tortilla und'
Name3 = 'Bin dahoam' and the original NAME content is 'Bin dahoam und'
Name4 = 'Curry Wurst' and the original NAME content is 'Curry Wurst und'
Is there a way to get the list of Name1 to Name4 from the following:
data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;
run;
Hope this code will help
data Namen ;
input Name &$35.;
name2=prxchange('s/und//i',-1,name);
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;
run;
Hello,
I think the answer is workable, but I need to test it with the real data tomorrow.
In any case, this code simplifies the further processing a lot; I will then only have to deal with exceptions like ‘Von und Zu’, where the name itself contains ‘ und’.
However, I don’t understand what each parameter of the function does and will have to look this up:
name2=prxchange('s/und//i',-1,name);
What do the s, the /, the //, the i and the -1 mean?
Thanks a lot
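For readers with the same question, here is a short annotated sketch of that call. The first argument of PRXCHANGE is a Perl-style substitution: s means "substitute", the slashes separate the search pattern from the replacement (two adjacent slashes mean an empty replacement, so the match is simply deleted), and the trailing i makes the match case-insensitive. The second argument is the number of replacements, where -1 means "replace every match". The second assignment below is an assumption, not something posted in this thread: it anchors the pattern to the end of the value so that only a trailing ' und' or ' oder' is removed. The variable names name_all and name_end are made up for illustration.

```sas
data Namen;
  length Name name_all name_end $35;
  input Name &$35.;
  /* s/und//i : s = substitute, und = search pattern, empty replacement, i = ignore case */
  /* -1       : replace every match, so the und inside Von und Zu is removed as well     */
  name_all = prxchange('s/und//i', -1, Name);
  /* Assumed variant: \s+ = one or more blanks, (und|oder) = either word,                */
  /* \s*$ = optional padding blanks up to the end of the value; one replacement suffices */
  name_end = prxchange('s/\s+(und|oder)\s*$//i', 1, Name);
  datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
Name4 oder
;
run;
```

With this sample, name_end should keep 'Von und Zu' intact, whereas name_all also loses the inner 'und'.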
It is probably easier to understand if you first check whether the last word is one you want to remove, and then remove it.
data want ;
set have ;
last_word = scan(name,-1,' ');
if last_word in ('und','oder') then name=substrn(name,1,length(name)-length(last_word));
drop last_word;
run;
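To try this on the sample names from the thread, a self-contained sketch could look like the following. The data set names have and want are taken from the step above; the lowcase() call is an addition of this sketch so that 'Und' or 'ODER' would be caught too, which nobody in the thread asked for.

```sas
data have;
  input Name &$35.;
  datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
Name4 oder
;
run;

data want;
  set have;
  /* Last blank-delimited word of the value */
  last_word = scan(Name, -1, ' ');
  /* Cut it off only when it is one of the unwanted words (case-insensitive) */
  if lowcase(last_word) in ('und', 'oder') then
    Name = substrn(Name, 1, length(Name) - length(last_word));
  drop last_word;
run;
```

Because only the last word is inspected, the 'und' inside 'Von und Zu' is left alone.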
Thank you for your answers!
Regards,
PY