Re: How to clean a character variable from some standardised character...

PierreYvesILY · Posted 09-18-2019 06:08 AM

hello,

I have to work on data that are not properly registered in the database. One of the variables is a Name (character, $35.) but in many cases it is completed by error : ' und' or ' oder' are added ton the names (empty space as first part of the character chain).

data Namen ;
format Name $35.;
input Name &;
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;

My goal is to clean the Variable Name from all ' und' and ' oder' and get a datastep where only the names are present:

Name1

Name2
Name3
Name4

How should I proceed?

Thanks in advance,

PY

Jagadishkatam · Posted 09-18-2019 06:30 AM

Please try

data Namen ;
input Name &$35.;
position=prxmatch('m/und|oder/i',strip(name));
name2=substr(strip(name),1,position-1);
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;

Thanks,
Jag

Jagadishkatam · Posted 09-18-2019 06:31 AM

Alternatively with scan

data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;
Name1 und
Name2 und
Name3 und
Name4 oder
;
run;

Thanks,
Jag

PierreYvesILY · Posted 09-18-2019 09:36 AM

hello,

thanks, I tested the second solution first, and come to the problem that some of the NAME observations also contains blank spaces. For instance:

Name1 = 'Von und Zu' and the original NAME content is 'Von und Zu und'

Name2 = 'Tortilla' and the original NAME content is 'Tortilla und'

Name3 = 'Bin dahoam' and the original NAME content is Bin dahoam und'

Name4 = 'Curry Wurst' and the original NAME content is 'Curry Wurst und'

Is there a way to get the list of Name1 to Name4 from the following:

data Namen ;
input Name &$35.;
name2=scan(strip(name),1,' ');
datalines;Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und;
run;

Jagadishkatam · Posted 09-18-2019 10:01 AM

Hope this code will help

data Namen ;
input Name &$35.;
name2=prxchange('s/und//i',-1,name);
datalines;
Von und Zu und
Tortilla und
Bin dahoam und
Curry Wurst und
;

Thanks,
Jag

PierreYvesILY · Posted 09-18-2019 11:29 AM

Hello,

I think the answer is practicable but I need to test with the real data tomorrow.

Anyway, this code simplifies the further treatment a lot, and then I will only have to focus on the exceptions like ‘Von und Zu’ where the Name in itself contains the ‘ und’.

Nevertheless, I don’t understand what each parameter of the function does and have to research this:

name2=prxchange('s/und//i',-1,name);

what is the meaning of the s, the /, the //, the i and the -1.

Thanks a lot

PierreYvesILY · Posted 09-19-2019 01:34 AM

Hello,

I think the answer is practicable but I need to test with the real data tomorrow.
Anyway, this code simplifies the further treatment a lot, and then I will only have to focus on the exceptions like 'Von und Zu' where the Name in itself contains the ' und'.
Nevertheless, I don't understand what each parameter of the function does and have to research this:

name2=prxchange('s/und//i',-1,name);

what is the meaning of the s, the /, the //, the i and the -1.
Thanks a lot

Jagadishkatam · Posted 09-19-2019 04:46 AM

Sure, please test and let me know.
s - represents substitution
/ / /- these are delimiters so anything inbetween these delimiters is checked and the third / represent anything between the second / and third / will be replaced with.
-1 - it tells the number of times to replace here it is multiple times
if it is just 1 then it will replace only once

hope this helps

Thanks,
Jag

PierreYvesILY · Posted 09-19-2019 04:54 AM

It helps a lot! thank you for the explanations, I really appreciate.
Regards

Tom · Posted 09-18-2019 11:40 AM

Probably easier to understand if you first just check if the last word is one you want to remove. And then remove it.

data want ;
  set have ;
  last_word = scan(name,-1,' ');
  if last_word in ('und','oder') then name=substrn(name,1,length(name)-length(last_word));
  drop last_word;
run;

PierreYvesILY · Posted 09-19-2019 03:27 AM

Thank you for your answers!

Regards,

PY

How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Re: How to clean a character variable from some standardised character chains?

Registration is open