Hi All,
I want to find special characters in the variable name, but I am missing a few records where name contains a space: a space before the letters, a space between letters, or a space after the letters.
How can I capture records that contain spaces, like pid 105, 106 and 109, using a regular expression?
I used the approach below:
data find;
input pid name $;
cards;
101 acbd
102 !and
103 X.Y
104 1TVN
105 A BCD
106 bd
107 ANKR
108 K@234
109 KRS
110 235
;
run;
data find1;
set find;
found= prxchange('s/[A-Za-z.]//i',-1,name);
run;
data find2;
set find1;
where found ne '';
run;
Thanks
Accepted Solutions
You just need to fiddle with it a bit:
data have;
infile datalines dlm=",";
input pid name $;
datalines;
102,!and
104,1TVN
108,K@234
105,A BCD
;
run;
data want;
set have;
find=compress(name,"","a");
run;
This works for all but the space part. That part is more of an issue, because every value of name is padded with trailing blanks up to the defined length of the variable. So say name is $8:
!and
has four trailing blanks after the text. Checking for spaces after the text therefore doesn't make much sense; checking for spaces before or within the text might.
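To make the trailing-blank point concrete, here is a minimal sketch comparing VERIFY on the raw, padded value with VERIFY on TRIM(name). It assumes the quoted $CHAR/DSD input shown later in this thread (so leading and embedded blanks actually reach the variable) and treats letters plus the dot as the allowed characters, mirroring the [A-Za-z.] class from the question. The names check, pos_raw and pos_trim are illustrative only.

```sas
/* Assumption: quoted values read with DSD and $CHAR8. keep leading and
   embedded blanks. Data lines must start in column 1, because with DSD
   a leading blank would count as a delimiter. */
data find;
  infile datalines dlm=' ' dsd;
  input pid name :$char8.;
datalines;
101 "acbd"
103 "X.Y"
105 "A BCD"
109 " KRS"
;
run;

data check;
  set find;
  /* Raw value: the padding blanks are not in the allowed list, so any
     value shorter than 8 characters reports its first trailing blank */
  pos_raw  = verify(name,
             'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.');
  /* Trimmed value: only leading or embedded blanks (or other special
     characters) are reported; 0 means letters and dots only */
  pos_trim = verify(trim(name),
             'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.');
run;
```

With these four rows, pos_raw is non-zero for every value, while pos_trim is 0 for acbd and X.Y, 2 for "A BCD" and 1 for " KRS", so only the trimmed version separates real blanks from padding.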
Use compress:
data find2; set find; where lengthn(compress(name," ","ad"))>0; run;
This removes all blanks, letters and digits from the string; if the remaining length is greater than zero, the value contains special characters.
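One possible way to keep the COMPRESS test above and still report blanks before or inside the text is to count blanks in the trimmed value separately. This is a sketch under the same assumption as the rest of the thread (values read with quoted $CHAR input so the blanks survive); flagged, has_special and has_space are illustrative names, not part of the original answer.

```sas
/* Assumption: quoted $CHAR8. input with DSD preserves leading and
   embedded blanks. Data lines start in column 1. */
data find;
  infile datalines dlm=' ' dsd;
  input pid name :$char8.;
datalines;
101 "acbd"
102 "!and"
105 "A BCD"
109 " KRS"
;
run;

data flagged;
  set find;
  /* Anything left after removing blanks, letters (a) and digits (d) */
  has_special = (lengthn(compress(name, ' ', 'ad')) > 0);
  /* Blanks before or inside the text; TRIM drops the trailing padding,
     and the MISSING check avoids flagging an all-blank value */
  has_space = (not missing(name) and countc(trim(name), ' ') > 0);
run;
```

Here 102 gets has_special=1, while 105 and 109 get has_space=1; keeping rows where either flag is 1 would cover both cases.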
Thanks RW09.
But my goal is to find all PIDs whose name contains any special character, including spaces (before the text, between the text, or after the text).
My expected output would be:
pid name found
102 !and !
104 1TVN 1
108 K@234 @234
110 235 235
105 A BCD
106 bd
109 KRS
Thank you very much for your support.
Is it possible to capture the same records using the PRXCHANGE function? I would like to learn it. I searched Google and found that /s should be used, but it is not working.
Any suggestion will be highly appreciated.
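A note on the /s found via Google: in Perl-style regular expressions, which PRXMATCH and PRXCHANGE use, the whitespace class is written \s (with a backslash), while a /s placed after the closing delimiter is a modifier that only changes whether . matches a newline. A minimal sketch of \s, assuming the values were read with quoted $CHAR input so the blanks actually exist in name (ws and ws_pos are illustrative names):

```sas
/* Assumption: quoted $CHAR8. input with DSD preserves leading and
   embedded blanks. Data lines start in column 1. */
data find;
  infile datalines dlm=' ' dsd;
  input pid name :$char8.;
datalines;
101 "acbd"
105 "A BCD"
106 "bd "
109 " KRS"
;
run;

data ws;
  set find;
  /* \s matches one whitespace character; TRIM removes the trailing
     padding, so only leading or embedded blanks can match.
     PRXMATCH returns the position of the first match, or 0. */
  ws_pos = prxmatch('/\s/', trim(name));
run;
```

Row 105 matches at position 2 and row 109 at position 1. Row 106 returns 0: its trailing blank cannot be told apart from the variable's own padding, which is the limitation described in the accepted answer.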
In your original dataset, the leading and embedded spaces never end up in the variable, because list input treats them as delimiters:
data find;
input pid name $;
cards;
101 acbd
102 !and
103 X.Y
104 1TVN
105 A BCD
106 bd
107 ANKR
108 K@234
109 KRS
110 235
;
run;
data check;
set find;
check = put(name,$hex16.);
run;
proc print data=check noobs;
run;
Result:
pid    name     check
101    acbd     6163626420202020
102    !and     21616E6420202020
103    X.Y      582E592020202020
104    1TVN     3154564E20202020
105    A        4120202020202020
106    bd       6264202020202020
107    ANKR     414E4B5220202020
108    K@234    4B40323334202020
109    KRS      4B52532020202020
110    235      3233352020202020
To read leading and embedded spaces successfully, use the DSD and DLM= options together with the $CHAR informat, and enclose the values in quotes:
data find;
infile cards dlm=' ' dsd;
input pid name :$char8.;
cards;
101 "acbd"
102 "!and"
103 "X.Y"
104 "1TVN"
105 "A BCD"
106 "bd "
107 "ANKR"
108 "K@234"
109 " KRS"
110 "235"
;
run;
Thanks once again for your guidance.
I would like to learn whether there is any option I can add to the regular expression in the PRXCHANGE function to capture the PIDs containing spaces, like 105 and 109, or any alternative regular expression.
prxchange('s/[A-Za-z.]//i',-1,name);
Thanks in advance
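One possible PRX version, offered as a sketch rather than a definitive answer: keep the question's PRXCHANGE to build found, but select rows with PRXMATCH on the negated class [^A-Za-z.] applied to TRIM(name). The reason for not testing found ne '': when the only leftover is a blank, as for "A BCD", SAS pads the shorter string with blanks when comparing, so ' ' equals '' and the row is dropped. The data assume the quoted $CHAR input from the earlier reply; find2 and found follow the question's naming, and flag is an added illustrative variable.

```sas
/* Assumption: quoted $CHAR8. input with DSD preserves leading and
   embedded blanks. Data lines start in column 1. */
data find;
  infile datalines dlm=' ' dsd;
  input pid name :$char8.;
datalines;
101 "acbd"
102 "!and"
103 "X.Y"
104 "1TVN"
105 "A BCD"
106 "bd "
107 "ANKR"
108 "K@234"
109 " KRS"
110 "235"
;
run;

data find2;
  set find;
  length found $8;
  /* Same substitution as the question (the i modifier is redundant
     because the class already lists A-Z and a-z) */
  found = prxchange('s/[A-Za-z.]//', -1, trim(name));
  /* 1 if any character other than a letter or dot (blanks included)
     occurs before the trailing padding */
  flag = (prxmatch('/[^A-Za-z.]/', trim(name)) > 0);
  if flag;
run;
```

With this data, find2 should keep 102, 104, 105, 108, 109 and 110. Row 106 ("bd ") is still missed because its only blank is trailing and therefore indistinguishable from padding.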