BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
biglerc
Obsidian | Level 7

Hello Community!  

 

I have data that gets aggregated from multiple countries and I need a way to create a flag for a character variable that contains characters that are not found on USA keyboards.  Is there a way to do this?    Any guidance is appreciated!

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@biglerc wrote:

Thank you very much for your response!  I ended up using a loop through ascii numbers and it worked.


No need to loop.  You can use COLLATE to generate them.  All of the printable characters, from space to tilde, can be generated with COLLATE(32,126). Then use INDEXC() or VERIFY() function depending whether you want to test for the existence of of valid characters or existence of invalid characters.

data want;
   set have;
   any_valid = 0 ne indexc(string,collate(32,126));
   any_invalid = 0 ne verify(string,collate(32,126));
run;

Example:

data have;
  input string $50.;
  if string='alpha' then string=tranwrd(string,'alpha','CEB1'x);
cards4;
`1234567890-=~!@#$%^&*()_+
qwertyuiop[]\QWERTYUIOP{}|
asdfghjkl;'ASDFGHJKL:"
zxcvbnm,./ZXCVBNM<>?
alpha
;;;;

data want;
   set have;
   any_valid = 0 ne indexc(trim(string),collate(32,126));
   any_invalid = 0 ne verify(trim(string),collate(32,126));
run;

Result

Tom_0-1738018096702.png

 

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Lots of answers here, depending on what "non-USA keyboard characters" really means to you.

 

https://communities.sas.com/t5/SAS-Programming/Detect-or-Exclude-rows-which-include-not-standard-ASC...

--
Paige Miller
biglerc
Obsidian | Level 7

Thank you very much for your response!  I ended up using a loop through ascii numbers and it worked.

Patrick
Opal | Level 21

@biglerc  If you're only dealing with single byte characters then below will also work and requires less code.

data have;
	var='AbIöIxz'; output;
	var='AbIoIxz'; output;
run;

data want;
	set have;
	nonEnglishChar_flg= prxmatch('/[^a-z]/i',strip(var))>0;
run;

proc print data=want;
run;

 

biglerc
Obsidian | Level 7

Thank you, Patrick!  I am dealing with loads of them, since the data is coming from different countries.  Here's an example:  RESULT = תשובה לדוגמה.  

 

My code currently uses substr to go character by character in every text string and identifies the position of a special character (if it finds one).  This is working perfectly except for the instances like my example above where the entire string is special characters.  In these cases SAS throws an error.  Here is my code - it may be too out of context but maybe you can follow it?  I'm just wondering why it won't do a substr if the whole value is special characters.

 

/* step through each character in the field to look for special character(s) */
do i = 1 to length(current_variable) ;

          if rank(substr(current_variable),i,1)) gt 0 and rank(substr(current_variable),i,1)) lt 32 or rank(substr(current_variable),i,1)) gt 127 then do;
                    position= i;
                    char = byte(rank(substr(varvalue,i,1)));
                    nonascii_pos = catx(',',nonascii_pos,strip(put(position,8.)));
                    nonascii = catx(',',nonascii,strip(char ));
          end;
end;

 

Thanks for taking the time to answer my question!

Tom
Super User Tom
Super User

@biglerc wrote:

Thank you very much for your response!  I ended up using a loop through ascii numbers and it worked.


No need to loop.  You can use COLLATE to generate them.  All of the printable characters, from space to tilde, can be generated with COLLATE(32,126). Then use INDEXC() or VERIFY() function depending whether you want to test for the existence of of valid characters or existence of invalid characters.

data want;
   set have;
   any_valid = 0 ne indexc(string,collate(32,126));
   any_invalid = 0 ne verify(string,collate(32,126));
run;

Example:

data have;
  input string $50.;
  if string='alpha' then string=tranwrd(string,'alpha','CEB1'x);
cards4;
`1234567890-=~!@#$%^&*()_+
qwertyuiop[]\QWERTYUIOP{}|
asdfghjkl;'ASDFGHJKL:"
zxcvbnm,./ZXCVBNM<>?
alpha
;;;;

data want;
   set have;
   any_valid = 0 ne indexc(trim(string),collate(32,126));
   any_invalid = 0 ne verify(trim(string),collate(32,126));
run;

Result

Tom_0-1738018096702.png

 

biglerc
Obsidian | Level 7

This is fantastic! I have never used COLLATE so this is a great learning experience for me.  Thank you sooo much!

biglerc
Obsidian | Level 7

Hi Tom,

Might I ask you for guidance on a slight modification to the code you sent?  Part of my requirement is to keep track of the position of the special character in a string.  So for example, if the variable called "RESULT" contains 5 characters and the 3rd character is a special character, then I need to output 1) the name of the variable, 2) the special character(s) and 3) the position(s) of the special character(s).  This is why I was looping through each character in the string.  My code works correctly if there is at least one NON-special character in the string, however, if all characters are special characters, it crashes.

 

I'm currently experimenting to see if I can combine my looping with the collate you taught me to resolve this 🙂  

 

Thanks!

Tom
Super User Tom
Super User

Looping over the characters in a list should work. 

You might share your issue with the loop failing when there are no special characters in a new topic.  Provide a couple of simple example strings that are causing your problems.

 

Watch out if you are running with system encoding set to UTF-8 or other multibyte character encodings.  It wouldn't matter for generating the simple binary flag variable since the normal 7bit ASCII characters only use one byte.  But for locating other characters you will need to use the Kxxxx series of functions that are designed for working with characters that might use more than one byte.

 

So perhaps something like:

if verify(string,collate(32,126) then do;
  do i=1 to klength(string);
    length char $4 ;
    char = ksubstr(string,i,1);
    if verify(char,collate(32,126)) then put i= char= ;
  end;
end;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1605 views
  • 5 likes
  • 4 in conversation