I have the following sample dataset:

data have;
  length url $3000.;
  input url;
  datalines;
blogs.sas.com/wan/2022/03/18/sas-eg-è·³å‡ºéŒ¯èª¤è¨Š
www.dog.it
;
run;
I'm trying to find a way to exclude all rows in the dataset that contain non-ASCII or non-printable characters. Any hints appreciated.
data want;
  set have;
  if ~verify(url, collate(32,126));
run;
In this example I use the COLLATE function to specify the ASCII characters from blank (decimal ASCII code 32) to tilde (126) as the admissible characters. The subsetting IF statement excludes all observations where URL contains a character outside of this range.
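If it helps to inspect the offending rows before dropping them, the same VERIFY call can be used to flag rather than filter. This is only a minimal sketch: the dataset name check and the variables bad_pos and bad_hex are made up for illustration, and it assumes a single-byte session encoding like the example above.

```sas
data check;
  set have;
  /* Position of the first character outside ASCII 32-126; 0 means none was found */
  bad_pos = verify(url, collate(32,126));
  /* Hex code of that character, to see exactly what tripped the check */
  length bad_hex $2;
  if bad_pos > 0 then bad_hex = put(char(url, bad_pos), $hex2.);
run;
```

Rows with bad_pos = 0 are the ones the subsetting IF above would keep.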
Keeping only the "printable" characters with COMPRESS and the 'kw' modifiers is going to include a LOT of non-ASCII characters.
data want;
  url=collate(0,255);
  expect=collate(32,126);
  try=compress(url,,'kw');
  if try ne expect then do;
    extra=compress(try,expect);
    put extra= / extra $hex. ;
  end;
run;

extra=€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇ
8082838485868788898A8B8C8E9192939495969798999A9B9C9E9FA0A1A2A3A4A5A6A7A8A9AAABACAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7
Your sample data indicate otherwise, but should you by any chance be dealing with multibyte characters in your real data, then none of the solutions proposed so far would work, and you would need to look into the SAS string functions rated I18N Level 2.