☑ This topic is solved.
dcortell
Pyrite | Level 9

Hi Experts

 

I have the following sample dataset

 

data have;
length url $3000.;
input url;
datalines;
blogs.sas.com/wan/2022/03/18/sas-eg-跳出錯誤訊 
www.dog.it
;
run;

I'm trying to find a way to exclude all rows in the dataset that contain non-ASCII or non-printable characters. Any hints appreciated.

ACCEPTED SOLUTION
SASJedi
SAS Super FREQ

The COMPRESS function to the rescue! This example keeps only printable characters.

data want;
	set have;
	where compress(url,,'kw')=url;
run;

  


6 REPLIES
FreelanceReinh
Jade | Level 19

Hi @dcortell,

 

You can use the FINDC function with modifiers corresponding to character classes that you may want to keep or exclude. Or the VERIFY function:

data want;
set have;
if ~verify(url, collate(32,126));
run;

In this example I use the COLLATE function to specify the ASCII characters from blank (decimal ASCII code 32) to tilde (126) as the admissible characters. The subsetting IF statement excludes all observations where URL contains a character outside of this range.
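The FINDC alternative mentioned above could look something like the following. This is only a sketch: the two-row sample data is hypothetical (the second URL ends with the three UTF-8 bytes of one CJK character, written as a hex literal so the step does not depend on the editor's encoding), and it relies on the K modifier, which makes FINDC search for characters that are not in the list.

```sas
/* Hypothetical sample data, not the original poster's table */
data have;
    length url $3000;
    url = 'www.dog.it'; output;
    url = 'blogs.sas.com/sas-eg-' || 'E8B7B3'x; output;
run;

data want;
    set have;
    /* K modifier: look for characters NOT in the list.        */
    /* A result of 0 means every character is in range 32-126. */
    if findc(url, collate(32,126), 'k') = 0;
run;
```

Trailing blanks in URL are harmless here because the blank (code 32) is part of the admissible list.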

Tom
Super User

The COMPRESS(url,,'kw') test is going to let a LOT of non-ASCII characters through.

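data want;
  /* Requests codes 0-255, but URL gets COLLATE's default length of 200, */
  /* so only codes 0-199 are stored (the output below stops at C7).      */
  url=collate(0,255);
  expect=collate(32,126);          /* printable ASCII: blank through tilde */
  try=compress(url,,'kw');         /* what the 'kw' modifiers actually keep */
  if try ne expect then do;
    extra=compress(try,expect);    /* kept characters outside printable ASCII */
    put extra= / extra $hex. ;
  end;
run;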

extra=€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇ
8082838485868788898A8B8C8E9192939495969798999A9B9C9E9FA0A1A2A3A4A5A6A7A8A9AAABACAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7
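If you still prefer COMPRESS, one possible fix is to give it an explicit keep list instead of relying on the 'w' character class. The sketch below assumes a hypothetical two-row HAVE dataset and uses COLLATE(32,126) as the list of admissible characters together with the K (keep) modifier.

```sas
/* Hypothetical sample data, not the original poster's table */
data have;
    length url $3000;
    url = 'www.dog.it'; output;
    url = 'blogs.sas.com/sas-eg-' || 'E8B7B3'x; output;
run;

data want;
    set have;
    /* K keeps only the listed characters (codes 32-126).     */
    /* The row survives only if COMPRESS removed nothing.     */
    if compress(url, collate(32,126), 'k') = url;
run;
```

A subsetting IF is used here rather than WHERE simply to keep the sketch conservative; the comparison itself is the same idea as in the accepted answer.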
Patrick
Opal | Level 21

Your sample data indicate otherwise, but should you by any chance be dealing with multibyte characters in your real data, then none of the already proposed solutions would work and you would need to look into the SAS string functions rated I18N Level 2.
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/p1pca7vwjjwucin178l8qddjn0gi.htm 

Tom
Super User

Note that UTF-8 encoding is designed not to mess with normal ASCII codes, so the first (best) solution of using VERIFY() with COLLATE(32,126) will work fine on UTF-8 strings.
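The reason is that every byte of a multibyte UTF-8 character has a value of 128 or higher, so a byte-wise check against codes 32-126 will trip on the first byte of any such character. A small sketch to see this, assuming a hand-built string whose last three bytes are the UTF-8 encoding of one CJK character (U+8DF3):

```sas
data _null_;
    /* 'abc' followed by the bytes E8 B7 B3 (one UTF-8 encoded character) */
    s = 'abc' || 'E8B7B3'x;
    /* VERIFY returns the position of the first byte outside codes 32-126 */
    pos = verify(s, collate(32,126));
    put s= $hex12. pos=;
run;
```

The lead byte E8 sits at position 4, right after the three ASCII letters, so that is where VERIFY stops.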

Ksharp
Super User
data have;
length url $3000.;
input url;
datalines;
blogs.sas.com/wan/2022/03/18/sas-eg-跳出錯誤訊 
www.dog.it
;
run;

data want;
 set have;
 if prxmatch('/[[:^ascii:]]/',url) then flag=1;
run;
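Note that [:^ascii:] only catches codes above 127; control characters such as tab are still ASCII, and the step above flags rows rather than dropping them. If the goal is to exclude rows and also reject non-printable characters, a variation (my adaptation, not the poster's code) could use a negated range from blank to tilde. The sample data here is hypothetical.

```sas
/* Hypothetical sample data, not the original poster's table */
data have;
    length url $3000;
    url = 'www.dog.it'; output;
    url = 'blogs.sas.com/sas-eg-' || 'E8B7B3'x; output;
run;

data want;
    set have;
    /* [^ -~] matches any byte outside blank (32) through tilde (126); */
    /* PRXMATCH returns 0 when no such byte is found.                  */
    if prxmatch('/[^ -~]/', url) = 0;
run;
```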
