Filtering character data only when it has specific characteristics

Frequent Contributor
Posts: 101
Hello All!

I hope that everyone had a great weekend!  I have a quick question.

I have a SAS dataset containing stock tickers (AAPL, BA, etc.), and some of them have a "-" or a "." in them.  I want to filter out all of the observations that have these (accidental) special characters in that field.

Essentially, I want to keep observations that have only letters in their ticker symbol.

Any idea how this would work?

Thanks!

John


Accepted Solutions
Solution
‎09-15-2014 09:54 PM
Trusted Advisor
Posts: 1,137

Re: Filtering character data only when it has specific characteristics

Posted in reply to mahler_ji

Hi John,

I thought something like the code below would help. By default (on ASCII systems), the SCAN function treats the following characters as delimiters: blank ! $ % & ( ) * + , - . / ; < ^ |

With a count of 1, SCAN returns the first word, that is, the text before the first delimiter.

data have;
input dat $;
new=scan(dat,1);
cards;
BA-a
BAB-a
GOOG.a
;
run;

If there are other delimiters (shown here as ??), you can list them in the third argument of SCAN. However, a delimiter list that you supply replaces the defaults, so include the default delimiters along with your own:

data have;
input dat $;
new=scan(dat,1,'?? ! $ % & ( ) * + , - . / ;');
cards;
BA-a
BAB-a
GOOG.a
;
run;
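
One caveat with the SCAN approach: it only splits on characters that are in the delimiter list, and it skips leading delimiters, so with the defaults a value such as BAB@a would come back unchanged. As a rough alternative sketch (not part of the original answer), the ticker can be cut at the first non-letter with NOTALPHA and SUBSTR. The data set name have2 and the variables p and new are made up for illustration:

```sas
data have2;
length dat new $8;
input dat $;
/* position of the first non-letter, 0 if every character is a letter */
p = notalpha(trim(dat));
if p = 0 then new = dat;                        /* already all letters */
else if p > 1 then new = substr(dat, 1, p - 1); /* keep the leading letters */
else new = ' ';                                 /* starts with a non-letter */
drop p;
cards;
BA-a
BAB@a
GOOG.a
IBM
;
run;
```

This does not depend on knowing which special characters can occur, at the cost of returning a blank when the value starts with a non-letter.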

Thanks,

Jag



All Replies
Super User
Posts: 19,815

Re: Filtering character data only when it has specific characteristics

Posted in reply to mahler_ji

Look at the notalpha function.

data want;
set have;
if notalpha(stock_ticker)>0 then delete;
run;

Regular Contributor
Posts: 233

Re: Filtering character data only when it has specific characteristics

Hi

Sorry, it's not my intention to put you on the spot; I just want to learn. The code you provided is not working for me: it returns an empty data set.

data temp3;
input string $ 1-11;
cards;
abcXxX/
_jklxxx
abc.jjj
xXx()lll
xxx*aaa
;
run;

data temp4;
set temp3;

if notalpha(string) > 0 then delete ;
run;

proc print data = temp4;
run;

I love to learn from Masters like you.

Super User
Posts: 19,815

Re: Filtering character data only when it has specific characteristics

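Two reasons:

1. The variable string is padded with trailing blanks, and NOTALPHA treats a blank as a non-alphabetic character, so the blanks need to be trimmed off first.

2. None of the values in your data set are entirely alphabetic, so there is nothing to return.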

data temp3;
input string $ ;
cards;
abcXxX/
_jklxxx
abc.jjj
xXx()lll
xxx*aaa
ABC
APPL
IBM
GOOG
;
run;

data temp4;
set temp3;
if notalpha(trim(string)) > 0 then delete ;
run;

proc print data = temp4;
run;
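
To see why the trailing blanks matter, here is a small sketch that can be run on its own. The variable names ticker, untrimmed and trimmed are made up for illustration:

```sas
data _null_;
length ticker $8;
ticker = 'IBM';                        /* stored as 'IBM' plus five trailing blanks */
untrimmed = notalpha(ticker);          /* 4: the first trailing blank */
trimmed   = notalpha(trim(ticker));    /* 0: every remaining character is a letter */
put untrimmed= trimmed=;
run;
```

The log should show untrimmed=4 trimmed=0. The same test can also be written as a subsetting IF that keeps the all-letter rows instead of deleting the others: if notalpha(trim(string)) = 0;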

Regular Contributor
Posts: 233

Re: Filtering character data only when it has specific characteristics


Thanks for clarifying. I read the question incorrectly then. You are correct. Thank you so much for your quick reply.

Frequent Contributor
Posts: 101

Re: Filtering character data only when it has specific characteristics

Hey,

Thank you so much for all of your help; both answers that were given were amazing.

I have a slightly different question now...

Is there a way that I can trim the tickers so that SAS will IGNORE everything after a certain character?  For example, if the symbol is BAB-a, I want the observation to return BAB.

The only thing is, the number of characters varies. Some have only one or two letters before the special character (e.g. BA-a) and some have more (e.g. GOOG.a).  Sometimes the special character is a dash, sometimes a period and sometimes something else.

Any help would be amazing!

John

Super User
Posts: 19,815

Re: Filtering character data only when it has specific characteristics

Posted in reply to mahler_ji

Look at the scan function...

Respected Advisor
Posts: 4,925

Re: Filtering character data only when it has specific characteristics

Posted in reply to mahler_ji

You will get the most flexibility with regular expressions:

data have;
length dat $20;
input dat;
cards;
TOOLONGWORD
ABC1234
12abcd
BA-a
BAB@a
GOOG.a
.nothing
tôt^weird
;

data want;
/* Regular expression: ^: at beginning of string, [[:alpha:]]{1,6} : 1 to 6 alpha characters, /..../i : match case insensitive */
if not prxId then prxId + prxParse("/^[[:alpha:]]{1,6}/i");
set have;
call prxSubstr(prxId, dat, pos, len);
if pos > 0 then word = substr(dat, pos, len);
drop pos len prxId;
run;

title "Words with 1 to 6 alpha characters appearing at the beginning of strings";
proc print data = want noobs;
run;
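
If the six-letter cap is not needed, a shorter variant with PRXCHANGE might also work. This is only a sketch under a few assumptions: the names want2 and word2 are made up, the pattern uses [A-Za-z] so only unaccented ASCII letters count, and it reads the have data set created above:

```sas
data want2;
set have;
length word2 $20;
/* keep the leading run of letters and discard the rest of the value */
word2 = prxchange('s/^([A-Za-z]*).*$/$1/', 1, trim(dat));
run;

proc print data = want2 noobs;
run;
```

Values that start with a non-letter, such as .nothing, end up with a blank word2.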

PG
