mahler_ji
Obsidian | Level 7

Hello All!

I hope that everyone had a great weekend!  I have a quick question.

I have a SAS dataset containing a bunch of stock tickers (AAPL, BA, etc.), and some of them have a "-" or a "." in them. I want to filter out all of the observations that have these (accidental) special characters in that field.

Essentially, I want to keep observations that have only letters in their ticker symbol.

Any idea how this would work?

Thanks!

John

1 ACCEPTED SOLUTION

Jagadishkatam
Amethyst | Level 16

Hi John,

I thought something like the code below would help you. In ASCII environments, the scan function by default treats the following characters as delimiters: blank ! $ % & ( ) * + , - . / ; < ^ |

With 1 as the second argument, scan returns the first word, that is, the text before the first delimiter.

data have;
input dat$;
new=scan(dat,1);
cards;
BA-a
BAB-a
GOOG.a
;
run;

If there are any other delimiters (shown as ?? here), you can list them in the third argument of scan. Note that a delimiter list given there replaces the defaults, so include the default delimiters alongside the new ones if you still want them.

data have;
input dat$;
new=scan(dat,1,'?? ! $ % & ( ) * + , - . / ;');
cards;
BA-a
BAB-a
GOOG.a
;
run;
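To illustrate why the defaults have to be repeated, here is a minimal sketch with a made-up value (the variable names default_list and only_period are just for illustration). When a delimiter list is supplied, scan uses only that list:

```sas
data _null_;
   length dat $8;
   dat = 'BAB-a';
   /* no third argument: the default list applies, and it includes '-' */
   default_list = scan(dat, 1);
   /* custom list holding only '.': the '-' no longer splits the value */
   only_period  = scan(dat, 1, '.');
   put default_list= only_period=;
run;
```

The log should show default_list=BAB and only_period=BAB-a. Also keep in mind that scan ignores leading delimiters, so a value such as .nothing returns nothing as its first word rather than a blank.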

Thanks,

Jag


8 REPLIES
Reeza
Super User

Look at the notalpha function.

data want;
set have;
if notalpha(stock_ticker)>0 then delete;
run;

Hima
Obsidian | Level 7

Hi

Sorry, it's not my intention to put you on the spot. I just want to learn. The code you provided is not working for me. I am getting an empty data set back.

data temp3;
input string $ 1-11;
cards;
abcXxX/
_jklxxx
abc.jjj
xXx()lll
xxx*aaa
;
run;

data temp4;
set temp3;

if notalpha(string) > 0 then delete ;
run;

proc print data = temp4;
run;

I love to learn from Masters like you.

Reeza
Super User

Two reasons:

1. String has trailing blanks that need to be trimmed out.

2. You have no items in your data set that are all alpha to be returned.

data temp3;
input string $ ;
cards;
abcXxX/
_jklxxx
abc.jjj
xXx()lll
xxx*aaa
ABC
APPL
IBM
GOOG
;
run;

data temp4;
set temp3;
if notalpha(trim(string)) > 0 then delete ;
run;

proc print data = temp4;
run;
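To see the first reason in isolation, here is a small sketch (the names ticker, padded and trimmed are only for illustration). A character variable is padded with blanks up to its declared length, and a blank is not an alphabetic character, so notalpha reports the position of the first trailing blank:

```sas
data _null_;
   length ticker $8;
   ticker = 'IBM';
   /* ticker is stored as 'IBM' plus five blanks, so position 4 is a blank */
   padded  = notalpha(ticker);
   /* trim removes the trailing blanks, leaving only letters */
   trimmed = notalpha(trim(ticker));
   put padded= trimmed=;
run;
```

The log should show padded=4 trimmed=0, which is why every row was deleted before trim was added.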

Hima
Obsidian | Level 7


Thanks for clarifying. I read the question incorrectly then. You are correct. Thank you so much for your quick reply.

mahler_ji
Obsidian | Level 7

Hey,

Thank you so much for all of your help, and both answers that were given were amazing.

I have a slightly different question now...

Is there a way that I can trim the tickers so that SAS will IGNORE everything after a certain character? For example, if the symbol is BAB-a, I want the observation to return BAB.

The only complication is that the ticker length varies. Some have only one or two letters before the special character (e.g. BA-a) and some have more (GOOG.a). Sometimes the special character is a dash, sometimes a period, and sometimes something else.

Any help would be amazing!

John

PGStats
Opal | Level 21

You will get the most flexibility with regular expressions:

data have;
length dat $20;
input dat;
cards;
TOOLONGWORD
ABC1234
12abcd
BA-a
BAB@a
GOOG.a
.nothing
tôt^weird
;

data want;
/* Regular expression: ^: at beginning of string, [[:alpha:]]{1,6} : 1 to 6 alpha characters, /..../i : match case insensitive */
if not prxId then prxId + prxParse("/^[[:alpha:]]{1,6}/i");
set have;
call prxSubstr(prxId, dat, pos, len);
if pos > 0 then word = substr(dat, pos, len);
drop pos len prxId;
run;

title "Words with 1 to 6 alpha characters appearing at the beginning of strings";
proc print data = want noobs;
run;
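If the 6-character cap is not needed, a related sketch is to drop everything from the first non-letter onward with prxchange. This is a variation on the prxSubstr approach above, not a replacement for it; it assumes plain A-Z tickers, reuses the have data set from the first step, and the names want2 and word are only illustrative:

```sas
data want2;
   set have;
   length word $20;
   /* capture the leading run of letters and discard the rest of the value */
   word = prxchange('s/^([A-Za-z]*).*$/$1/', 1, dat);
run;

proc print data = want2 noobs;
run;
```

Values that begin with a non-letter, such as 12abcd or .nothing, come back blank, while BA-a, BAB@a and GOOG.a reduce to BA, BAB and GOOG.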

PG

