## DATA Step, Macro, Functions and more

Contributor
Posts: 34

How can I remove observations where the variable does not start with a letter? Should I type out every letter, or is there a shorter way?

variable

A153

B124

0564

.1548

na

w853

I want to keep only these values. Each one should be 4 characters long.

variable

A153

B124

w853

PROC Star
Posts: 1,401

What about 'na'? That starts with a letter too.

Anyway, something like this

```
data have;
input variable $;
datalines;
A153
B124
0564
.1548
na
w853
;

data want;
set have;
where substr(variable,1,1) not in ('0','1','2','3','4','5','6','7','8','9','.',' ');
run;
```
Contributor
Posts: 34

How can I remove values whose length is more than 4 or less than 4 characters?
Super User
Posts: 10,574

Use the length() function to determine the current string length.
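
As a minimal sketch of that suggestion, assuming the `have` data set and the character variable `variable` from the earlier reply, a subsetting IF on `length()` could look like the following. Note that `length()` ignores trailing blanks, and returns 1 (not 0) for a completely blank value.

```sas
data want;
  set have;
  /* keep only values that are exactly 4 characters long;
     trailing blanks are not counted by length() */
  if length(variable) = 4;
run;
```

On its own this still keeps `0564`, because it checks only the length and not the first character.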

Super User
Posts: 9,840

Pretty sure you're just pulling our leg now; we ask for test data in the form of a data step in every post. As such, these are just logic snippets:

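```
data want;
string="A153";
/* compress() with the "a" modifier strips letters, so an empty result means the first character is a letter */
isfirstalpha=ifn(lengthn(compress(char(string,1)," ","a")),0,1);
/* lengthn() returns the length without trailing blanks (0 for a blank string) */
islength=lengthn(string);
run;
```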
Super User
Posts: 6,934

A short way:

if length(variable)=4 and anyalpha(variable)=1;

By using ANYALPHA, you don't have to guess what characters might arrive in the next batch of data.
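
For completeness, here is a sketch of that condition inside a full DATA step. It assumes the `have` data set posted earlier in the thread; the name `want` is only illustrative.

```sas
data want;
  set have;
  /* keep the row only if the value is 4 characters long
     and the first alphabetic character is in position 1 */
  if length(variable) = 4 and anyalpha(variable) = 1;
run;
```

With the sample data, this should keep `A153`, `B124` and `w853`: `0564` has no letter in position 1, `.1548` is 5 characters long, and `na` is only 2.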

Super User
Posts: 9,840

Just to note, it's only the first character, so:

if length(variable)=4 and anyalpha(char(variable,1))=1;

Good point about the anyalpha() function; it saves the compress() call.

Super User
Posts: 6,934

No, the complications aren't needed. I stand by my original post.

ANYALPHA returns the position of the first alpha character.  So by checking that ANYALPHA returns a 1, that's enough.
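
To see why the check against 1 is sufficient, a small sketch like this prints what ANYALPHA returns for a few values. The test strings and the variable names `s` and `pos` are made up for illustration.

```sas
data _null_;
  length s $8;
  do s = 'A153', '0564', '05a4', 'na';
    /* position of the first alphabetic character, or 0 if there is none */
    pos = anyalpha(s);
    put s= pos=;
  end;
run;
/* log: A153 -> pos=1, 0564 -> pos=0, 05a4 -> pos=3, na -> pos=1 */
```

Only values whose first character is a letter give 1, so wrapping the argument in `char(variable,1)` does not change the outcome.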

Super User
Posts: 10,850

```
data have;
input variable $;
datalines;
A153
B124
0564
.1548
na
w853
;
data want;
set have;
if prxmatch('/^[a-z]\S{3}$/i',strip(variable));
run;
```
Super User
Posts: 8,216

I think @Ksharp meant the following (to ensure that strings are only 4 characters long):

```
data have;
input variable $;
datalines;
A153
B124xxx
d5555
0564
.1548
na
AAAA
w853
;
data want;
set have;
if prxmatch('/^[a-z][a-z0-9]{3}$/i',strip(variable));
run;
```

Art, CEO, AnalystFinder.com

Super User
Posts: 10,850