Hi there,
I have a character variable whose values are a mix: some contain only text, some contain only digits, and others combine digits with letters. A small sample of possible values is listed below.
1811
1826
1st airport
1000 islands
1111
: Heathrow
9928
: Seattle
AC2277
I am trying to recode values that contain only digits as "NA" (i.e., obs 1, 2, 5, and 7), and I was wondering whether anyone knows how this can be done. The dataset I am working with is quite large (millions of observations), so manually recoding this variable based on PROC FREQ output would be exhausting.
Any tips you have for resolving this issue would be very much appreciated!
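You could use the NOTDIGIT function. It returns the position of the first character that is not a digit, or 0 if every character is a digit. TRIM() is needed because SAS pads character values with trailing blanks, and a blank is not a digit.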
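data want;
input x $50.;
length y $50;
/* NOTDIGIT returns 0 only when every character is a digit; */
/* TRIM drops the trailing blanks, which are not digits.     */
if notdigit(trim(x)) = 0 then y = 'NA';
else y = x;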
cards;
1811
1826
1st airport
1000 islands
1111
: Heathrow
9928
: Seattle
AC2277
;
run;
Sometimes the COMPRESS() function can handle situations like this if you use its third argument. (There are lots of ways to define which kinds of characters to keep or drop, so COMPRESS() is really quite handy, especially for people like me who cannot use regex to save my life.) Note that the second argument of COMPRESS is empty because I am not listing specific characters; the modifiers 'kd' in the third argument tell it to keep (k) only digits (d). If the digits-only result equals the original value, the value contained nothing but digits.
data have;
input x $50.;
cards;
1811
1826
1st airport
1000 islands
1111
: Heathrow
9928
: Seattle
AC2277
;
run;
proc print;
run;
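data have2;
set have;
length y $50;               /* declare Y as a 50-character variable */
x1 = compress(x, , 'kd');   /* keep only the digits of X */
if x1 EQ x then y = 'NA';   /* unchanged after keeping digits: X holds digits only */
else y = x;
drop x1;
run;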
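For anyone who is comfortable with regular expressions, the same all-digits test can also be written with PRXMATCH. This is a sketch that assumes the HAVE data set created above; the output data set name WANT_REGEX is only illustrative.

```sas
/* Sketch: flag values made up entirely of digits with a Perl regex. */
/* Assumes data set HAVE with character variable X, as created above. */
data want_regex;
  set have;
  length y $50;
  /* ^\d+$ matches only when every character is a digit.   */
  /* STRIP removes leading and trailing blanks first.      */
  /* PRXMATCH returns 0 when there is no match.            */
  if prxmatch('/^\d+$/', strip(x)) then y = 'NA';
  else y = x;
run;
```

Because STRIP is applied first, a value stored with leading blanks (for example "   246") is still treated as all digits, which differs slightly from the NOTDIGIT(TRIM(X)) version.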
Thank you so much!! The NOTDIGIT solution worked perfectly!
Here are a few possibilities to consider before deciding on a solution. Would the values below count as numeric only?
1 3 5 /* embedded blanks */
3.1415 /* decimal point */
   246 /* leading blanks */
-789 /* negative sign */
+987 /* positive sign */
+ 246 /* combinations */
If these are too easy, you might want to consider scientific notation such as 314159E-5.
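If some of those cases should also be recoded, one option is to test whether SAS can read the value as a number at all, rather than checking for digits. The sketch below assumes that anything the BEST32. informat accepts (signs, decimal points, E-notation) should count as numeric and become NA; that rule and the data set name EDGE_CASES are assumptions for illustration. Values with embedded blanks, such as 1 3 5 or + 246, depend on how the informat treats them, so check the output against your own rules before relying on it.

```sas
/* Sketch: recode any value SAS can read as a number to 'NA'.   */
/* Assumption: signs, decimals and E-notation count as numeric; */
/* this is a different rule from the all-digits tests above.    */
data edge_cases;
  input x $char50.;   /* $CHAR keeps the leading blanks */
  length y $50;
  /* The ?? modifier suppresses invalid-data messages, so a     */
  /* missing result simply means X could not be read as a number. */
  if not missing(input(strip(x), ?? best32.)) then y = 'NA';
  else y = x;
cards;
1 3 5
3.1415
   246
-789
+987
+ 246
314159E-5
1st airport
AC2277
;
run;

proc print data=edge_cases;
run;
```

Under this rule "1st airport" and "AC2277" are kept as they are, while values such as -789, 3.1415 and 314159E-5 become NA.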