Solved: proc format

silasskovsbo · Posted 11-26-2012 07:59 AM

Hey.

I have a data set with character variables. The values for the variable are numbers, ' ' (empty, as character missings should be), but there are also other kinds of missing values (eg. '...', '.. .' ' .')

I would like to make a format which for these different values take changes the values to : number --> 'numbers' and let all observations which are not numbers stay the same.

That will make it possible for me to find the different kind of missing values.

I hope you understand and have a good idea to how to create this format.

If you have questions, just ask

Best regards, Silas Skovsbo, Aarhus University, institute of economics.

art297 · Posted 11-26-2012 11:13 AM

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

input (x y) ($);

cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

select catt("_",name,"=input(",name,",?? 32.);",

"if not missing(_",name,") then do;",

name,"='numb';end;") into :missings

separated by " "

from dictionary.columns

where libname="WORK" and

memname="TEST"

;

quit;

data want (drop=_:);

set test;

&missings.

run;

proc freq data=want;

tables _all_;

run;

View solution in original post

LinusH · Posted 11-26-2012 08:07 AM

I'm not quite following you.

A brief example with example data, and desired output would help.

Data never sleeps

silasskovsbo · Posted 11-26-2012 08:35 AM

An example:

For two character variables, X and Y:

X Y

1 22

2 ....

3 . . .

. 32

.. 11

. . 78

Then I want a format which gives me:

X Y

numb numb

numb ....

numb . . .

. numb

.. numb

. . numb

AncaTilea · Posted 11-26-2012 08:43 AM

Is this what you need?

data have;

input x $ ;

cards;

a

4

5

6

.

..

.

...

.

4

;

proc format;

value $ fmt 0-99999 = "Number"

;

data want;

set have;

format x $fmt.;

run;

proc freq data = want;table x;run;

So in this way, if you do a frequency you get what you want.

(given that I understood what you need)

Let me know if you have any questions.

Good luck!

silasskovsbo · Posted 11-26-2012 08:57 AM

Thank you!

That solved the problem

so easy when you know the code!

Tom · Posted 11-26-2012 10:04 AM

If your variable is character then the ranges you propose will include many non numbers.

For example 1ABC would be in the range '0' - '9999999' and is not a valid number.

AncaTilea · Posted 11-26-2012 10:09 AM

Agreed.

BUT

My understanding was that silasskovsbo wanted to focus on the missing data (basically wanted to find out how many ways the missing data was coded in).

If that's not indeed correct then the PROC FORMAT needs to be modified accordingly.

Cheers!

LinusH · Posted 11-26-2012 10:23 AM

I would like to suggest using the NOTDIGIT function which would be easier than creating a format.

Data never sleeps

silasskovsbo · Posted 11-26-2012 10:38 AM

Thank you for your suggestions.

As Tom mentions, it is a problem if a value as '1ABC' is in the interval '0'-'9999999999' since I want only pure numbers to be formatted, so I am able to find all the observation which differ and therefor is missing.

How can I then format all values which is a number, but not all other kinds of values, eg. '1ABC', '...', ' .' ?

Nondigit function?

TomKari · Posted 11-26-2012 10:42 AM

I'd like to suggest that the logic from https://communities.sas.com/message/145163#145163 makes sense in this context. Let SAS figure out if it can read it as a number, otherwise it remains unchanged. Note the two exceptions; if the string is a single period, SAS will convert it to character missing (all blank), and of course if the string is "NUMBER", it needs to be converted so it won't be lumped in with the actual numbers.

Tom

data have;
length X Y $80;
infile datalines dlm='09'x;
input X Y; /* tab delimited, to retain blanks in the values */
datalines;
1 22
2 ....
NUMBER 51
3 . . .
. 32
12 NUMBER
.. 11
. . 78
run;

data want;
length X2 Y2 $80;
set have;

if X = "" then
X = "MISSING";

if X = "NUMBER" then
X = "'NUMBER'";

if Y = "" then
Y = "MISSING";

if Y = "NUMBER" then
Y = "'NUMBER'";

if input(X, ??16.) > . then
X2 = "NUMBER";
else X2 = X;

if input(Y, ??16.) > . then
Y2 = "NUMBER";
else Y2 = Y;
run;

proc freq data = want;
table X2 Y2;
run;

AncaTilea · Posted 11-26-2012 11:00 AM

ok,all, I think here is a simpler code using the not alpha numeric function.

/*NOTALNUM

Purpose: To determine the position of the first character in a string that is not an alphanumeric (any upper- or lowercase letter or a number). If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.

*/

data have;
input x $ ;
cards;
1a
1234g
3h
j6
4
5
6
.
..
.
...
.

4
4
;

data want;

length value $10.;

set have;

temp = NOTALNUM(x);

if temp = 1 then value = x;

else if temp = 2 then value = "Number";

else value = "Non-Number";

run;

art297 · Posted 11-26-2012 11:13 AM

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

input (x y) ($);

cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

select catt("_",name,"=input(",name,",?? 32.);",

"if not missing(_",name,") then do;",

name,"='numb';end;") into :missings

separated by " "

from dictionary.columns

where libname="WORK" and

memname="TEST"

;

quit;

data want (drop=_:);

set test;

&missings.

run;

proc freq data=want;

tables _all_;

run;

silasskovsbo · Posted 12-03-2012 05:08 AM

Thank you for all the great commentaries. Unfortunate it is still not really doing.

The program needs to automatically do it on all character variables, since there are a huge amount of variables.

I have tried your program, Arthur. But something seems to go wrong?? for the '&missing.' it says statement is not valid??

Robert_Bardos · Posted 12-03-2012 06:28 AM

I think Art is on the right track here.

Within a simple datastep a statement like

if not notdigit(compress(your_char_var,' .')) and

not missing(input(your_char_var,?? 8.)) then

your_char_var = compress("'"!!your_char_var!!"'") ;

should do. (Note that the ! exclamation marks are the concatenation operators in this germany based installation. Note also my use of the compress function's second argument to cater for numbers with a decimal point. Scrap that if your numbers of interest are all integers.)

This was my test setup (where I assumed that scientific notation is not used and where it appears in your data purely coincidental) :

data x ;

input orig_var $char8. @1 num_read ?? 8. ; /* read both as char and num */

miss_char = missing(orig_var) ;

miss_num = missing(input(orig_var,?? 8.)) ;

not_digit = notdigit(compress(orig_var,' .')) ; /* used to identify scientific notation */

if not miss_num and not not_digit then

orig_var = compress("'"!!orig_var!!"'") ;

cards ;

a

1a

12

123

1e5

8 8

123.45

.

..

;

run ;

proc print data=x ;

run ;

silasskovsbo · Posted 12-03-2012 07:00 AM

Thank you. This returns a table showing the type of each observation.

But I need a format that does the below: (convert from have --> want)

have

1abc

bc2

...

1234

want

1abc

bc2

...

number

-----------------------------------

By this I will be able to find all the data in a data set with character variables which is not numbers.

proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Re: proc format

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away