Help using Base SAS procedures

proc format

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

proc format

Hey.

I have a data set with character variables. The values for the variable are numbers, ' ' (empty, as character missings should be), but there are also other kinds of missing values (eg. '...', '.. .' '   .')

I would like to make a format which for these different values take changes the values to : number --> 'numbers' and let all observations which are not numbers stay the same.

That will make it possible for me to find the different kind of missing values.

I hope you understand and have a good idea to how to create this format.

If you have questions, just ask

Best regards, Silas Skovsbo, Aarhus University, institute of economics.


Accepted Solutions
Solution
‎11-26-2012 11:13 AM
PROC Star
Posts: 7,467

Re: proc format

Posted in reply to silasskovsbo

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

  input (x y) ($);

  cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

  select catt("_",name,"=input(",name,",?? 32.);",

              "if not missing(_",name,") then do;",

              name,"='numb';end;") into :missings

    separated by " "

      from dictionary.columns

        where libname="WORK" and

            memname="TEST"

  ;

quit;

data want (drop=_Smiley Happy;

  set test;

  &missings.

run;

proc freq data=want;

  tables _all_;

run;

View solution in original post


All Replies
Super User
Posts: 5,424

Re: proc format

Posted in reply to silasskovsbo

I'm not quite following you.

A brief example with example data, and desired output would help.

Data never sleeps
Contributor
Posts: 30

Re: proc format

Posted in reply to silasskovsbo

An example:

For two character variables,  X and Y:

X          Y

1          22

2          ....    

3          . . .

.          32

..         11

. .        78

Then I want a format which gives me:

X               Y

numb          numb

numb          ....    

numb          . . .

.               numb

..              numb

. .             numb

Super Contributor
Posts: 543

Re: proc format

Posted in reply to silasskovsbo

Is this what you need?

data have;

    input x $ ;

    cards;

    a

    4

    5

    6

    .

    ..

      .

    ...

    .

    4

    4

;

proc format;

    value $ fmt 0-99999 = "Number" 

              ;

data want;

    set have;

    format x $fmt.;

run;

proc freq data = want;table x;run;

So in this way, if you do a frequency you get what you want.

(given that I understood what you need)

Let me know if you have any questions.

Good luck!

Contributor
Posts: 30

Re: proc format

Posted in reply to AncaTilea

Thank you!

That solved the problem Smiley Happy

so easy when you know the code!

Super User
Super User
Posts: 7,038

Re: proc format

Posted in reply to silasskovsbo

If your variable is character then the ranges you propose will include many non numbers.

For example 1ABC would be in the range '0' - '9999999' and is not a valid number.

Super Contributor
Posts: 543

Re: proc format

Agreed.

BUT

My understanding was that silasskovsbo wanted to focus on the missing data (basically wanted to find out how many ways the missing data was coded in).

If that's not indeed correct then the PROC FORMAT needs to be modified accordingly.


Cheers!

Super User
Posts: 5,424

Re: proc format

Posted in reply to AncaTilea

I would like to suggest using the NOTDIGIT function which would be easier than creating a format.

Data never sleeps
Contributor
Posts: 30

Re: proc format

Thank you for your suggestions.

As Tom mentions, it is a problem if a value as '1ABC' is in the interval '0'-'9999999999' since I want only pure numbers to be formatted, so I am able to find all the observation which differ and therefor is missing.

How can I then format all values which is a number, but not all other kinds of values, eg. '1ABC', '...', '    .' ?

Nondigit function?

PROC Star
Posts: 1,167

Re: proc format

Posted in reply to silasskovsbo

I'd like to suggest that the logic from https://communities.sas.com/message/145163#145163 makes sense in this context. Let SAS figure out if it can read it as a number, otherwise it remains unchanged. Note the two exceptions; if the string is a single period, SAS will convert it to character missing (all blank), and of course if the string is "NUMBER", it needs to be converted so it won't be lumped in with the actual numbers.

Tom

data have;
length X Y $80;
infile datalines dlm='09'x;
input X Y; /* tab delimited, to retain blanks in the values */
datalines;
1 22
2 ....  
NUMBER 51
3 . . .
. 32
12 NUMBER
.. 11
. . 78
run;

data want;
length X2 Y2 $80;
set have;

if X = "" then
  X = "MISSING";

if X = "NUMBER" then
  X = "'NUMBER'";

if Y = "" then
  Y = "MISSING";

if Y = "NUMBER" then
  Y = "'NUMBER'";

if input(X, ??16.) > . then
  X2 = "NUMBER";
else X2 = X;

if input(Y, ??16.) > . then
  Y2 = "NUMBER";
else Y2 = Y;
run;

proc freq data = want;
table X2 Y2;
run;

Super Contributor
Posts: 543

Re: proc format

Posted in reply to silasskovsbo

ok,all, I think here is a simpler code using the not alpha numeric function.

 

/*NOTALNUM

Purpose: To determine the position of the first character in a string that is not an alphanumeric (any upper- or lowercase letter or a number). If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.

*/

data have;
input x $ ;
cards;
1a
1234g
3h
j6
4
5
6
.
..
   .
...
.

4
4
;

data want;

length value $10.;

set have;

temp = NOTALNUM(x);

if temp = 1 then value = x;

  else if temp = 2 then value = "Number";

  else value = "Non-Number";

run;

Solution
‎11-26-2012 11:13 AM
PROC Star
Posts: 7,467

Re: proc format

Posted in reply to silasskovsbo

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

  input (x y) ($);

  cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

  select catt("_",name,"=input(",name,",?? 32.);",

              "if not missing(_",name,") then do;",

              name,"='numb';end;") into :missings

    separated by " "

      from dictionary.columns

        where libname="WORK" and

            memname="TEST"

  ;

quit;

data want (drop=_Smiley Happy;

  set test;

  &missings.

run;

proc freq data=want;

  tables _all_;

run;

Contributor
Posts: 30

Re: proc format

Thank you for all the great commentaries. Unfortunate it is still not really doing.

The program needs to automatically do it on all character variables, since there are a huge amount of variables.

I have tried your program, Arthur. But something seems to go wrong?? for the '&missing.' it says statement is not valid??

Frequent Contributor
Posts: 106

Re: proc format

Posted in reply to silasskovsbo

I think Art is on the right track here.

Within a simple datastep a statement like

   if not notdigit(compress(your_char_var,' .')) and

     not missing(input(your_char_var,?? 8.)) then

      your_char_var = compress("'"!!your_char_var!!"'") ;

should do. (Note that the ! exclamation marks are the concatenation operators in this germany based installation. Note also my use of the compress function's second argument to cater for numbers with a decimal point. Scrap that if your numbers of interest are all integers.)

This was my test setup (where I assumed that scientific notation is not used and where it appears in your data purely coincidental) :

data x ;

  input orig_var $char8. @1 num_read ?? 8. ;  /* read both as char and num */

  miss_char = missing(orig_var) ;

  miss_num = missing(input(orig_var,?? 8.)) ;

  not_digit = notdigit(compress(orig_var,' .')) ;    /* used to identify scientific notation */

  if not miss_num and not not_digit then

  orig_var = compress("'"!!orig_var!!"'") ;

  cards ;

a

1a

12

123

1e5

8 8

123.45

.

..

;

run ;

proc print data=x ;

run ;

Contributor
Posts: 30

Re: proc format

Posted in reply to Robert_Bardos

Thank you. This returns a table showing the type of each observation.

But I need a format that does the below: (convert from have --> want)

have

1abc

bc2

...

1234

want

1abc

bc2

...


number

-----------------------------------

By this I will be able to find all the data in a data set with character variables which is not numbers.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 25 replies
  • 593 views
  • 10 likes
  • 8 in conversation