BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
silasskovsbo
Calcite | Level 5

Hey.

I have a data set with character variables. The values for the variable are numbers, ' ' (empty, as character missings should be), but there are also other kinds of missing values (eg. '...', '.. .' '   .')

I would like to make a format which for these different values take changes the values to : number --> 'numbers' and let all observations which are not numbers stay the same.

That will make it possible for me to find the different kind of missing values.

I hope you understand and have a good idea to how to create this format.

If you have questions, just ask

Best regards, Silas Skovsbo, Aarhus University, institute of economics.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

  input (x y) ($);

  cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

  select catt("_",name,"=input(",name,",?? 32.);",

              "if not missing(_",name,") then do;",

              name,"='numb';end;") into :missings

    separated by " "

      from dictionary.columns

        where libname="WORK" and

            memname="TEST"

  ;

quit;

data want (drop=_:);

  set test;

  &missings.

run;

proc freq data=want;

  tables _all_;

run;

View solution in original post

25 REPLIES 25
LinusH
Tourmaline | Level 20

I'm not quite following you.

A brief example with example data, and desired output would help.

Data never sleeps
silasskovsbo
Calcite | Level 5

An example:

For two character variables,  X and Y:

X          Y

1          22

2          ....    

3          . . .

.          32

..         11

. .        78

Then I want a format which gives me:

X               Y

numb          numb

numb          ....    

numb          . . .

.               numb

..              numb

. .             numb

AncaTilea
Pyrite | Level 9

Is this what you need?

data have;

    input x $ ;

    cards;

    a

    4

    5

    6

    .

    ..

      .

    ...

    .

    4

    4

;

proc format;

    value $ fmt 0-99999 = "Number" 

              ;

data want;

    set have;

    format x $fmt.;

run;

proc freq data = want;table x;run;

So in this way, if you do a frequency you get what you want.

(given that I understood what you need)

Let me know if you have any questions.

Good luck!

silasskovsbo
Calcite | Level 5

Thank you!

That solved the problem Smiley Happy

so easy when you know the code!

Tom
Super User Tom
Super User

If your variable is character then the ranges you propose will include many non numbers.

For example 1ABC would be in the range '0' - '9999999' and is not a valid number.

AncaTilea
Pyrite | Level 9

Agreed.

BUT

My understanding was that silasskovsbo wanted to focus on the missing data (basically wanted to find out how many ways the missing data was coded in).

If that's not indeed correct then the PROC FORMAT needs to be modified accordingly.


Cheers!

LinusH
Tourmaline | Level 20

I would like to suggest using the NOTDIGIT function which would be easier than creating a format.

Data never sleeps
silasskovsbo
Calcite | Level 5

Thank you for your suggestions.

As Tom mentions, it is a problem if a value as '1ABC' is in the interval '0'-'9999999999' since I want only pure numbers to be formatted, so I am able to find all the observation which differ and therefor is missing.

How can I then format all values which is a number, but not all other kinds of values, eg. '1ABC', '...', '    .' ?

Nondigit function?

TomKari
Onyx | Level 15

I'd like to suggest that the logic from https://communities.sas.com/message/145163#145163 makes sense in this context. Let SAS figure out if it can read it as a number, otherwise it remains unchanged. Note the two exceptions; if the string is a single period, SAS will convert it to character missing (all blank), and of course if the string is "NUMBER", it needs to be converted so it won't be lumped in with the actual numbers.

Tom

data have;
length X Y $80;
infile datalines dlm='09'x;
input X Y; /* tab delimited, to retain blanks in the values */
datalines;
1 22
2 ....  
NUMBER 51
3 . . .
. 32
12 NUMBER
.. 11
. . 78
run;

data want;
length X2 Y2 $80;
set have;

if X = "" then
  X = "MISSING";

if X = "NUMBER" then
  X = "'NUMBER'";

if Y = "" then
  Y = "MISSING";

if Y = "NUMBER" then
  Y = "'NUMBER'";

if input(X, ??16.) > . then
  X2 = "NUMBER";
else X2 = X;

if input(Y, ??16.) > . then
  Y2 = "NUMBER";
else Y2 = Y;
run;

proc freq data = want;
table X2 Y2;
run;

AncaTilea
Pyrite | Level 9

ok,all, I think here is a simpler code using the not alpha numeric function.

 

/*NOTALNUM

Purpose: To determine the position of the first character in a string that is not an alphanumeric (any upper- or lowercase letter or a number). If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.

*/

data have;
input x $ ;
cards;
1a
1234g
3h
j6
4
5
6
.
..
   .
...
.

4
4
;

data want;

length value $10.;

set have;

temp = NOTALNUM(x);

if temp = 1 then value = x;

  else if temp = 2 then value = "Number";

  else value = "Non-Number";

run;

art297
Opal | Level 21

If I correctly understand what you are trying to accomplish, here is another approach:

data test;

  input (x y) ($);

  cards;

1z400 this1

4 45

7 .

8isnt 0900

;

proc sql noprint;

  select catt("_",name,"=input(",name,",?? 32.);",

              "if not missing(_",name,") then do;",

              name,"='numb';end;") into :missings

    separated by " "

      from dictionary.columns

        where libname="WORK" and

            memname="TEST"

  ;

quit;

data want (drop=_:);

  set test;

  &missings.

run;

proc freq data=want;

  tables _all_;

run;

silasskovsbo
Calcite | Level 5

Thank you for all the great commentaries. Unfortunate it is still not really doing.

The program needs to automatically do it on all character variables, since there are a huge amount of variables.

I have tried your program, Arthur. But something seems to go wrong?? for the '&missing.' it says statement is not valid??

Robert_Bardos
Fluorite | Level 6

I think Art is on the right track here.

Within a simple datastep a statement like

   if not notdigit(compress(your_char_var,' .')) and

     not missing(input(your_char_var,?? 8.)) then

      your_char_var = compress("'"!!your_char_var!!"'") ;

should do. (Note that the ! exclamation marks are the concatenation operators in this germany based installation. Note also my use of the compress function's second argument to cater for numbers with a decimal point. Scrap that if your numbers of interest are all integers.)

This was my test setup (where I assumed that scientific notation is not used and where it appears in your data purely coincidental) :

data x ;

  input orig_var $char8. @1 num_read ?? 8. ;  /* read both as char and num */

  miss_char = missing(orig_var) ;

  miss_num = missing(input(orig_var,?? 8.)) ;

  not_digit = notdigit(compress(orig_var,' .')) ;    /* used to identify scientific notation */

  if not miss_num and not not_digit then

  orig_var = compress("'"!!orig_var!!"'") ;

  cards ;

a

1a

12

123

1e5

8 8

123.45

.

..

;

run ;

proc print data=x ;

run ;

silasskovsbo
Calcite | Level 5

Thank you. This returns a table showing the type of each observation.

But I need a format that does the below: (convert from have --> want)

have

1abc

bc2

...

1234

want

1abc

bc2

...


number

-----------------------------------

By this I will be able to find all the data in a data set with character variables which is not numbers.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 1749 views
  • 10 likes
  • 8 in conversation