FORMATS

Super Contributor
Posts: 1,041


Hi,

I want to use the following format on a character variable and create a numeric variable.

First, how do I map 981 and greater to a missing value (the dot in the last range below), given that these are numbers?

Second, how do I apply this format to a numeric variable to get another variable?

proc format;
value $MDC
'001'-'017'=00
'020'-'103'=01
/* ... further ranges omitted ... */
'955'-'965'=24
'969'-'977'=25
'981'-high=.
;
run;

and then use it like this: GROUP = put(DRG, $MDC.);

Thanks

Super Contributor
Posts: 418

Re: FORMATS

Posted in reply to robertrao

Can you give some examples of the input data and the output you want? I am not quite following what you are asking.

It sounds like you have a character variable that you want to convert to a numeric one using a format, and anything over '981' should be missing?

If I am correct, please let me know!

Super User
Posts: 11,343

Re: FORMATS

Posted in reply to robertrao

First, to read data it is better to use an informat (an INVALUE statement in PROC FORMAT). Then the code is NewVariable = input(stringvariable, informatname.);

PUT always returns text, so using it to create a numeric variable gives you character values and/or type-conversion messages in the log.
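A minimal sketch of the difference (the data set name TYPES and its variables are illustrative only): PUT with a character format hands back text, while INPUT with a numeric informat hands back a number, and VTYPE reports which type each new variable ended up with.

```sas
data types;
  length s $3;
  s = '020';
  c = put(s, $3.);     /* PUT always returns character text */
  n = input(s, 3.);    /* INPUT with a numeric informat returns a number */
  c_type = vtype(c);   /* 'C' = character */
  n_type = vtype(n);   /* 'N' = numeric */
run;

proc print data=types;
run;
```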

HIGH does not compare strings as numbers: in a character range, '981'-high follows the character sort order, so a value such as '1234' would not fall in it. The OTHER keyword will turn any value not listed into missing.

If you want leading zeros to be displayed for the numeric value, you will need to use a Z format (for example Z2.).
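For example, a sketch assuming a numeric variable named GROUP (a hypothetical name) holding the codes produced by the informat: the value stays numeric and only its display changes.

```sas
data demo;
  input group;
  format group z2.;   /* stored as a number; displayed padded with a leading zero */
datalines;
0
1
25
;
run;

proc print data=demo;
run;
```

PROC PRINT should show 00, 01 and 25.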

Be aware that a value of 0165 will be assigned to 0, because the informat compares values in lexical (character) sort order, and '0165' as a string falls between '001' and '017'.
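A quick way to see the lexical comparison at work (the data set CMP is just for illustration): the same pair of values compares differently as strings and as numbers.

```sas
data cmp;
  as_text   = ('0165' < '017');   /* character comparison, position by position */
  as_number = (165 < 17);         /* numeric comparison */
run;

proc print data=cmp;
run;
```

AS_TEXT is 1 and AS_NUMBER is 0: as text, '0165' sorts before '017' because its third character, '6', precedes '7'.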

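proc format library=work;
  invalue MDC
    '001'-'017'=00
    '020'-'103'=01
    /* ... further ranges omitted ... */
    '955'-'965'=24
    '969'-'977'=25
    other=.        /* any string not matched above becomes missing */
  ;
run;

data test;
  input x $;
  xnum = input(x, mdc.);   /* INPUT with the informat returns a number */
datalines;
001
017
020
103
955
965
969
977
981
1234
0165
;
run;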

Super Contributor
Posts: 644

Re: FORMATS

In this case, where the numeric ranges are not contiguous, I would recommend not using the informat to set all other values to missing. Instead, I would remove the line

     other =  .

from the PROC FORMAT step,

and in the DATA step place the following statement after the xnum assignment, before the DATALINES statement:

     if xnum >= 981 then call missing (xnum) ;

Any number below 981 that is not covered by the informat ranges would then be converted by the default BEST. informat.

You could then follow the DATA step with a PROC FREQ to inspect the distribution of the encoded values and identify any incoming values that are out of range.
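Putting Richard's suggestion together, a sketch might look like this. It reuses only the ranges shown in the thread (the elided ones are still missing here), and the data set name TEST2 and the sample codes are illustrative. '018' falls between two ranges, so it is not encoded and should appear in the PROC FREQ listing as an out-of-range value (read as 18 if, as Richard describes, the default numeric informat applies).

```sas
proc format library=work;
  invalue mdc
    '001'-'017' = 0
    '020'-'103' = 1
    /* ... further ranges omitted in the thread ... */
    '955'-'965' = 24
    '969'-'977' = 25;   /* no OTHER= range, as Richard suggests */
run;

data test2;
  input x $;
  xnum = input(x, mdc.);
  if xnum >= 981 then call missing(xnum);   /* codes 981 and above become missing */
datalines;
001
018
969
981
;
run;

proc freq data=test2;
  tables x*xnum / list missing;   /* one row per raw code / encoded value pair */
run;
```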

Richard

Super User
Posts: 10,028

Re: FORMATS

Posted in reply to robertrao

Why not convert them into a numeric variable first, and then use PROC FORMAT on that?

GROUP=input(DRG, best32.);


proc format;
value MDC
/* ... further ranges omitted ... */
969 - 977 = '25'
981 - high = '.'
;
run;
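A fuller sketch of this approach, assuming a format name MDCNUM and a data set WANT (both hypothetical) and only the ranges that appear in the thread: convert the string to a number, let a numeric VALUE format map numbers to group codes, and read the formatted text back with INPUT to get a numeric GROUP.

```sas
proc format;
  value mdcnum                 /* hypothetical numeric format */
    1   - 17   = '00'
    20  - 103  = '01'
    /* ... further ranges omitted in the thread ... */
    955 - 965  = '24'
    969 - 977  = '25'
    981 - high = ' '
    other      = ' ';
run;

data want;
  input drg $;
  drgnum = input(drg, best32.);              /* text -> number: '0020' becomes 20 */
  group  = input(put(drgnum, mdcnum.), 2.);  /* group code as a number; blank label -> missing */
datalines;
017
0020
970
1234
;
run;

proc print data=want;
run;
```

Because the ranges are now numeric, '0020' is treated as 20 and lands in group 1; with the character informat earlier in the thread, the string '0020' sorts between '001' and '017' and would be mapped to 0. If only the display matters, you could skip GROUP and attach the format directly with format drgnum mdcnum.;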
