How to set unformatted values to missing in data step

Occasional Contributor
Posts: 17

I do a large amount of research survey analysis.  Often (too often), we get responses which are invalid.  By invalid, I mean they show up as numbers rather than the formatted value in PROC FREQ.


Here is an example :

proc format;
   value fclass  1 = 'Freshman'
                 2 = 'Sophomore'
                 3 = 'Junior'
                 4 = 'Senior';
run;


This format is applied to a numeric variable which should have values between 1 and 4 :


format class fclass.;


What ultimately happens is that respondents code numbers outside the allowable range defined in the PROC FORMAT step.


Running PROC FREQ for this variable, I get :


Freshman      4
Sophomore    11
Junior       23
Senior       13
5             1
6             1


Is there an easy way to automatically set the erroneous unformatted values (i.e. 5 and 6 in this example) to missing numeric values?


What I've been doing is manually programming blocks of code for every question that has one or more invalid values :


if class GT 4 then class=.;


Is there an easier way to do this?  Survey respondents sometimes code copious amounts of invalid answers...


Thanks in advance.


Barry Walton

Posts: 55

Re: How to set unformatted values to missing in data step

You can set an Other in the format, i.e.


proc format;
   value fclass  1 = 'Freshman'
                 2 = 'Sophomore'
                 3 = 'Junior'
                 4 = 'Senior'
                 other = 'Other';
run;

Or you can set the label to a blank, whatever fits your needs best.
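A format with an OTHER label changes only how the values print; the stored numbers stay as they are. If the goal is to actually blank out the stored values in a DATA step, one possible sketch is to label OTHER as a blank and test the formatted value with PUT. The data set names `survey` and `survey_clean` and the sample data are assumptions made up for illustration, not part of the original question.

```sas
proc format;
   value fclass  1 = 'Freshman'
                 2 = 'Sophomore'
                 3 = 'Junior'
                 4 = 'Senior'
                 other = ' ';      /* anything not listed formats as blank */
run;

/* Hypothetical sample data: 5 and 6 are out of range */
data survey;
   input class @@;
   datalines;
1 2 3 4 5 6
;
run;

data survey_clean;
   set survey;
   /* List every variable that shares this coding scheme */
   array q {*} class;
   do i = 1 to dim(q);
      /* A blank formatted value means the code is not in the format */
      if missing(put(q{i}, fclass.)) then q{i} = .;
   end;
   drop i;
run;
```

Adding more variables to the ARRAY statement would let one loop handle every question coded with the same scheme, instead of one IF statement per question.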

Super User
Posts: 10,454

Re: How to set unformatted values to missing in data step

There are two basic approaches, depending on how concerned you are about values outside the specified range.

The first is to use an informat that matches your expected values. For your example:


proc format;
   invalue fclass
      1,2,3,4 = _same_
      other = .;
run;



And read the data with that informat. If you have an Excel file, save to CSV and read that to have control.

One advantage of the informat approach is that surveys often have many questions with the same coding scheme, and you can use the same informat for all of the questions that share it.
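As a rough illustration of reading data with that informat, the sketch below applies it to two questions that share the coding scheme. The variable names `id`, `q1`, and `q2` and the inline data are hypothetical; with a real CSV file you would point an INFILE statement at the file instead of using DATALINES.

```sas
proc format;
   invalue fclass
      1,2,3,4 = _same_     /* keep valid codes as they are */
      other = .;           /* everything else becomes missing */
run;

/* Hypothetical responses: respondents 2 and 3 each have one invalid code */
data survey;
   input id q1 :fclass. q2 :fclass.;
   datalines;
1 1 4
2 5 2
3 3 6
;
run;

proc print data=survey;
run;
```

Because the informat is applied as the data are read, the out-of-range codes should never reach the data set, so no per-question cleanup code is needed afterwards.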


Respected Advisor
Posts: 3,777

Re: How to set unformatted values to missing in data step

This is probably a good time for you to learn about the advanced features of PROC SUMMARY.  This example uses the EXCLUSIVE and PRELOADFMT options to achieve the result using only the subset implied by the VALUE format.  Note also the use of the COMPLETETYPES option.  I'll leave it to you to research what that option does when you remove the "*" from line three of the program.

data class;
   input class freq @@;
   *if class eq 2 then delete;
   datalines;
1  4  2 11  3 23  4 13  5  1  6  1
;
run;
proc print;
run;
proc format;
   value class  1='Freshman' 2='Sophomore' 3='Junior' 4='Senior';
run;
proc summary data=class nway completetypes;
   class class / exclusive preloadfmt;
   freq freq;
   format class class.;
   output out=counts(drop=_type_);
run;
proc print;
run;
proc freq;
   tables class / nocum;
   weight _freq_;
run;

