I do a large amount of research survey analysis. Often (too often), we get responses which are invalid. By invalid, I mean they
show up as numbers rather than the formatted value in PROC FREQ.
Here is an example :
proc format;
value fclass 1 = 'Freshman'
2 = 'Sophomore'
3 = 'Junior'
4 = 'Senior';
run;
This format is applied to a numeric variable which should have values between 1 and 4 :
format class fclass.;
What ultimately happens is that respondents code numbers outside the allowable range (as defined in the PROC FORMAT step).
Running PROC FREQ for this variable, I get :
Freshman      4
Sophomore    11
Junior       23
Senior       13
5             1
6             1
Is there an easy way to automatically set the erroneous unformatted values (i.e. 5 and 6 in this example) to missing numeric values?
What I've been doing is manually programming blocks of code for every question that has one or more invalid values :
if class GT 4 then class=.;
Is there an easier way to do this ? Survey respondents tend to code copious amounts of invalid answers sometimes...
Thanks in advance.
Barry Walton
Barry.Walton@millersville.edu
You can add an OTHER category to the format, e.g.
proc format;
value fclass 1 = 'Freshman'
2 = 'Sophomore'
3 = 'Junior'
4 = 'Senior'
other = 'Other';
run;
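Or you can set the label for OTHER to a blank string, whichever fits your needs best. Either way, the format changes only how the values are displayed; the stored values are still 5 and 6.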
There are two basic approaches, depending on how concerned you are about values outside the specified range.
The first is to use an informat that matches your expected values. For your example:
proc format;
invalue fclass
1,2,3,4 = _same_
other = .
;
run;
Then read the data with that informat. If your data are in an Excel file, save it as CSV and read the CSV so that you control how each variable is read.
One advantage of the informat approach is that surveys often have many questions with the same coding scheme, and you can use the same informat for all of the questions that share a scheme.
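As a rough sketch of reading several questions with that one informat: the data set name (survey), the variable names (id, q1-q3) and the data lines below are made up for illustration, so adapt them to your own file. The INFORMAT statement attaches fclass. to every question that shares the 1-4 scheme, and list input then applies it as each value is read.

```sas
/* Sketch only: survey, id, q1-q3 and the data lines are hypothetical. */
proc format;
  invalue fclass
    1, 2, 3, 4 = _same_   /* keep valid codes as they are */
    other      = .;       /* anything else is read as missing */
run;

data survey;
  informat q1-q3 fclass.; /* one informat for all questions with this scheme */
  input id q1-q3;
  datalines;
101 1 4 2
102 5 3 6
103 2 9 4
;
run;

proc freq data=survey;
  tables q1-q3 / nocum;
run;
```

With the invalid codes read as missing, PROC FREQ should leave them out of the table rows and count them under "Frequency Missing" instead, so no per-question IF statements are needed.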
This is probably a good time for you to learn about the advanced features of PROC SUMMARY. This example uses the EXCLUSIVE and PRELOADFMT options to restrict the result to the subset of values defined by the VALUE format. Note also the use of the COMPLETETYPES option; I'll leave it to you to research what that option does when you remove the "*" from line three of the program.
data class;
input class freq @@;
*if class eq 2 then delete;
cards;
1 4 2 11 3 23 4 13 5 1 6 1
;;;;
run;
proc print;
run;
proc format;
value class 1='Freshman' 2='Sophomore' 3='Junior' 4='Senior';
run;
proc summary data=class nway completetypes;
class class / exclusive preloadfmt;
freq freq;
format class class.;
output out=counts(drop=_type_);
run;
proc print;
run;
proc freq;
tables class / nocum;
weight _freq_;
run;