enginemane44
Calcite | Level 5

I do a large amount of research survey analysis. Often (too often), we get responses that are invalid. By invalid, I mean they show up as raw numbers rather than as the formatted value in PROC FREQ.

 

Here is an example :

proc format;
   value fclass 1 = 'Freshman'
                2 = 'Sophomore'
                3 = 'Junior'
                4 = 'Senior';
run;

 

This format is applied to a numeric variable which should have values between 1 and 4 :

 

format class fclass.;

 

What ultimately happens is that respondents enter numbers outside the allowable range (the range defined in the PROC FORMAT step).

 

Running PROC FREQ for this variable, I get :

 

Freshman      4
Sophomore    11
Junior       23
Senior       13
5             1
6             1

 

Is there an easy way to automatically set the erroneous unformatted values (i.e. 5 and 6 in this example) to missing numeric values?

 

What I've been doing is manually programming blocks of code for every question that has one or more invalid values :

 

if class GT 4 then class=.;

 

Is there an easier way to do this ?  Survey respondents tend to code copious amounts of invalid answers sometimes...

 

Thanks in advance.

 

Barry Walton

 

Barry.Walton@millersville.edu

3 REPLIES
JoshB
Quartz | Level 8

You can set an Other in the format, i.e.

 

proc format;
   value fclass 1 = 'Freshman'
                2 = 'Sophomore'
                3 = 'Junior'
                4 = 'Senior'
                other = 'Other';
run;

Or you can set the label to a blank character value, whichever fits your need best.
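
If the stored values themselves need to become missing, rather than just displaying under a different label, the OTHER label can also serve as a flag in a DATA step. What follows is only a minimal sketch under some assumptions: the data set `survey`, its variables `q1-q3`, the format name `fclassx`, and the `_INVALID_` label are all hypothetical names made up for illustration.

```sas
/* Hypothetical format: same codes as fclass, plus a flag label for anything else */
proc format;
   value fclassx 1 = 'Freshman'
                 2 = 'Sophomore'
                 3 = 'Junior'
                 4 = 'Senior'
                 other = '_INVALID_';
run;

/* Made-up responses, for illustration only */
data survey;
   input id q1 q2 q3;
   datalines;
101 1 5 3
102 4 2 6
103 2 2 9
;
run;

data survey_clean;
   set survey;
   array q{*} q1-q3;
   do i = 1 to dim(q);
      /* Any value the format does not list falls into OTHER and is reset */
      if put(q{i}, fclassx.) = '_INVALID_' then q{i} = .;
   end;
   drop i;
run;
```

Values that are already missing also fall into OTHER, so they simply stay missing. One format per coding scheme is enough; the array can list every question that shares that scheme.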

ballardw
Super User

There are two basic approaches, depending on how concerned anyone is about values outside the specified range.

The first is to use an informat that matches your expected values. For your example:

 

proc format;
   invalue fclass
      1, 2, 3, 4 = _same_
      other      = .;
run;

And read the data with that informat. If you have an Excel file, save to CSV and read that to have control.

One advantage of the informat approach is that surveys often have many questions with the same coding scheme, and you can use the same informat for all of the questions that share a scheme.
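
As a rough illustration of reading data with such an informat, here is a small sketch. The variable names `id` and `q1-q3` and the in-stream data are assumptions made up for the example; a real survey file would more likely be read with an INFILE statement pointing at the CSV.

```sas
/* Informat as defined above: valid codes pass through, everything else becomes missing */
proc format;
   invalue fclass
      1, 2, 3, 4 = _same_
      other      = .;
run;

data survey;
   /* One INFORMAT statement covers every question that uses the 1-4 scheme */
   informat q1-q3 fclass.;
   input id q1-q3;
   datalines;
101 1 5 3
102 4 2 6
103 2 2 9
;
run;

proc print data=survey;
run;
```

With this approach the out-of-range codes (5, 6 and 9 in the made-up data) never reach the data set, so no per-question IF statements are needed afterwards.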

 

data_null__
Jade | Level 19

This is probably a good time for you to learn about the advanced features of PROC SUMMARY. This example uses the EXCLUSIVE and PRELOADFMT options to restrict the result to only the subset of values implied by the VALUE format. Note also the use of the COMPLETETYPES option. I'll leave it to you to research what that option does when you remove the "*" from line three of the program.

data class;
   input class freq @@;
   *if class eq 2 then delete;
   cards;
1  4  2 11  3 23  4 13  5  1  6  1
;;;;
   run;
proc print;
   run;
proc format;
   value class  1='Freshman' 2='Sophomore' 3='Junior' 4='Senior';
   run;
proc summary data=class nway completetypes;
   class class / exclusive preloadfmt;
   freq freq;
   format class class.;
   output out=counts(drop=_type_);
   run;
proc print;
   run;
proc freq;
   tables class / nocum;
   weight _freq_;
   run;
