Hi,
I ran the following program
proc format lib=formtlib; value $answer '0'='No' '1'='Yes' other='Did not answer'; data cody.Taxes; informat SSN $11. Gender $1. Question_1 - Question_4 $1.; input SSN Gender Question_1 - Question_5; datalines; 101-23-1928 M 1 3 C 4 23000 919-67-7800 F 9 2 D 2 17000 202-22-3848 M 0 5 A 5 57000 344-87-8737 M 1 1 B 2 34123 444-38-2837 F . 4 A 1 17233 763-01-0123 F 0 4 A 4 . ; title 'Frequencies for the Taxes Data Set'; proc freq data=cody.taxes; format Question_1 $answer.; tables Question_1; run;
So from the datalines, you can see that the six observations to question 1 is 1, 9, 0, 1, . , 0.
So I have two "1", two "0", one "9" and one "."
the freq print out is
and since I have defined my format as
proc format lib=formtlib; value $answer '0'='No' '1'='Yes' other='Did not answer';
so I don't understand is that why are both the "9" and "." classified as missing value and not as "other" ?
"9" is pretty obvious should be other right? It is not "0" or "1" and it is not missing.
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.
When SAS has to combine two (or more) formatted values into a category for the purpose of reporting, it picks the first value alphabetically (if a character variable) or numerically (if a numeric variable) to represent the category. So, in this case, when the values in the category are a missing and a '9', it uses the missing (as this is first alphabetically), and PROC FREQ by default will not consider a missing as a valid value to create a category from. If you use the MISSING option in the TABLES statement of PROC FREQ, this overrides the default behavior of handling missings, and then you get the expected output.
I read this paraphrase multiple times, I still do not understand it. Can someone explain this to me in layman's term?
@Nietzsche I agree with you that the result is not immediately intuitive. However if you give it a bit of thought then when grouping multiple values into a single category and this category includes missings then Proc Freq needs to "decide" if it needs to treat the category as missing or as non-missing to only count the rows belonging to the same category in a single place.
The SAS documentation here is clear about this:
You can use the "missing" keyword to include the category in the analysis OR you need to define a format that puts missings in it's own category.
proc format;
value $answerA
'0'='No'
'1'='Yes'
other='Did not answer'
;
value $answerB
'0'='No'
'1'='Yes'
' ' = 'Missing'
other='Did not answer'
;
run;
data Taxes;
informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
input SSN Gender Question_1 - Question_5;
datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;
title 'Frequencies for the Taxes Data Set';
proc freq data=taxes;
format Question_1 $answerA.;
tables Question_1 /missing;
run;
proc freq data=taxes;
format Question_1 $answerB.;
tables Question_1;
run;
title;
Modify your PROC FREQ code like this:
proc freq data=cody.taxes;
format Question_1 $answer.;
tables Question_1/out=a;
run;
Next, look at data set A. Then look at data set A where you have removed the format from variable QUESTION_1, you will see that the category assigned to the 9 and missing is listed as missing, despite the fact that some of the values have a non-missing value of 9. By default PROC FREQ does not tabulate frequencies for the missing level.
Re-run the code using the MISSING option, then see what happens.
@PaigeMiller explained the main issue. But just to add a bit, note that this statement:
@Nietzsche wrote:
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.
is wrong. If you PROC PRINT the data, you will see that the value of Question_1 for the fifth record is ' ', not '.'
This is because the $1 informat you used to read in the data will interpret as a period as a missing value, as documented: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n1v0ez0x2x99qdn15797taed37ji.ht...
As mentioned in the end of the documentation, if you want a period to be read into a character variable as a period, you could change to use the $char1 informat.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.