Hi,
I ran the following program
proc format lib=formtlib; value $answer '0'='No' '1'='Yes' other='Did not answer'; data cody.Taxes; informat SSN $11. Gender $1. Question_1 - Question_4 $1.; input SSN Gender Question_1 - Question_5; datalines; 101-23-1928 M 1 3 C 4 23000 919-67-7800 F 9 2 D 2 17000 202-22-3848 M 0 5 A 5 57000 344-87-8737 M 1 1 B 2 34123 444-38-2837 F . 4 A 1 17233 763-01-0123 F 0 4 A 4 . ; title 'Frequencies for the Taxes Data Set'; proc freq data=cody.taxes; format Question_1 $answer.; tables Question_1; run;
So from the datalines, you can see that the six observations to question 1 is 1, 9, 0, 1, . , 0.
So I have two "1", two "0", one "9" and one "."
the freq print out is
and since I have defined my format as
proc format lib=formtlib; value $answer '0'='No' '1'='Yes' other='Did not answer';
so I don't understand is that why are both the "9" and "." classified as missing value and not as "other" ?
"9" is pretty obvious should be other right? It is not "0" or "1" and it is not missing.
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.
When SAS has to combine two (or more) formatted values into a category for the purpose of reporting, it picks the first value alphabetically (if a character variable) or numerically (if a numeric variable) to represent the category. So, in this case, when the values in the category are a missing and a '9', it uses the missing (as this is first alphabetically), and PROC FREQ by default will not consider a missing as a valid value to create a category from. If you use the MISSING option in the TABLES statement of PROC FREQ, this overrides the default behavior of handling missings, and then you get the expected output.
I read this paraphrase multiple times, I still do not understand it. Can someone explain this to me in layman's term?
@Nietzsche I agree with you that the result is not immediately intuitive. However if you give it a bit of thought then when grouping multiple values into a single category and this category includes missings then Proc Freq needs to "decide" if it needs to treat the category as missing or as non-missing to only count the rows belonging to the same category in a single place.
The SAS documentation here is clear about this:
You can use the "missing" keyword to include the category in the analysis OR you need to define a format that puts missings in it's own category.
proc format;
value $answerA
'0'='No'
'1'='Yes'
other='Did not answer'
;
value $answerB
'0'='No'
'1'='Yes'
' ' = 'Missing'
other='Did not answer'
;
run;
data Taxes;
informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
input SSN Gender Question_1 - Question_5;
datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;
title 'Frequencies for the Taxes Data Set';
proc freq data=taxes;
format Question_1 $answerA.;
tables Question_1 /missing;
run;
proc freq data=taxes;
format Question_1 $answerB.;
tables Question_1;
run;
title;
Modify your PROC FREQ code like this:
proc freq data=cody.taxes;
format Question_1 $answer.;
tables Question_1/out=a;
run;
Next, look at data set A. Then look at data set A where you have removed the format from variable QUESTION_1, you will see that the category assigned to the 9 and missing is listed as missing, despite the fact that some of the values have a non-missing value of 9. By default PROC FREQ does not tabulate frequencies for the missing level.
Re-run the code using the MISSING option, then see what happens.
@PaigeMiller explained the main issue. But just to add a bit, note that this statement:
@Nietzsche wrote:
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.
is wrong. If you PROC PRINT the data, you will see that the value of Question_1 for the fifth record is ' ', not '.'
This is because the $1 informat you used to read in the data will interpret as a period as a missing value, as documented: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n1v0ez0x2x99qdn15797taed37ji.ht...
As mentioned in the end of the documentation, if you want a period to be read into a character variable as a period, you could change to use the $char1 informat.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.