BookmarkSubscribeRSS Feed
Nietzsche
Lapis Lazuli | Level 10

Hi, 

I ran the following program

proc format lib=formtlib;
	value $answer
	'0'='No'
	'1'='Yes'
	other='Did not answer';

data cody.Taxes;
	informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
	input SSN Gender Question_1 - Question_5;

datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;

title 'Frequencies for the Taxes Data Set';

proc freq data=cody.taxes;
	format Question_1 $answer.;
	tables Question_1;
run;

 

So from the datalines, you can see that the six observations to question 1 is 1, 9, 0, 1, . , 0.

So I have two "1", two "0", one "9" and one "." 

 

the freq print out is

Nietzsche_0-1669543756304.png

and since I have defined my format as

proc format lib=formtlib;
	value $answer
	'0'='No'
	'1'='Yes'
	other='Did not answer';

so I don't understand is that why are both the "9" and "." classified as missing value and not as "other" ?

"9" is pretty obvious should be other right? It is not "0" or "1" and it is not missing.

"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).
5 REPLIES 5
PaigeMiller
Diamond | Level 26

When SAS has to combine two (or more) formatted values into a category for the purpose of reporting, it picks the first value alphabetically (if a character variable) or numerically (if a numeric variable) to represent the category. So, in this case, when the values in the category are a missing and a '9', it uses the missing (as this is first alphabetically), and PROC FREQ by default will not consider a missing as a valid value to create a category from. If you use the MISSING option in the TABLES statement of PROC FREQ, this overrides the default behavior of handling missings, and then you get the expected output.

--
Paige Miller
Nietzsche
Lapis Lazuli | Level 10

I read this paraphrase multiple times, I still do not understand it. Can someone explain this to me in layman's term?

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).
Patrick
Opal | Level 21

@Nietzsche I agree with you that the result is not immediately intuitive. However if you give it a bit of thought then when grouping multiple values into a single category and this category includes missings then Proc Freq needs to "decide" if it needs to treat the category as missing or as non-missing to only count the rows belonging to the same category in a single place. 

 

The SAS documentation here is clear about this:
Patrick_0-1669637080612.png

You can use the "missing" keyword to include the category in the analysis OR you need to define a format that puts missings in it's own category.

proc format;
	value $answerA
	'0'='No'
	'1'='Yes'
	other='Did not answer'
  ;
	value $answerB
	'0'='No'
	'1'='Yes'
  ' ' = 'Missing'
	other='Did not answer'
  ;
run;

data Taxes;
	informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
	input SSN Gender Question_1 - Question_5;
datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;

title 'Frequencies for the Taxes Data Set';
proc freq data=taxes;
	format Question_1 $answerA.;
	tables Question_1 /missing;
run;
proc freq data=taxes;
	format Question_1 $answerB.;
	tables Question_1;
run;
title;

Patrick_0-1669637512071.png

 

PaigeMiller
Diamond | Level 26

Modify your PROC FREQ code like this:

 

proc freq data=cody.taxes;
	format Question_1 $answer.;
	tables Question_1/out=a;
run;

 

Next, look at data set A. Then look at data set A where you have removed the format from variable QUESTION_1, you will see that the category assigned to the 9 and missing is listed as missing, despite the fact that some of the values have a non-missing value of 9. By default PROC FREQ does not tabulate frequencies for the missing level.

 

Re-run the code using the MISSING option, then see what happens.

--
Paige Miller
Quentin
Super User

@PaigeMiller explained the main issue.  But just to add a bit, note that this statement:


@Nietzsche wrote:
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.

is wrong.  If you PROC PRINT the data, you will see that the value of Question_1 for the fifth record is ' ', not '.'  

 

This is because the $1 informat you used to read in the data will interpret as a period as a missing value, as documented: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n1v0ez0x2x99qdn15797taed37ji.ht...

 

As mentioned in the end of the documentation, if you want a period to be read into a character variable as a period, you could change to use the $char1 informat.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 782 views
  • 5 likes
  • 4 in conversation