BookmarkSubscribeRSS Feed
Nietzsche
Lapis Lazuli | Level 10

Hi, 

I ran the following program

proc format lib=formtlib;
	value $answer
	'0'='No'
	'1'='Yes'
	other='Did not answer';

data cody.Taxes;
	informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
	input SSN Gender Question_1 - Question_5;

datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;

title 'Frequencies for the Taxes Data Set';

proc freq data=cody.taxes;
	format Question_1 $answer.;
	tables Question_1;
run;

 

So from the datalines, you can see that the six observations to question 1 is 1, 9, 0, 1, . , 0.

So I have two "1", two "0", one "9" and one "." 

 

the freq print out is

Nietzsche_0-1669543756304.png

and since I have defined my format as

proc format lib=formtlib;
	value $answer
	'0'='No'
	'1'='Yes'
	other='Did not answer';

so I don't understand is that why are both the "9" and "." classified as missing value and not as "other" ?

"9" is pretty obvious should be other right? It is not "0" or "1" and it is not missing.

"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).
5 REPLIES 5
PaigeMiller
Diamond | Level 26

When SAS has to combine two (or more) formatted values into a category for the purpose of reporting, it picks the first value alphabetically (if a character variable) or numerically (if a numeric variable) to represent the category. So, in this case, when the values in the category are a missing and a '9', it uses the missing (as this is first alphabetically), and PROC FREQ by default will not consider a missing as a valid value to create a category from. If you use the MISSING option in the TABLES statement of PROC FREQ, this overrides the default behavior of handling missings, and then you get the expected output.

--
Paige Miller
Nietzsche
Lapis Lazuli | Level 10

I read this paraphrase multiple times, I still do not understand it. Can someone explain this to me in layman's term?

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).
Patrick
Opal | Level 21

@Nietzsche I agree with you that the result is not immediately intuitive. However if you give it a bit of thought then when grouping multiple values into a single category and this category includes missings then Proc Freq needs to "decide" if it needs to treat the category as missing or as non-missing to only count the rows belonging to the same category in a single place. 

 

The SAS documentation here is clear about this:
Patrick_0-1669637080612.png

You can use the "missing" keyword to include the category in the analysis OR you need to define a format that puts missings in it's own category.

proc format;
	value $answerA
	'0'='No'
	'1'='Yes'
	other='Did not answer'
  ;
	value $answerB
	'0'='No'
	'1'='Yes'
  ' ' = 'Missing'
	other='Did not answer'
  ;
run;

data Taxes;
	informat SSN $11. Gender $1. Question_1 - Question_4 $1.;
	input SSN Gender Question_1 - Question_5;
datalines;
101-23-1928 M 1 3 C 4 23000
919-67-7800 F 9 2 D 2 17000
202-22-3848 M 0 5 A 5 57000
344-87-8737 M 1 1 B 2 34123
444-38-2837 F . 4 A 1 17233
763-01-0123 F 0 4 A 4 .
;

title 'Frequencies for the Taxes Data Set';
proc freq data=taxes;
	format Question_1 $answerA.;
	tables Question_1 /missing;
run;
proc freq data=taxes;
	format Question_1 $answerB.;
	tables Question_1;
run;
title;

Patrick_0-1669637512071.png

 

PaigeMiller
Diamond | Level 26

Modify your PROC FREQ code like this:

 

proc freq data=cody.taxes;
	format Question_1 $answer.;
	tables Question_1/out=a;
run;

 

Next, look at data set A. Then look at data set A where you have removed the format from variable QUESTION_1, you will see that the category assigned to the 9 and missing is listed as missing, despite the fact that some of the values have a non-missing value of 9. By default PROC FREQ does not tabulate frequencies for the missing level.

 

Re-run the code using the MISSING option, then see what happens.

--
Paige Miller
Quentin
Super User

@PaigeMiller explained the main issue.  But just to add a bit, note that this statement:


@Nietzsche wrote:
"." is also not missing because in the informat, I specified it to be character value, so it should be not missing since only " " is considered missing for character variable.

is wrong.  If you PROC PRINT the data, you will see that the value of Question_1 for the fifth record is ' ', not '.'  

 

This is because the $1 informat you used to read in the data will interpret as a period as a missing value, as documented: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n1v0ez0x2x99qdn15797taed37ji.ht...

 

As mentioned in the end of the documentation, if you want a period to be read into a character variable as a period, you could change to use the $char1 informat.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 498 views
  • 5 likes
  • 4 in conversation