06-27-2013 07:53 PM
I'm analyzing a dataset that contains many variables with missing values. All the missing values are coded as ".". I do not want to include the missing values in the analysis. However, for one of the variables, SAS keeps including the missing values in the table as if "." were a separate category, e.g.,
| INCOME_CATEGORY | Frequency | Percent |
| 15000 TO LESS THAN 25000 | 11000 | 30.9859 |
| 25000 TO LESS THAN 35000 | 5000 | 14.0845 |
Surprisingly, this happens only for one variable. For the others, the missing values are successfully excluded.
Can anyone give me a clue, please?
06-27-2013 08:06 PM
Because it is not missing: the variable holds the character value '.'. A period means missing only for numeric variables; for character variables, SAS treats a blank (' ') as missing.
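A quick way to see the difference is the MISSING() function, which returns 1 for a numeric missing value or an all-blank character value and 0 otherwise. This is a small sketch; the variable names are made up for illustration.

```sas
data demo ;
  length char_period char_blank $1 ;
  num_value   = . ;    /* numeric missing value                */
  char_period = '.' ;  /* character variable holding a period  */
  char_blank  = ' ' ;  /* blank character value: this is missing */
  num_missing    = missing(num_value) ;    /* 1 */
  period_missing = missing(char_period) ;  /* 0: '.' is ordinary text here */
  blank_missing  = missing(char_blank) ;   /* 1 */
  put num_missing= period_missing= blank_missing= ;
run;
```

So a character value of '.' is just another category as far as PROC FREQ is concerned.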
06-27-2013 08:45 PM
Show me your data and the method you use to exclude missing values. I didn't infer this from the "." itself but from the way you presented your data in your first post: you listed "." alongside values like "15000 TO LESS THAN 25000", which is obviously character.
06-27-2013 09:49 PM
It is because your character variable contains a period instead of being all blanks.
Either recode the data or add a WHERE clause to exclude the observations where INCOME_CATEGORY='.'.
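Both fixes could look roughly like this. It is a sketch that assumes a hypothetical data set named MYDATA containing INCOME_CATEGORY; the sample rows are invented. The sample is read with the $CHAR30. informat so the lone period stays a literal character (the plain $30. informat would turn a single period into a blank).

```sas
/* Hypothetical sample data: INCOME_CATEGORY holds a literal '.' */
data mydata ;
  length income_category $30 ;
  input income_category $char30. ;
cards;
15000 TO LESS THAN 25000
25000 TO LESS THAN 35000
.
;
run;

/* Option 1: recode the literal '.' to a true (blank) missing value */
data mydata_clean ;
  set mydata ;
  if income_category = '.' then income_category = ' ' ;
run;

proc freq data=mydata_clean ;
  tables income_category ;
run;

/* Option 2: keep the data as is and filter the period out at analysis time */
proc freq data=mydata ;
  where income_category ne '.' ;
  tables income_category ;
run;
```

Option 1 fixes the data once, so every later procedure treats the value as missing; Option 2 leaves the data untouched but the WHERE clause has to be repeated in each procedure.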
Try this little piece of code to see how CATEGORY and CATEGORY2 are treated differently.
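data have ;
  length income 8 category category2 $30 freq 8 ; /* $30 avoids truncating the labels */
  infile cards dsd dlm='|';
  input income category freq ;
  /* CATEGORY2 is a copy of CATEGORY with blanks replaced by a literal '.' */
  category2 = category ;
  if category2 = ' ' then category2 = '.' ;
  /* The last data line is a hypothetical row with a blank category */
cards;
20000|15000 TO LESS THAN 25000|11000
30000|25000 TO LESS THAN 35000|5000
.||19500
;
run;
proc freq data=have ;
  tables income category category2 ;
  weight freq ;
run;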