Problems with missing values

Occasional Contributor
Posts: 8

Hello,

I'm analyzing a dataset that contains many variables with missing values. All the missing values are coded as ".". I do not want to include the missing values in the analysis. However, for one of the variables, SAS keeps including the missing values in the table, as if "." were a separate category, e.g.,

INCOME CATEGORY             FREQUENCY   PERCENTAGE
.                                7000      19.7183
<15,000                         12500      35.2113
15000 TO LESS THAN 25000        11000      30.9859
25000 TO LESS THAN 35000         5000      14.0845
TOTAL                           35500     100

Surprisingly, this happens only for one variable. For the others, the missing values are successfully excluded.

Can anyone give any clue, please?

Thanks,

Deep

Respected Advisor
Posts: 3,156

Re: Problems with missing values

Because it is not missing; it holds the character value '.'. A period (.) is the missing value for numeric variables, while for character variables a blank (' ') is what counts as missing.
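To make that distinction concrete, here is a minimal sketch (the variable names are made up for illustration, not taken from your data) that applies the MISSING function to a numeric missing value, a character period, and a character blank:

```sas
data _null_;
  num       = .;     /* numeric missing value */
  chr_dot   = '.';   /* character value holding a literal period: not missing */
  chr_blank = ' ';   /* character missing value (blank) */

  /* MISSING returns 1 for a missing value and 0 otherwise */
  m_num   = missing(num);
  m_dot   = missing(chr_dot);
  m_blank = missing(chr_blank);

  put m_num= m_dot= m_blank=;
run;
```

The log line should read m_num=1 m_dot=0 m_blank=1, which is why a character '.' shows up as its own category in a frequency table.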

Occasional Contributor
Posts: 8

Re: Problems with missing values

Thanks, but it actually is a bare period (.). In the question I put the quotation marks around it just to emphasize that the missing values are coded as a period.

Respected Advisor
Posts: 3,156

Re: Problems with missing values

Show me your data and the method you use to exclude missing values. I did not get the idea from the quoted ".", but from the way you presented your data in your first post. You listed "." alongside values like "15000 TO LESS THAN 25000", which is obviously a character variable.

Super User
Posts: 7,050

Re: Problems with missing values

It is because your character variable has a period in it instead of being all blanks.

Either re-code the data or add a WHERE clause to eliminate the observations where INCOME_CATEGORY='.'.

Try this little piece of code to see how CATEGORY and CATEGORY2 are treated differently.

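data have ;
  length income 8 category $20 freq 8 ;
  infile cards dsd dlm='|';
  input income category freq ;
  /* List input reads the lone period into CATEGORY as a blank (missing). */
  /* CATEGORY2 stores a literal period instead, which is a real value.    */
  category2=category;
  if category2=' ' then category2='.';
cards;
.|.|7000
10000|<15,000|12500
20000|15000 TO LESS THAN 25000|11000
30000|25000 TO LESS THAN 35000|5000
run;

/* INCOME and CATEGORY drop the missing row; CATEGORY2 keeps '.' as a level */
proc freq ;
  tables income category category2 ;
  weight freq ;
run;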
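
The two fixes mentioned above (a WHERE clause or re-coding) could look roughly like this. It is only a sketch against the HAVE dataset from the example; the output dataset name WANT is a placeholder, and in your own data you would substitute your variable (e.g. INCOME_CATEGORY) for CATEGORY2.

```sas
/* Option 1: leave the data alone and filter out the period at analysis time */
proc freq data=have;
  where category2 ne '.';
  tables category2;
  weight freq;
run;

/* Option 2: re-code the literal period to a blank so SAS treats it as missing */
data want;   /* WANT is a hypothetical output dataset name */
  set have;
  if strip(category2) = '.' then category2 = ' ';
run;

proc freq data=want;
  tables category2;
  weight freq;
run;
```

Option 1 is the quicker change if only one table is affected; Option 2 fixes the variable once, so every later procedure excludes those rows by default.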
