PROC FREQ

Solved
Super Contributor
Posts: 1,041

Hi Team,

I thought PROC FREQ does not include missing values by default.

To my surprise, when I ran the step below I got the following output:

proc freq data=demographics order=formatted;

tables A/OUT=A_freqs;

tables B/OUT=B_freqs;

tables C/OUT=C_freqs;

tables D/OUT=D_freqs;   /* the result of this statement is shown below */

tables E/OUT=E_freqs;

tables F/OUT=F_freqs;

tables G/OUT=G_freqs;

tables H/OUT=H_freqs;

tables I/OUT=I_freqs;

tables J/OUT=J_freqs;

tables K/OUT=K_freqs;

tables L/OUT=L_freqs;

tables M/OUT=M_freqs;

run;

D        Count   Percent

.              51          14%    <---- why are these missing values included in the PROC FREQ percentages?

0-4          69          19%

>=4         242         66%

So I decided to go with a WHERE statement in PROC FREQ.

Do I have to write multiple WHERE statements?

Or will just one be enough?

Like: where A ne ' ' and B ne " " and C ne . and D ne . and so on?

Thanks

All Replies
Super User
Posts: 5,717

Re: PROC FREQ

karun,

PROC FREQ always INCLUDES ALL observations when computing COUNT.  It does not include missing values in the PERCENT calculations unless you specify the MISSING option.
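A minimal sketch of that difference, assuming a made-up data set `demo` with a numeric variable `D` (the data set name and values are only for illustration):

```sas
/* Hypothetical toy data: two missing values out of five. */
data demo;
  input D;
  datalines;
1
2
2
.
.
;
run;

/* Default: missing values are left out of the printed table and of the
   percentage denominator; they are reported as "Frequency Missing". */
proc freq data=demo;
  tables D;
run;

/* MISSING option: missing is treated as a valid level, so it gets
   its own row and counts toward the percentages. */
proc freq data=demo;
  tables D / missing;
run;
```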

The complete WHERE statement, mentioning every variable, might be removing records that you want to count.  For example, there could be a record where A is missing, but B has a valid value.  The WHERE statement would remove that valid value.  That's not necessarily a bad solution, but you have to understand the result that you are looking at.  If that is the right approach, you might switch to PROC TABULATE instead of PROC FREQ.  PROC TABULATE automatically removes all observations where any of the CLASS variables are missing ... without needing a WHERE statement.
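A rough sketch of the PROC TABULATE route, assuming the `demographics` data set from the question and using only A and B as CLASS variables for brevity:

```sas
/* Any observation with a missing value in ANY class variable is dropped
   from every table in the step, even a table that does not use that
   variable. That matches the effect of the long WHERE statement. */
proc tabulate data=demographics;
  class A B;
  table A, n pctn;   /* counts and percentages for A */
  table B, n pctn;   /* counts and percentages for B */
run;
```

Adding the MISSING option to the PROC TABULATE statement would keep those observations instead.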

Perhaps another approach makes sense.  If you would like to keep all valid values for B, even when A is missing, you could always change the output data sets.  For example:

tables D / out=D_freqs (where=(D ne .));
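A slightly fuller sketch of the same idea, assuming the `demographics` data set from the question. The WHERE= data set option only filters the rows written to `D_freqs`; the counts and percentages are computed before the filter is applied, and the printed table is unchanged. Using the MISSING function here is my substitution: it works whether D is numeric or character.

```sas
proc freq data=demographics;
  /* Keep only the non-missing levels of D in the output data set. */
  tables D / out=D_freqs (where=(not missing(D)));
run;

/* Inspect what was actually written. */
proc print data=D_freqs;
run;
```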

Finally, note that you should not use multiple WHERE statements.  You are free to refer to many variables, and multiple conditions, within a single WHERE statement.  But if you use multiple WHERE statements, only the last one has any impact.  Each WHERE statement replaces the earlier one.  There is a condition you can specify to change that ... I forget which of these would be correct but you can check if needed:

where A ne ' ';

where SAME and B ne ' ';

vs.

where _SAME_ and B ne ' ';
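For what it's worth, the documented spelling is, as far as I know, `WHERE SAME AND` (no underscores), with `WHERE ALSO` as an equivalent shorthand. A minimal sketch, reusing the data set and variable names from the question:

```sas
proc freq data=demographics;
  where A ne ' ';
  /* SAME AND adds to the previous WHERE condition instead of replacing it. */
  where same and B ne ' ';
  /* Equivalent shorthand:  where also B ne ' ';  */
  tables C;
run;
```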

Good luck.

Super Contributor
Posts: 1,041

Re: PROC FREQ

Hi Astounding,

Thanks a lot for the detailed explanation.

So if I write the statement below, with the MISSING option added:

tables D / missing out=D_freqs (where=(D ne .));

D        Count   Percent

.              51          14%    <---- I will not get this row? And the percentage is divided between the non-missing values?

0-4          69          19%

>=4         242         66%

Thanks

Super User
Posts: 20,727

Re: PROC FREQ

You should try it and see what you get.

Is D character or numeric?

Super Contributor
Posts: 1,041

Re: PROC FREQ

Well, it's character. I used:

tables D / missing out=D_freqs (where=(D ne " "));

But I am getting the same output as when I didn't use the MISSING option or the WHERE= option.

Thanks

Solution
‎10-03-2012 01:03 PM
Super User
Posts: 20,727

Re: PROC FREQ

In a character variable, a period is a valid value and isn't considered missing, so that makes perfect sense.

Change the format you applied so that it produces a blank instead of a period and see if that works.
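A small sketch of why the period shows up, assuming a format like the one described in this thread (two ranges and no entry for missing values). The data set `check` and the variable names are made up for illustration:

```sas
proc format;
  value dformat
    0-4    = "0-4"
    5-high = ">=5";
run;

data check;
  D = .;                           /* numeric missing                      */
  D_char = put(D, dformat.);       /* no range matches, so a "." is written */
  is_missing = missing(D_char);    /* a period is not blank, so the
                                      character value is NOT missing        */
run;

proc print data=check;
run;
```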

Super Contributor
Posts: 1,041

Re: PROC FREQ

Hi REEZA,

It's numeric at first in the parent dataset. Then I wrote the following format:

VALUE DFORMAT

0-4="0-4"

5-high=">=5"

;

The variable in the resulting dataset became character (because I used the PUT function to apply the format), with 3 values:

.

0-4

>=5

Super Contributor
Posts: 1,041

Re: PROC FREQ

Yes Reeza,

It works fine now after I changed the period to a missing value.

So can I generalize that whenever I have a character variable that contains periods, it is better to convert the periods to " " before doing the analysis?

Is that right?

Thanks

Super User
Posts: 5,717

Re: PROC FREQ

karun,

Looks like  you are on the right track.  The suggestion you received from Reeza is that you should not have to change the variable later.  Change the format:

value dformat

0-4="0-4"

5-high=">=5"

other=" ";

Good luck.

Super User
Posts: 20,727

Re: PROC FREQ

If a format doesn't find a value in its list of ranges, it writes out the underlying value unchanged. I like to use the OTHER option in my formats to check for mistakes, so I'll include other="CHECKME" to flag mistakes in my code.

You can account for that in your format, as follows, by setting values that fall outside your ranges to blank. What would happen if your D variable was negative, for example? Would your format assign that to missing, or put it in as "-5", a character value?

VALUE DFORMAT

0-4="0-4"

5-high=">=5"

other=""

;
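Putting the two ideas together, here is one possible sketch: map numeric missing explicitly to a blank and reserve OTHER for flagging unexpected values. The input data set `parent` (where D is still numeric), the output data set `demographics2`, and the variable `D_char` are hypothetical names used only for this example.

```sas
proc format;
  value dformat
    .      = " "          /* numeric missing -> blank (character missing) */
    0-4    = "0-4"
    5-high = ">=5"
    other  = "CHECKME";   /* anything unexpected, e.g. negative values    */
run;

data demographics2;        /* hypothetical output data set */
  set parent;              /* hypothetical input where D is numeric */
  D_char = put(D, dformat.);
run;

/* Blank D_char values are now treated as missing by PROC FREQ. */
proc freq data=demographics2;
  tables D_char / out=D_freqs;
run;
```

With this version a stray "CHECKME" row in the table points at values the ranges did not anticipate, rather than silently folding them into the missing category.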

A missing value for a numeric variable is . and a missing value for a character variable is a blank. PROC FREQ treats both kinds of missing the same way, but the types have to align.

Please don't type in ALL CAPS.

http://theoatmeal.com/pl/minor_differences/capslock

Super Contributor
Posts: 1,041

Re: PROC FREQ

Hi Reeza,

Sorry about my Capslock being "on"

Thanks for the help

Cheers

Super Contributor
Posts: 1,041

Re: PROC FREQ

Hi Astounding,

Also, according to your statement:

"It does not include missing values in the PERCENT calculations unless you specify the MISSING option."

I should not get 14% for those missing values. But I am getting it without the MISSING option!

D        Count   Percent

.              51          14%    <---- I should not get this row? And the percentage should be divided between the non-missing values?

0-4          69          19%

>=4         242         66%