robertrao
Quartz | Level 8


Hi Team,

I THOUGHT PROC FREQ DOES NOT INCLUDE MISSING VALUES BY DEFAULT.

TO MY SURPRISE, WHEN I USED THE STEP BELOW I GOT THE FOLLOWING OUTPUT:

proc freq data=demographics order=formatted;

tables A/OUT=A_freqs;

tables B/OUT=B_freqs;

tables C/OUT=C_freqs;

tables D/ OUT=D_freqs;   /* THE RESULT OF THIS IS SHOWN BELOW */

tables E/OUT=E_freqs;

tables F/OUT=F_freqs;

tables G/OUT=G_freqs;

tables H/OUT=H_freqs;

tables I/OUT=I_freqs;

tables J/OUT=J_freqs;

tables K/OUT=K_freqs;

tables L/OUT=L_freqs;

tables M/OUT=M_freqs;

run;

D        Count   Percent

.           51       14%    <-- why am I getting these missing values included in the PROC FREQ percentages?

0-4         69       19%

>=4        242       66%

SO I DECIDED TO GO WITH THE WHERE STATEMENT IN PROC FREQ.

DO I HAVE TO WRITE MULTIPLE WHERE STATEMENTS?

OR WILL JUST ONE BE ENOUGH?

LIKE: where A ne ' ' and B ne " " and C ne . and D ne . and so on?

Please help

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

A period is a valid character and isn't considered missing so that makes perfect sense.

Change your format applied to make it a blank instead of a period and see if that works.

Astounding
PROC Star

karun,

PROC FREQ always INCLUDES ALL observations when computing COUNT.  It does not include missing values in the PERCENT calculations unless you specify the MISSING option.

The complete WHERE statement, mentioning every variable, might be removing records that you want to count.  For example, there could be a record where A is missing, but B has a valid value.  The WHERE statement would remove that valid value.  That's not necessarily a bad solution, but you have to understand the result that you are looking at.  If that is the right approach, you might switch to PROC TABULATE instead of PROC FREQ.  PROC TABULATE automatically removes all observations where any of the CLASS variables are missing ... without needing a WHERE statement.
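For reference, a minimal sketch of the PROC TABULATE route described above, reusing the `demographics` data set from the question. Listing only A through D in the CLASS statement is an illustrative assumption; any observation with a missing value in any listed CLASS variable is dropped from every table in the step.

```sas
proc tabulate data=demographics;
   class A B C D;            /* rows with a missing value in ANY of these are excluded */
   table A B C D, n pctn;    /* one block per variable: count and percent */
run;
```

Adding the MISSING option to the PROC TABULATE statement would keep those observations instead.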

Perhaps another approach makes sense.  If you would like to keep all valid values for B, even when A is missing, you could always change the output data sets.  For example:

tables D / out=D_freqs (where=(D ne .));

Finally, note that you should not use multiple WHERE statements.  You are free to refer to many variables, and multiple conditions, within a single WHERE statement.  But if you use multiple WHERE statements, only the last one has any impact.  Each WHERE statement replaces the earlier one.  There is a condition you can specify to change that ... I forget which of these would be correct but you can check if needed:

where A ne ' ';

where SAME and B ne ' ';

vs.

where _SAME_ and B ne ' ';
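For reference, the documented form is, as far as can be confirmed, `WHERE SAME AND` (with `WHERE ALSO` as an equivalent spelling); `_SAME_` is not a recognized keyword here. A minimal sketch, reusing the `demographics` data set and assuming A and B are character variables:

```sas
proc freq data=demographics;
   where A ne ' ';              /* first condition */
   where same and B ne ' ';     /* adds to the first condition instead of replacing it */
   tables D;
run;
```

A single statement, `where A ne ' ' and B ne ' ';`, subsets the data the same way.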

Good luck.

robertrao
Quartz | Level 8

Hi Astounding,

Thanks a lot for the detailed explanation.

So if I write the statement below, with the MISSING option added:

tables D / missing out=D_freqs (where=(D ne .));

D        Count   Percent

.           51       14%    <-- I WILL NOT GET THIS ROW? And the percentages are divided between the non-missing values?

0-4         69       19%

>=4        242       66%

AM I RIGHT? Please correct me.

Thanks

Reeza
Super User

You should try it and see what you get.

Is D character or numeric?

robertrao
Quartz | Level 8

Well, it's character. I used:

tables D / missing out=D_freqs (where=(D ne " "));

BUT I AM GETTING THE SAME OUTPUT AS WHEN I DIDN'T USE THE "MISSING" AND "WHERE" OPTIONS.

Thanks

Reeza
Super User

A period is a valid character and isn't considered missing so that makes perfect sense.

Change your format applied to make it a blank instead of a period and see if that works.

robertrao
Quartz | Level 8

Hi REEZA,

It's numeric at first in the parent dataset, and I wrote the following format:

VALUE DFORMAT

0-4="0-4"

5-high=">=5"

;

The variable in the resulting dataset became character (because I used the PUT function to apply the format) with 3 values:

.

0-4

>=5

robertrao
Quartz | Level 8

YES REEZA,

IT WORKS FINE NOW AFTER I CHANGED THE PERIOD TO A BLANK.

SO CAN I GENERALIZE NOW THAT WHENEVER I HAVE A CHARACTER VARIABLE AND IT HAS PERIODS, IT IS BETTER TO DO THE ANALYSIS AFTER CONVERTING THE PERIODS TO " "?

IS THAT RIGHT?

THANKS

Astounding
PROC Star

karun,

Looks like  you are on the right track.  The suggestion you received from Reeza is that you should not have to change the variable later.  Change the format:

value dformat

   0-4="0-4"

   5-high=">=5"

   other=" ";

Good luck.

Reeza
Super User

If a format doesn't find a value in its list, it writes the underlying value as-is. I like to use the OTHER option with my formats to check for mistakes, so I'll include other="CHECKME" to flag mistakes in my code.

You can account for that in your format as follows, by setting any values outside your ranges to blank. What would happen if your D variable was negative, for example? Would your format assign that to missing, or put it in as "-5", a character?

VALUE DFORMAT

0-4="0-4"

5-high=">=5"

other=""

;

A missing value for a numeric variable is . and a missing value for a character variable is a blank. PROC FREQ treats both as missing, but the types have to align.
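Putting the pieces of this thread together, a minimal end-to-end sketch might look like the following. The names `demo_fmt`, `D_num`, and `D_char` are hypothetical (the thread does not give the original variable or data set names), and the PUT step mirrors what the poster described rather than reproducing their actual code.

```sas
proc format;
   value dformat
      0-4    = "0-4"
      5-high = ">=5"
      other  = " ";     /* numeric missing (.) and anything outside the ranges become blank */
run;

data demo_fmt;                      /* hypothetical output data set name */
   set demographics;
   length D_char $3;                /* hypothetical name for the character version */
   D_char = put(D_num, dformat.);   /* D_num: hypothetical name for the original numeric variable */
run;

proc freq data=demo_fmt;
   tables D_char / out=D_freqs;     /* blanks are now treated as missing by PROC FREQ */
run;
```

Note that with these ranges a non-integer value such as 4.5 also falls under OTHER, so swapping in other="CHECKME" during development can help surface such cases.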

Please don't type in ALL CAPS.

http://theoatmeal.com/pl/minor_differences/capslock

robertrao
Quartz | Level 8

Hi Reeza,

Sorry about my Capslock being "on"

Thanks for the help

Cheers

robertrao
Quartz | Level 8

Hi Astounding,

Also, according to your statement:

"It does not include missing values in the PERCENT calculations unless you specify the MISSING option."

I SHOULD NOT GET 14% for those missing values. But I am getting it without the MISSING option!

D        Count   Percent

.           51       14%    <-- I SHOULD NOT GET THIS ROW? And the percentages should be divided between the non-missing values?

0-4         69       19%

>=4        242       66%

Please explain

Thanks
