PGStats
Opal | Level 21

After a lot of fumbling around, I came to the following guess: when building a table on a formatted variable, PROC FREQ treats every value that formats the same as a missing value as if it were missing. The following example illustrates this:

proc format; value test low-0 = "LOW" OTHER = "HIGH"; run;

data test; output; do x = -3 to 3; output; end; format x test.; run;

proc print data=test; run;

proc freq data=test; table x; run;

Notice the absence of the HIGH category in the PROC FREQ output. Remove the first OUTPUT statement in the DATA step (thereby removing the missing x value from the data set) and the HIGH category reappears in the PROC FREQ output.

Has anybody else noticed this?

PG

ballardw
Super User

Not the same, but a similar oddity: if you run it through PROC SUMMARY, the missing turns into a -3.

proc summary data=test nway;
  class x;
  var x;
  output out=testsum n=count;
run;

proc print;
  format x f4.;
run;

Proc summary returns the smallest non-missing value as the value of the formatted class variable.

Which is why my custom formats pretty much always have a missing category if I use the Other option.
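A minimal sketch of that habit, applied to the example from the original post. The format name TESTM is hypothetical, and only the ordinary missing value (.) is mapped; special missing values such as .A would still fall into OTHER.

```sas
/* Hypothetical format TESTM: same ranges as TEST, plus an explicit missing category */
proc format;
  value testm
    .     = "MISSING"   /* ordinary missing gets its own label */
    low-0 = "LOW"
    other = "HIGH";
run;

data test;
  output;               /* x is still missing on this first observation */
  do x = -3 to 3;
    output;
  end;
  format x testm.;
run;

proc freq data=test;
  table x;              /* the missing x no longer shares a label with 1, 2, 3 */
run;
```

Because the missing value now formats to its own label, I would expect HIGH to show up in the table again, with the single missing observation reported separately.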

Astounding
PROC Star

When PROC FREQ counts all the HIGH and LOW values, it only stores one numeric value for each category.  It stores the lowest value that actually appears in the data set.  So here are some results I would expect from your test.

1. If PROC FREQ were to create an output data set, the actual unformatted values for X would be missing and -3.

2. If you were to add the MISSING option when creating the table, HIGH would appear first and LOW would appear second because missing is less than -3.

3. If you were to remove the first OUTPUT statement, but still include the MISSING option, the order would switch and LOW would appear before HIGH because -3 is less than 1.

In every case, though, the unformatted values in the output data set would clarify what PROC FREQ is doing.
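One way to check the points above is to write the table to an output data set and print it without the user format. This is only a sketch: FREQOUT is a hypothetical data set name, and the format and data step are repeated from the original post so the block runs on its own.

```sas
proc format;
  value test low-0 = "LOW" other = "HIGH";
run;

data test;
  output;                            /* first observation has x missing */
  do x = -3 to 3;
    output;
  end;
  format x test.;
run;

proc freq data=test;
  table x / missing out=freqout;     /* MISSING keeps the missing level in the table */
run;

proc print data=freqout;
  format x best12.;                  /* override TEST. to show the stored raw value */
run;
```

Dropping the MISSING option, or removing the first OUTPUT statement, and rerunning lets you compare the stored values of X across the three cases.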

PGStats
Opal | Level 21

Thank you, you helped me understand what's going on. There is some logic in storing the lowest value represented in a category. The logic fails when it extends to missing values because the special treatment given almost everywhere to missing values is de facto extended to non-missing values. This can be very confusing; it was for me.

I will try to always remember ballardw's suggestion to include an explicit missing category when defining user formats.

I wish I had read this in the SAS documentation.

PG 
