- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am running a simple Proc Freq using a code (names of variables have been changed for example )
proc freq data=frequency noprint;
tables Num1*Num2/ missing out=test ;
by Date ;
Where Indicator ^="Abc";
run;
The log says
NOTE: There were 8796946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';
NOTE: The data set WORK.TEST has 23784 observations and 5 variables.
NOTE: Compressing data set WORK.TEST decreased size by 0.00 percent.
Compressed is 15 pages; un-compressed would require 15 pages.
NOTE: PROCEDURE FREQ used (Total process time):
real time 6.26 seconds
cpu time 6.24 seconds
I then copy and pasted the same exact code and only changed two things.
I change the dataset that is output from test to test2.
I also change Indicator^="Abc" to Indicator="Abc"
proc freq data=frequency noprint;
tables Num1*Num2/ missing out=test2 ;
by Date ;
Where Indicator ="Abc";
run;
The log says
NOTE: No observations were selected from data set WORK.frequency.
NOTE: There were 0 observations read from the data set WORK.frequency.
WHERE Indicator='Abc';
NOTE: The data set WORK.TEST2 has 0 observations and 5 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 2.65 seconds
cpu time 2.66 seconds
The only difference is that I am looking at all the cases where not equal to "Abc" and then just the cases equal to "Abc".
I thought this may because there we no cases that were ="Abc" in the Indicator column. I then decided to do a proc freq of just that column.
proc freq data=frequency;
tables Indicator;
run;
And I get
Indicator Frequency
No 7101114
Abc 366288
Yes 1329544
I thought I may might have mispelled Indicator= "Abc" so I went to the Indicator variable and and copy and pasted it into my code. I also copied "Abc" into the code. That still gave me the same problem.
Can anyone help me figure out why this problem is happening, it doesn't seem to make sense to me.
Thanks for the help,
Tom
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A few things to test.
The output of PROC FREQ will display the formatted value of the variable. Try you proc freq again but add a format statement to remove any format attached to the variable.
proc freq data=frequency;
tables Indicator;
format indicator;
run;
Check the value of Indicator variable for leading spaces or other invisible characters. For example you could use $HEX6 format on your PROC FREQ to see the summary based on just the first 3 bytes of the value of INDICATOR.
proc freq data=frequency;
tables Indicator;
format indicator $hex6.;
run;
If the values are really as they appear then it will look like:
indicator='No ' -> 4E6F20 indicator='Abc' -> 416263 indicator='Yes' -> 596573
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There is some other inconsistency. First we see in the log:
NOTE: There were 8796946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';
which does not agree with
Indicator Frequency
No 7101114
Abc 366288
Yes 1329544
Leading me to believe that either you are not working on the same data set each time, or something else is going on that I can't imagine.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller Just to double check I added up the observations just to check and the numbers are the same.
I thought it could be using a different data set as well, but I am copying and pasting my first Proc Freq to make sure that the data sets are the same. I also just checked to make sure and I am definitely using the same data set.
Thanks for the reply.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Adding commas into the numbers for greater clarity
NOTE: There were 8,796,946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';
So the LOG says almost 8.8 million observations where Indicator not='Abc'
Indicator Frequency
No 7,101,114
Abc 366,288
Yes 1,329,544
Adding the No and Yes, we get about 8.4 million observations where Indicator not='Abc'
Is it possible that your text string Abc should have leading blanks, or has some other non-printing character?
Can you paste the results of the last PROC FREQ into the {i} window?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A few things to test.
The output of PROC FREQ will display the formatted value of the variable. Try you proc freq again but add a format statement to remove any format attached to the variable.
proc freq data=frequency;
tables Indicator;
format indicator;
run;
Check the value of Indicator variable for leading spaces or other invisible characters. For example you could use $HEX6 format on your PROC FREQ to see the summary based on just the first 3 bytes of the value of INDICATOR.
proc freq data=frequency;
tables Indicator;
format indicator $hex6.;
run;
If the values are really as they appear then it will look like:
indicator='No ' -> 4E6F20 indicator='Abc' -> 416263 indicator='Yes' -> 596573
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom Thanks that was the solution. There were a few extra letters when I looked at the format. I never considered that because there was a Format =$3. and Informat=$3. Thanks for all the help.
@PaigeMiller I will make sure to add commas in the future to make it easier to interpret. Thanks for the help as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That is why I recommend not attaching $xx formats to character variables. SAS doesn't need special instructions to display character variables and if you accidently get too short a format attached to the variable it can cause confusion, or worse.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom wrote:
That is why I recommend not attaching $xx formats to character variables.
I'd have to disagree. Formats are very useful with output procedure where you want it to display a certain way, even if the original variable is character. Also, sometimes, you may want several levels of the character variable to be combined into one for purposes of PROC FREQ, for example to lump many categorical levels into an "Other" output.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content