BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tommy1
Quartz | Level 8

I am running a simple Proc Freq using a code (names of variables have been changed for example )

 

proc freq data=frequency noprint;
		tables Num1*Num2/ missing out=test ;
		by Date ;
		Where Indicator ^="Abc";
run;

The log says

NOTE: There were 8796946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';
NOTE: The data set WORK.TEST has 23784 observations and 5 variables.
NOTE: Compressing data set WORK.TEST decreased size by 0.00 percent.
Compressed is 15 pages; un-compressed would require 15 pages.
NOTE: PROCEDURE FREQ used (Total process time):
real time 6.26 seconds
cpu time 6.24 seconds

 

I then copy and pasted the same exact code and only changed two things.

I change the dataset that is output from test to test2.

I also change Indicator^="Abc" to Indicator="Abc"

 

 

proc freq data=frequency noprint;
		tables Num1*Num2/ missing out=test2 ;
		by Date ;
		Where Indicator ="Abc";
run;

The log says

NOTE: No observations were selected from data set WORK.frequency.
NOTE: There were 0 observations read from the data set WORK.frequency.
WHERE Indicator='Abc';
NOTE: The data set WORK.TEST2 has 0 observations and 5 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 2.65 seconds
cpu time 2.66 seconds

 

The only difference is that I am looking at all the cases where not equal to "Abc" and then just the cases equal to "Abc".

I thought this may because there we no cases that were ="Abc" in the Indicator column. I then decided to do a proc freq of just that column.

 

proc freq data=frequency;
tables Indicator;
run;

And I get

Indicator          Frequency     

No                   7101114

Abc                   366288

Yes                  1329544

I thought I may might have mispelled Indicator= "Abc" so I went to the Indicator variable and and copy and pasted it into my code. I also copied "Abc" into the code. That still gave me the same problem.

 

Can anyone help me figure out why this problem is happening, it doesn't seem to make sense to me.

 

Thanks for the help,

Tom

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

A few things to test.

The output of PROC FREQ will display the formatted value of the variable.  Try you proc freq again but add a format statement to remove any format attached to the variable.

 

proc freq data=frequency;
tables Indicator;
format indicator;
run;

Check the value of Indicator variable for leading spaces or other invisible characters.  For example you could use $HEX6 format on your PROC FREQ to see the summary based on just the first 3 bytes of the value of INDICATOR.

 

 

proc freq data=frequency;
tables Indicator;
format indicator $hex6.;
run;

If the values are really as they appear then it will look like:

indicator='No '  -> 4E6F20
indicator='Abc'  -> 416263
indicator='Yes'  -> 596573

 

 

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

There is some other inconsistency. First we see in the log:

 

NOTE: There were 8796946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';

 

which does not agree with

 

Indicator          Frequency     

No                   7101114

Abc                   366288

Yes                  1329544

 

Leading me to believe that either you are not working on the same data set each time, or something else is going on that I can't imagine.

--
Paige Miller
Tommy1
Quartz | Level 8

@PaigeMiller Just to double check I added up the observations just to check and the numbers are the same.

 

I thought it could be using a different data set as well, but I am copying and pasting my first Proc Freq to make sure that the data sets are the same. I also just checked to make sure and I am definitely using the same data set.

 

Thanks for the reply.

PaigeMiller
Diamond | Level 26

Adding commas into the numbers for greater clarity

 

NOTE: There were 8,796,946 observations read from the data set WORK.frequency
WHERE Indicator not = 'Abc';

 

So the LOG says almost 8.8 million observations where Indicator not='Abc'

 

Indicator          Frequency     

No                   7,101,114

Abc                   366,288

Yes                  1,329,544

 

Adding the No and Yes, we get about 8.4 million observations where Indicator not='Abc'

 

Is it possible that your text string Abc should have leading blanks, or has some other non-printing character?

 

Can you paste the results of the last PROC FREQ into the {i} window?

--
Paige Miller
Tom
Super User Tom
Super User

A few things to test.

The output of PROC FREQ will display the formatted value of the variable.  Try you proc freq again but add a format statement to remove any format attached to the variable.

 

proc freq data=frequency;
tables Indicator;
format indicator;
run;

Check the value of Indicator variable for leading spaces or other invisible characters.  For example you could use $HEX6 format on your PROC FREQ to see the summary based on just the first 3 bytes of the value of INDICATOR.

 

 

proc freq data=frequency;
tables Indicator;
format indicator $hex6.;
run;

If the values are really as they appear then it will look like:

indicator='No '  -> 4E6F20
indicator='Abc'  -> 416263
indicator='Yes'  -> 596573

 

 

Tommy1
Quartz | Level 8

@Tom Thanks that was the solution. There were a few extra letters when I looked at the format. I never considered that because there was a Format =$3. and Informat=$3. Thanks for all the help.

 

@PaigeMiller  I will make sure to add commas in the future to make it easier to interpret. Thanks for the help as well.

Tom
Super User Tom
Super User
If you attach a format of $3 to a character variable that is longer than 3 bytes then it will be truncated on display. In particular PROC FREQ will use the formatted value when creating the bins.
That is why I recommend not attaching $xx formats to character variables. SAS doesn't need special instructions to display character variables and if you accidently get too short a format attached to the variable it can cause confusion, or worse.
PaigeMiller
Diamond | Level 26

@Tom wrote:

That is why I recommend not attaching $xx formats to character variables.

I'd have to disagree. Formats are very useful with output procedure where you want it to display a certain way, even if the original variable is character. Also, sometimes, you may want several levels of the character variable to be combined into one for purposes of PROC FREQ, for example to lump many categorical levels into an "Other" output.

--
Paige Miller
Tom
Super User Tom
Super User
Those are great use cases for using the $xx format in a format statement in a reporting step. But I still wouldn't attach them to the variable in a permanent dataset.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2605 views
  • 6 likes
  • 3 in conversation