BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rebecca8
Fluorite | Level 6

Hi there,

 

I am running a proc freq for a numerical variable - and SAS is generating separate rows for the same numbers. For instance, it goes in chronological order w/ the frequencies from 1 to 156, then after that it starts going through lower numbers again. So there may be fifteen 1s, but SAS is separating them into one group of 8 and one group of 7 in the proc freq - essentially reporting the response of '1' as separate responses when it is the same. 

 

I've checked that in the input data the font/size/spacing of these variables is all the same. Is there a way I can program for it to stop doing that? 

 

Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

As @Tom noted, there are other characters that won't be visible, but can still be part of your variable.  Leading blanks are one possibility, but other characters can cause a similar problem.  Luckily, it's easy to get rid of all non-digits:

 

var = compress(var, , 'kd' );

 

KD = Keep Digits, and any other characters will be removed.  Apply that statement to your data, and see if there is still a problem with the PROC FREQ.

View solution in original post

7 REPLIES 7
ed_sas_member
Meteorite | Level 14

Hi @rebecca8 

 

Is the type of the variable character or numeric? Have they a format?

You can run a PROC CONTENTS to see this.

 

Proc contents data=have;
run;

 

 

If it is character, could you please run the following program to see if there are some trailing blanks:

 

data have2;
	set have;
	if prxmatch('/\s/',my_var) > 0 then flag=1;
run;

(replace 'my_var' by you variable name and 'have' by your dataset name)

rebecca8
Fluorite | Level 6

Hi there,

 

I ran this, and there does appear to be some trailing/leading spaces. But when I try either the trim or the strip function, it doesn't seem to make a difference and the table is still generating identical variables out separately. 

 

Any suggestions moving forward? 

Tom
Super User Tom
Super User

To remove leading spaces use LEFT() function. 

var=left(var);

But that assumes the leading characters are actually spaces and not some other non-printing character.

 

What are the leading characters in your data?

Try running your proc freq but format the variable using $HEX2. format so that you will just see the hex code for the first characters. 

 

Normal characters are in general the values between space '20'x and tilda '7E'x.

'30'x to '39'x are the digits. 

'41'x  is uppercase A. 

 

Here are some common invisible characters:

'20'x is a space. 

'00'x is a binary zero. 

'09'x is a tab. 

'0A'x is a linefeed.

'0D'x is a carriage return.

'A0'x is the goofy character Microsoft invented to use as a non-breaking space for typesetting.

 

To remove those characters you can use compress

var=compress(var,'090A0D00A0'x);

Or convert them to spaces using translate:

var=left(translate(var,'     ','00090A0DA0'x));
ed_sas_member
Meteorite | Level 14

Hi @rebecca8 

 

Maybe you could try to do this:

data have2;
	set have;
	num_2=compress(num);

proc freq data=have2;
	tables num_2;
run;
Tom
Super User Tom
Super User

You will need to show examples of the values that are at issue to get a complete answer. But if PROC FREQ considers the values as different then they are different. You cannot fix it without changing the values that PROC FREQ is grouping by. 

 

Is your variable numeric or character?  If character do the values have different numbers of leading spaces?  

Example:

data test;
  length name2 $30;
  set sashelp.class(obs=3);
  name2=name;
  output;
  name2=' '||name2;
  output;
  name2=' '||name2;
  output;
run;

proc freq data=test;
tables name name2;
run;
The FREQ Procedure

                                    Cumulative    Cumulative
Name       Frequency     Percent     Frequency      Percent
------------------------------------------------------------
Alfred            3       33.33             3        33.33
Alice             3       33.33             6        66.67
Barbara           3       33.33             9       100.00


                                      Cumulative    Cumulative
name2        Frequency     Percent     Frequency      Percent
--------------------------------------------------------------
  Alfred            1       11.11             1        11.11
  Alice             1       11.11             2        22.22
  Barbara           1       11.11             3        33.33
 Alfred             1       11.11             4        44.44
 Alice              1       11.11             5        55.56
 Barbara            1       11.11             6        66.67
Alfred              1       11.11             7        77.78
Alice               1       11.11             8        88.89
Barbara             1       11.11             9       100.00
Astounding
PROC Star

As @Tom noted, there are other characters that won't be visible, but can still be part of your variable.  Leading blanks are one possibility, but other characters can cause a similar problem.  Luckily, it's easy to get rid of all non-digits:

 

var = compress(var, , 'kd' );

 

KD = Keep Digits, and any other characters will be removed.  Apply that statement to your data, and see if there is still a problem with the PROC FREQ.

rebecca8
Fluorite | Level 6

Amazing! Thank you for all the help and the detailed explanations. Problem solved 🙂 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1423 views
  • 4 likes
  • 4 in conversation