Hi there,
I am running a proc freq for a numerical variable - and SAS is generating separate rows for the same numbers. For instance, it goes in chronological order w/ the frequencies from 1 to 156, then after that it starts going through lower numbers again. So there may be fifteen 1s, but SAS is separating them into one group of 8 and one group of 7 in the proc freq - essentially reporting the response of '1' as separate responses when it is the same.
I've checked that in the input data the font/size/spacing of these variables is all the same. Is there a way I can program for it to stop doing that?
Thanks!
As @Tom noted, there are other characters that won't be visible, but can still be part of your variable. Leading blanks are one possibility, but other characters can cause a similar problem. Luckily, it's easy to get rid of all non-digits:
var = compress(var, , 'kd' );
KD = Keep Digits, and any other characters will be removed. Apply that statement to your data, and see if there is still a problem with the PROC FREQ.
Hi @rebecca8
Is the type of the variable character or numeric? Have they a format?
You can run a PROC CONTENTS to see this.
Proc contents data=have;
run;
If it is character, could you please run the following program to see if there are some trailing blanks:
data have2;
set have;
if prxmatch('/\s/',my_var) > 0 then flag=1;
run;
(replace 'my_var' by you variable name and 'have' by your dataset name)
Hi there,
I ran this, and there does appear to be some trailing/leading spaces. But when I try either the trim or the strip function, it doesn't seem to make a difference and the table is still generating identical variables out separately.
Any suggestions moving forward?
To remove leading spaces use LEFT() function.
var=left(var);
But that assumes the leading characters are actually spaces and not some other non-printing character.
What are the leading characters in your data?
Try running your proc freq but format the variable using $HEX2. format so that you will just see the hex code for the first characters.
Normal characters are in general the values between space '20'x and tilda '7E'x.
'30'x to '39'x are the digits.
'41'x is uppercase A.
Here are some common invisible characters:
'20'x is a space.
'00'x is a binary zero.
'09'x is a tab.
'0A'x is a linefeed.
'0D'x is a carriage return.
'A0'x is the goofy character Microsoft invented to use as a non-breaking space for typesetting.
To remove those characters you can use compress
var=compress(var,'090A0D00A0'x);
Or convert them to spaces using translate:
var=left(translate(var,' ','00090A0DA0'x));
Hi @rebecca8
Maybe you could try to do this:
data have2;
set have;
num_2=compress(num);
proc freq data=have2;
tables num_2;
run;
You will need to show examples of the values that are at issue to get a complete answer. But if PROC FREQ considers the values as different then they are different. You cannot fix it without changing the values that PROC FREQ is grouping by.
Is your variable numeric or character? If character do the values have different numbers of leading spaces?
Example:
data test;
length name2 $30;
set sashelp.class(obs=3);
name2=name;
output;
name2=' '||name2;
output;
name2=' '||name2;
output;
run;
proc freq data=test;
tables name name2;
run;
The FREQ Procedure Cumulative Cumulative Name Frequency Percent Frequency Percent ------------------------------------------------------------ Alfred 3 33.33 3 33.33 Alice 3 33.33 6 66.67 Barbara 3 33.33 9 100.00 Cumulative Cumulative name2 Frequency Percent Frequency Percent -------------------------------------------------------------- Alfred 1 11.11 1 11.11 Alice 1 11.11 2 22.22 Barbara 1 11.11 3 33.33 Alfred 1 11.11 4 44.44 Alice 1 11.11 5 55.56 Barbara 1 11.11 6 66.67 Alfred 1 11.11 7 77.78 Alice 1 11.11 8 88.89 Barbara 1 11.11 9 100.00
As @Tom noted, there are other characters that won't be visible, but can still be part of your variable. Leading blanks are one possibility, but other characters can cause a similar problem. Luckily, it's easy to get rid of all non-digits:
var = compress(var, , 'kd' );
KD = Keep Digits, and any other characters will be removed. Apply that statement to your data, and see if there is still a problem with the PROC FREQ.
Amazing! Thank you for all the help and the detailed explanations. Problem solved 🙂
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.