- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I imported an excel spreadsheet and one of the variables seems to have some extra characters of some sort. It is a character variable called "Deviation" and the responses are "YES", "NO", and "N/A", yet it thinks I have two "NO" groups and two "YES" groups when I do a frequency table.
data have;
deviation frequency
N/A 639
YES 20
YES 1
NO 524
NO 32
data want;
Deviation Frequency
N/A 639
YES 21
NO 556
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You will need to check the meaning of those COMPRESS() function options in the documentation.
https://www.google.com/search?q=%40sas.com+compress()+function
The 2F on your first line is just the slash character, /
But the A0 in your last row is the dreaded non-breaking space (or at least that is what Microsoft has decided to use that particular non-printing character to mean and lot of other software has adopted it.).
See what happens if you just remove the non-breaking space. While you are at in remove any leading spaces.
Here is code to remove the non-breaking spaces, left align (aka remove leading spaces) and also generate hex codes for the first 10 characters. Then run PROC FREQ and show both the text and the hexcode version on the same line so it is easier to see what letter the hex codes represent.
data temp;
set have;
all_managers = left(compress(all_managers,'A0'x));
hex=put(all_managers,$hex20.);
run;
proc freq data=temp;
tables all_managers*hex / list;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lefunctionsref&docsetTarg...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It helps to look at the output using the regular old plan text LISTING destination instead of some graphically complex ODS destination.
You can try using the $QUOTE. format to help see the leading spaces.
You can use the $HEX. format to see what characters are actually in the variable.
Most likely you have leading spaces. ('20'X) You might have trailing linefeeds ('0A'x). It is possible you have some non-breaking spaces ('A0'x) on either side.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I used the following code and got this
data temp;
set have;
all_managers = compress(all_managers, , 'ad');
run;
proc freq data=temp;
tables all_managers;
format all_managers $hex30.;
run;
Deviation FREQUENCY
2F202020 639
A0202020 33
Not sure what this means....
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try removing the formats first and see if it solved the issue.
I think you need to add the K option if you're using the ad option which would remove all numeric and alphabetic characters. I suggested the s modifier to remove white space.
The HEX format, shows you the values as hex code. So 2F, A0, 20 are all different white space characters.
2F - /
20 - space
A0 is a white space that's likely causing part of your issue.
data temp;
set have;
all_managers = compress(all_managers, , 's');
run;
proc freq data=temp;
tables all_managers;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You will need to check the meaning of those COMPRESS() function options in the documentation.
https://www.google.com/search?q=%40sas.com+compress()+function
The 2F on your first line is just the slash character, /
But the A0 in your last row is the dreaded non-breaking space (or at least that is what Microsoft has decided to use that particular non-printing character to mean and lot of other software has adopted it.).
See what happens if you just remove the non-breaking space. While you are at in remove any leading spaces.
Here is code to remove the non-breaking spaces, left align (aka remove leading spaces) and also generate hex codes for the first 10 characters. Then run PROC FREQ and show both the text and the hexcode version on the same line so it is easier to see what letter the hex codes represent.
data temp;
set have;
all_managers = left(compress(all_managers,'A0'x));
hex=put(all_managers,$hex20.);
run;
proc freq data=temp;
tables all_managers*hex / list;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content