Hello!
This seems like a ridiculous question to ask and yet I'm stumped. All I want to do is create a simple dataset in SAS so that I can run some tests. Here is the simple SAS code to create the dataset.
data test;
INPUT region $ Year 4. Disease $ n;
DATALINES;
MC 2015 Y 3024
MC 2016 Y 3262
MC 2017 Y 3157
MC 2018 Y 3155
MC 2019 Y 3093
MC 2015 n 4015119
MC 2016 n 4085287
MC 2017 n 4152344
MC 2018 n 4247434
MC 2019 n 4324214
;
run;
proc freq data= test;
table Disease;
run;
For some reason, when I run a frequency based upon the variable 'Disease', I get the following output:
Why in the world is it breaking up 'n' as if it's a different category? I can't run my tests like this. Is there something I'm leaving out of my INPUT statement? By the way, I need to expand this dataset significantly to include 4 other regions, each with 5 years' worth of Y/N data.
Thank you!
I can't reproduce your problem. There must be a non-printable character next to one of the 'n' values. Make sure your white spaces in the data are true blanks.
I can't reproduce it either and get only two values in the frequency.
You were right about displaying with HEX format. This was my output.
So, can you advise me further on what the fix for this is? I've tried taking out spaces and placing only one space in between each value but it doesn't seem to solve the problem at all.
Thank you for the help!
Do NOT insert physical TAB characters (the '09'x in your report above). Change the options in whatever editor you use to type your programs so that it replaces tabs with spaces.
The INFILE statement also has an EXPANDTABS option that will convert tab characters to the proper number of spaces to mimic tab stops every 8 characters.
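As a minimal sketch of the EXPANDTABS suggestion, assuming the data stay in-stream, the original step could add an INFILE DATALINES statement. Only two of the original data lines are repeated here, and they are typed with ordinary blanks because a tab cannot be shown reliably in a post:

```sas
data test;
   /* EXPANDTABS converts any tab in a data line to blanks */
   /* before the INPUT statement parses the line.          */
   infile datalines expandtabs;
   input region $ Year 4. Disease $ n;
datalines;
MC 2015 Y 3024
MC 2015 n 4015119
;
run;
```

With the option in place, a stray tab between Year and Disease should be read as blanks rather than as part of the Disease value.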
Almost certainly the cause is one or more values with a leading space. PROC FREQ displays such a value with the leading space stripped, so it looks identical to the value without one. The values are actually different, however, so each gets its own row in the output table.
You can see that here:
data example;
x = " Y"; output;
x = "Y"; output;
x = "Y"; output;
x = "Y"; output;
x = "Y"; output;
x = " n"; output;
x = "n"; output;
x = "n"; output;
x = "n"; output;
x = "n"; output;
run;
proc freq data=example;
tables x;
run;
Also notice that the versions with leading spaces sort first in the output.
So it may be likely that your Y values are actually " Y".
Use a data step and the Length function to examine values.
Or clean up with Disease= strip(disease);
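One way to examine the values, sketched here on the assumption that the dataset is still named test and that Disease has the default length of 8 from list input, is to write each value to the log with its length and its hexadecimal representation:

```sas
data _null_;
   set test;
   /* LENGTH counts leading characters but ignores trailing blanks */
   len = length(Disease);
   /* $HEX16. prints all 8 bytes of Disease as hex pairs:          */
   /* 59 = Y, 6E = n, 20 = blank, 09 = tab                         */
   put region= Year= Disease= $hex16. len=;
run;
```

A clean value shows a length of 1 and a hex string that starts with 59 or 6E. A length of 2, or a hex string that starts with 20 or 09, points to a hidden leading character. Keep in mind that STRIP removes only blanks, so it would not remove a leading tab.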
It's so crazy, because I've tried (multiple times) to ensure that only one space exists between each value, yet I kept running into that issue. Somehow what you outlined must be happening. I just used a COMPRESS function and fixed my issue! So bizarre. Thanks for the help!
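The exact COMPRESS call is not shown in the thread. A minimal sketch of one possibility, assuming the 's' modifier is used so that tabs are removed along with blanks, and using a hypothetical output dataset name test_clean:

```sas
data test_clean;   /* test_clean is a hypothetical name */
   set test;
   /* With an empty second argument, the 's' modifier adds      */
   /* whitespace characters (blank, tab, and so on) to the list */
   /* of characters that COMPRESS removes.                      */
   Disease = compress(Disease, , 's');
run;

proc freq data=test_clean;
   tables Disease;
run;
```

This cleans the values after they are read. It does not change the data lines themselves, so the hidden characters would still be there the next time the original step runs.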
Hello @Cassie-Andra,
Reeza's diagnostic step using the $HEX. format will reveal the invisible character(s) you have typed (or copied/pasted) and then read into variable Disease before (or less likely: after) the visible letter. I don't think there are leading blanks ('20'x) because these would be skipped by list input (which you are using for Disease in the INPUT statement). Also non-breakable spaces ('A0'x) can be ruled out as these would be visible as indentations in HTML output -- unlike many other "invisible" characters. I suspect tab characters ('09'x) before the visible Disease values, possibly in all but one observation (see sort order, as pointed out by ballardw). A recent post by Tom mentioned that SAS Studio and the traditional Display Manager interface behave differently regarding tab characters in SAS code (which makes it hard to reproduce the problem using the DM, but your code as posted contains only ordinary blanks).
Note that due to formatted input for variable Year the character separating the Year and Disease columns will be read into variable Disease if it is not an ordinary blank ('20'x) or automatically replaced with a blank by the interface. (Maybe you inserted the normally redundant informat specification "4." after getting a note "Invalid data for Year ..." in the log when SAS tried to read the invisible character following the digits into Year?)
The cleanest way to avoid the issue will be to replace or delete the unwanted characters in the datalines after diagnosis ($HEXw.).