BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cassie-Andra
Calcite | Level 5

Hello!

 

This seems like a ridiculous question to ask and yet I'm stumped.  All I want to do is create a simple dataset in SAS so that I can run some tests.  Here is the simple SAS code to create the dataset.

 

data test;
INPUT region $ Year 4. Disease $ n;
DATALINES;
MC 2015 Y 3024
MC 2016 Y 3262
MC 2017 Y 3157
MC 2018 Y 3155
MC 2019 Y 3093
MC 2015 n 4015119
MC 2016 n 4085287
MC 2017 n 4152344
MC 2018 n 4247434
MC 2019 n 4324214
;
run;

proc freq data= test;
table Disease;
run;

 

For some reason, when I run a frequency based upon the variable 'Disease', I get the following output:

Cassie-Andra_1-1611343767570.png

Why in the world is it breaking up 'n' as if it's a different category?  I can't run my tests like this.  Is there something I'm leaving off of my input code?  By the way, I need to expand this dataset significantly to include 4 other different regions, each with 5 different years of data worth of Y/N data.  

 

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
The fix - you can use the COMPRESS function to remove white space, and I would suggest standardizing the case as well.

Disease = upcase(compress(disease, , 's'));


If you want to see what the character actually is, display the values with a HEX format - the two Ns should show up differently.

proc freq data= test;
format disease $hex.;
table Disease;
run;

09 is a tab
0A or A0 is the other common that I've seen, can't remember what it is off the top of my head.

View solution in original post

8 REPLIES 8
mkeintz
PROC Star

I can't reproduce your problem.  There must be a non-printable character next to one of the 'n' values.    Make sure your white spaces in the data are true blanks.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
CarmineVerrell
SAS Employee

I can't reproduce it either and get only two values in the frequency.

Reeza
Super User
The fix - you can use the COMPRESS function to remove white space, and I would suggest standardizing the case as well.

Disease = upcase(compress(disease, , 's'));


If you want to see what the character actually is, display the values with a HEX format - the two Ns should show up differently.

proc freq data= test;
format disease $hex.;
table Disease;
run;

09 is a tab
0A or A0 is the other common that I've seen, can't remember what it is off the top of my head.
Cassie-Andra
Calcite | Level 5

You were right about displaying with HEX format. This was my output.  

So, can you advise me further on what the fix for this is?  I've tried taking out spaces and placing only one space in between each value but it doesn't seem to solve the problem at all.  

Cassie-Andra_0-1611360048812.png

 

Thank you for the help!

 

Tom
Super User Tom
Super User

@Cassie-Andra wrote:

You were right about displaying with HEX format. This was my output.  

So, can you advise me further on what the fix for this is?  I've tried taking out spaces and placing only one space in between each value but it doesn't seem to solve the problem at all.  

Cassie-Andra_0-1611360048812.png

 

Thank you for the help!

 


Do NOT insert physical TAB character (the '09'x in your report above).  Change the options in whatever editor you are using to type your programs to replace the tabs with spaces.  

The INFILE statement also has an EXPANDTABS option that will convert tab characters to the proper number of spaces to mimic tab stops every 8 characters.

ballardw
Super User

Almost certainly the cause is one or more values with a leading space. Proc freq will display the value and strip the leading space appearing as if the value is the same. But the value is actually different so gets a different row in the output table.

You can see that here:

Data example;
   x = " Y";output;
   x = "Y"; output;
   x = "Y"; output;
   x = "Y"; output;
   x = "Y"; output;
   x = " n";output;
   x = "n"; output;
   x = "n"; output;
   x = "n"; output;
   x = "n"; output;
run;

proc freq data=example;
   tables x;
run;

Also notice that the order has the versions with spaces first in the output from above.

So it may be likely that your Y values are actually " Y".

 

Use a data step and the Length function to examine values.

Or clean up with Disease= strip(disease);

Cassie-Andra
Calcite | Level 5

It's so crazy because I've tried to assure (multiple times) that only one space exists between each value but I still continued to run into that issue, but somehow what you outlined must be happening.  I just used a compress function and fixed my issue!  So bizarre.  Thanks for the help!                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

FreelanceReinh
Jade | Level 19

Hello @Cassie-Andra,

 

Reeza's diagnostic step using the $HEX. format will reveal the invisible character(s) you have typed (or copied/pasted) and then read into variable Disease before (or less likely: after) the visible letter. I don't think there are leading blanks ('20'x) because these would be skipped by list input (which you are using for Disease in the INPUT statement). Also non-breakable spaces ('A0'x) can be ruled out as these would be visible as indentations in HTML output -- unlike many other "invisible" characters. I suspect tab characters ('09'x) before the visible Disease values, possibly in all but one observation (see sort order, as pointed out by ballardw). A recent post by Tom mentioned that SAS Studio and the traditional Display Manager interface behave differently regarding tab characters in SAS code (which makes it hard to reproduce the problem using the DM, but your code as posted contains only ordinary blanks).

 

Note that due to formatted input for variable Year the character separating the Year and Disease columns will be read into variable Disease if it is not an ordinary blank ('20'x) or automatically replaced with a blank by the interface. (Maybe you inserted the normally redundant informat specification "4." after getting a note "Invalid data for Year ..." in the log when SAS tried to read the invisible character following the digits into Year?)

 

The cleanest way to avoid the issue will be to replace or delete the unwanted characters in the datalines after diagnosis ($HEXw.).

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1198 views
  • 0 likes
  • 7 in conversation