BookmarkSubscribeRSS Feed
CatPaws
Calcite | Level 5

Hi!

I had an issue with SAS converting my numeric variables to character variables when imported from excel. To go around that, I saved it as a CSV then imported it. Now, I am doing a proc frequency on a character variable, and SAS is duplicating those variables. See pic below! On my spread sheet, I only have Female, Male, and Unknown, however, why are they being duplicated?

CatPaws_0-1644882855929.png

P.S How can I get UNKNOW to display the full name (UNKNOWN)?

 

2 REPLIES 2
Reeza
Super User

Quick solution:

 

FEMALE is not the same as female. The case difference does matter to SAS, though you can make it not matter by applying a format to make it consistent. Case doesn't matter to code, but it does matter to data. So proc freq is the same as PROC FREQ. In comparison, Python and R are both case sensitive in language AND data.

 

proc freq data=have;
table sex / missing;
format sex $upcase18.;
run;

However, to fix the unknown you will need to actually fix your data that you imported. I'm guessing you used PROC IMPORT and didn't write a data step? In that case, I would recommend adding the following statement to the PROC IMPORT code:

guessingrows=max;

This forces SAS to scan the full row before it imports the data, so it will really slow down your import process but you'll get cleaner data. 

 

Optimal solution:

Write an import step that will correctly read the file. You can use the code from the log as a starter version. 

And then to correct the case, use a data step and clean up the data, likely using PROPCASE, which will convert everything to lowercase and upper case the first character. Then run your proc freq. 

 

 

data clean;
set raw_data_from_import;

sex = propcase(sex);

run;

proc freq data=clean;
table sex;
run;

@CatPaws wrote:

Hi!

I had an issue with SAS converting my numeric variables to character variables when imported from excel. To go around that, I saved it as a CSV then imported it. Now, I am doing a proc frequency on a character variable, and SAS is duplicating those variables. See pic below! On my spread sheet, I only have Female, Male, and Unknown, however, why are they being duplicated?

CatPaws_0-1644882855929.png

P.S How can I get UNKNOW to display the full name (UNKNOWN)?

 


 

ballardw
Super User

Case is not the only thing can create the appearance of "duplicate" values in Proc Freq (and many other output tables).

Please copy and run this code locally:

data example;
  length text $ 10;
  text= 'abc';output;
  text= ' abc';output;
run;

proc freq data=example;
run;

The output from proc freq looks like you have "duplicate" values for text. In reality the second value has a leading space but Proc Freq just does not show it as the table output rules left justifies the text. For added fun add more spaces in front of some more values and you can have, apparently, 4, 5 or 6 "duplicates".

This is actually a tad harder to diagnose, at least the first time you see it, then different actual displayed values.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 514 views
  • 1 like
  • 3 in conversation