aw016
Obsidian | Level 7

Hello All,

I am analyzing a CSV file in SAS; it contains many character variables.

After running this code:

[Image: aw016_0-1603928873657.png]

I got the notes and error below; the same errors show up when I use "where".

[Image: aw016_1-1603928453230.png]

How can I get rid of the error? Do I need to format all variables before analysis?

 

Thank you very much!

Accepted Solution
Astounding
PROC Star

The problem originates with this statement:

if work=. then work_new=.;

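The dot represents a missing value for a numeric variable. Therefore, the program creates WORK_NEW as a numeric variable, unable to hold character values like "1" and "2". Start with this statement instead (MISSING() also replaces the comparison work=., because WORK itself is a character variable and should not be compared to a numeric missing value):

if missing(work) then work_new=" ";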

That will give you a missing value for a character variable, able to hold values like "1" and "2".  You still need to switch to the IN operator as @Tom suggested.
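
Putting both fixes together, a minimal sketch of the whole step might look like the following. It assumes WORK is a character variable in the data set WANT, as in the original post. The LENGTH statement is an addition for illustration and is not part of the original code; MISSING() follows @Tom's suggestion for testing a character variable.

```sas
data want_1;
   set want;
   length work_new $1;   /* assumption: declare WORK_NEW as character up front */
   if missing(work) then work_new = " ";   /* blank = character missing */
   else if work = "White" then work_new = "1";
   else if work in ("Black", "African American") then work_new = "2";
   else if work in ("Other", "American Indian") then work_new = "3";
run;
```

Declaring WORK_NEW before the first assignment means its type no longer depends on which statement happens to reference it first.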

6 REPLIES
Tom
Super User

Hard to reply when you post photographs of text.  The quotes in your font do not look like normal quotes to me. Make sure you don't have what Microsoft Word calls "smart" quotes (aka stupid quotes).

 

SAS will define a variable as soon as it has to based on the information at hand at that point.

So if WORK does not already exist, your code creates it as numeric, because the first place you reference it compares it to a numeric missing value. To check whether a character variable is missing, compare it to a blank string or use the MISSING() function.
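
As a small illustration of the two checks, here is a sketch using a hypothetical character variable declared inside a DATA _NULL_ step (the variable and its length are assumptions made for the demo, not taken from the original data):

```sas
data _null_;
   length work $20;   /* hypothetical character variable for the demo */
   work = " ";        /* a blank is the missing value for character data */
   if work = " "    then put "Blank comparison: WORK is missing";
   if missing(work) then put "MISSING(): WORK is missing";
run;
```

MISSING() accepts both numeric and character arguments, so the same test can be used without knowing the variable's type in advance.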

 

Your last two lines are not coded right. You are testing whether the strings on the right-hand side of the OR operator are true or false, rather than comparing them to WORK. You probably want to use the IN operator instead.

if work in ('B','C') then ...
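
To show why the original OR expressions misbehave, the sketch below spells out the comparison both ways. The variable and its value are hypothetical and exist only for the demo.

```sas
data _null_;
   length work $20;   /* hypothetical character variable for the demo */
   work = "African American";
   /* Spelled-out form: each side of OR is a complete comparison */
   if work = "Black" or work = "African American" then put "Matched with OR";
   /* Equivalent, shorter form using the IN operator */
   if work in ("Black", "African American") then put "Matched with IN";
run;
```

In an expression such as work="Black" or "African American", the second operand is a bare character constant, so SAS tries to evaluate it as a numeric true/false value instead of comparing it to WORK.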
aw016
Obsidian | Level 7

Thank you @Tom for your help!

This is my first time posting a question here, and I'm still exploring 😅 I apologize for posting the code as a picture; here is the original code:

data want_1; set want;
if work=. then work_new=.;
else if work= "White" then work_new="1";
else if work="Black" or "African American" then work_new="2";
else if work= "Other" or "American Indian" then work_new="3";
run;

The code you provided works for me to re-categorize the variable. The original file was an .xlsx file that I converted to .csv and then imported into SAS. I haven't actually checked whether it contains "smart quotes".

aw016
Obsidian | Level 7

Thank you very much @Astounding ! This is really helpful!

unison
Lapis Lazuli | Level 10

This is a perfect use case for PROC FORMAT. Create the format and apply with put(). I would also UPCASE() all of your values to make this conversion case-insensitive.

proc format;
	value $ethnic 
		'WHITE'='1' 
		'BLACK', 'AFRICAN AMERICAN'='2' 
		'AMERICAN INDIAN', other='3';
run;

data have;
	infile datalines dsd;
	input work :$20.;
	datalines;
White
Black
African American
Other
American Indian
z
;
run;

data want;
	set have;
	work_new=put(upcase(work), $ethnic.); /*Apply format*/
run;
proc print noobs;
run;

 

-unison
andreas_lds
Jade | Level 19

Variables should always be defined with a LENGTH or ATTRIB statement. This is especially important for character variables: if you omit the declaration, the first assignment defines the length, which is often too short to hold the values assigned later on.
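
A minimal sketch of the truncation problem, using made-up data and hypothetical variable names (UNDECLARED and DECLARED are assumptions chosen for the demo):

```sas
data demo;
   input code;
   length declared $3;   /* explicit length, long enough for "Yes" */
   /* UNDECLARED gets length 2 from its first assignment ("No") */
   if code = 0 then undeclared = "No";
   else undeclared = "Yes";   /* stored as "Ye": truncated to 2 characters */
   if code = 0 then declared = "No";
   else declared = "Yes";     /* stored in full */
   datalines;
0
1
;
run;

proc print data=demo noobs;
run;
```

The length is fixed when the step is compiled, from the first assignment in the code, so the order of the statements matters and not the order of the data.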

And @unison is absolutely right: this is a job for a format. And this is one of the rare cases in which I would use an informat:

proc format;
   invalue $ethnic (upcase) /* all values are automatically upcased */
      'WHITE' = '1' 
      'BLACK', 'AFRICAN AMERICAN' = '2' 
      'AMERICAN INDIAN', other = '3'
  ;
run;

data want;
   set have;
   work_new = input(work, $ethnic.);
run;
