data step executes differently with char vs. numeric

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

data step executes differently with char vs. numeric

I have two similar character variables in a merged dataset and wanted to create a new variable where if the value in the preferred variable is missing, the value from the second variable is substituted.   I first set the new third variable to all missing.  Then I did two if/then statements.

 

data temp;
set ann.cohort2008_11;
grade_09temp = .;
if grade_mb09 ne ' ' then grade_09temp = grade_mb09;
if grade_mb09 = ' ' and grade_09 ne ' ' then grade_09temp = grade_09;
proc freq;
tables grade_mb09 grade_09temp;
run;

 

This code results in all grade_09temp values = to 0.

But if I change the third line of code so that grade_09temp is initially set to .  (so a numeric), the program works.  I realize it's a bad idea to convert character variables to numeric this way, and probably didn't need to set the new variable initially to missing.  However, my question is why the program works differently with the change from numeric to character in that one line of code? 

 

Many thanks for insights.  I'm wary of not understanding why something occurs, even if I can find a way around it.  It sets the stage for later errors in similar situations.

 


Accepted Solutions
Solution
‎11-15-2016 11:21 PM
Super User
Posts: 5,497

Re: data step executes differently with char vs. numeric

Here's a possibiity.  Consider what happens when you code:

 

grade_09temp=' ';

 

This defines GRADE_09TEMP as character with a length of 1.  If your other variables (GRADE_MB09 and GRADE_09) are character, but are not left-hand-justified, you will end up with GRADE_09TEMP always missing.  There is only room to hold 1 character, and the first character in the assigned value is a blank.

 

On the other hand, when you create GRADE_09TEMP as numeric, the later assignment statements force the software to convert the other variables (their full value) to numeric and assign that to GRADE_09TEMP.

View solution in original post


All Replies
Super User
Posts: 19,770

Re: data step executes differently with char vs. numeric

Yes, you're treating the variable inconsistently as numeric/character.

 

A period is for numeric missing

A blank is for character missing.

 

One way to avoid types is to use CALL MISSING which will set the variable to missing.

Additionally, MISSING() function will check either a character or numeric for a missing value and return a True/False or 1/0 in either case so your code can be type ignorant.

 

http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p175x77t7k6kggn1io94...

Occasional Contributor
Posts: 7

Re: data step executes differently with char vs. numeric

I appreciate the speedy reply, but this is not my question.  I don't understand why if I set the grade_09temp to a character variable that is missing, then specify it is = to one character variable or a second character variable if the first is not present, that the program results all missing data for grade_09temp. 

 

But if I change grade09temp to a numeric by changing = ' ' to = ., then the program works and I get a new grade variable that is drawn from the two original variables.

 

I realize I contributed to the confusion by saying 0 vs. missing.  Sorry about that.

 

Any insights on why the code works when I initially set grade_09temp to a missing numeric and why it doesn't work when I set grade_09temp to a missing character variable would be appreciated.

 

I appreciate the suggestion for the alternate way to deal with missing values on a variable.

 

A

Super User
Posts: 19,770

Re: data step executes differently with char vs. numeric

I clearly misunderstood your question, my bad. 

 

So the code presented is working, but a different version, not shown, is not working?

 

Perhaps show us the non working code, and include the log please.

Sample input data is also helpful.

 

Some guesses based on experience:

It's possible that your missing may not be missing, but have invisible characters? Depending on the data source, if it was imported from a text or excel file this is possible? Does this variable, grade_09temp, exist in your dataset already?

I don't see anything directly in your code that would be an issue. 

 

A workaround - and I do understand you want to know why this isn't working - is to use COALESCEC()

 

grade_09temp = Coalescec(grade_mb09, grade_09);

 

 

 

Solution
‎11-15-2016 11:21 PM
Super User
Posts: 5,497

Re: data step executes differently with char vs. numeric

Here's a possibiity.  Consider what happens when you code:

 

grade_09temp=' ';

 

This defines GRADE_09TEMP as character with a length of 1.  If your other variables (GRADE_MB09 and GRADE_09) are character, but are not left-hand-justified, you will end up with GRADE_09TEMP always missing.  There is only room to hold 1 character, and the first character in the assigned value is a blank.

 

On the other hand, when you create GRADE_09TEMP as numeric, the later assignment statements force the software to convert the other variables (their full value) to numeric and assign that to GRADE_09TEMP.

Occasional Contributor
Posts: 7

Re: data step executes differently with char vs. numeric

Posted in reply to Astounding

Thank you!  I just tried my same code for character variables, but with two spaces between the apostrophes and now the grades appear correctly.  I couldn't figure out why the 0 kept appearing--but it was because the first digit was a 0 for grade--03, 04 etc. 

 

Interesting that you don't need to put more than 1 period to indicate missing data for numeric, but can't use one space (between quotes) for missing character variables without these types of problems.

 

Reeza, thanks for your first and second replies and willingness to query further, I know more about coding missing data now. 

 

Astounding, thanks for an answer that explains the mystifying pattern I was seeing. 

 

I always want to understand the logic behind my errors--finding a way around them without understanding the error seems a recipe for future coding disasters.

Super User
Posts: 19,770

Re: data step executes differently with char vs. numeric

This is one of the ones that's hard to see from code but you would see if you could run an example and see the output. A proc contents or examination of output data would have shown the length of 1. 

 

I think missing with zero spaces, empty quotes, should get you the correct answer. But one space isn't missing, it's a space. 

At least I think....would need to test 😀

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 294 views
  • 3 likes
  • 3 in conversation