Hi mates!
I came across with this unexpected fact that caused some serious consequences.
So, I have a table1:
phone_num mob_num
'1' 1
'2' 2
'3' 3
...
phone_num has the same value as mob_num. The only difference is that phone_num is a text variable, mob_num is a numeric one.
Also I have a table2: phone_num
'4'
'5'
'6'
...
My simplest code is:
data table3;
set table1 table2;
if mob_num = . then mob_num=phone_num; run;
Question: what should be the result table?
I expected to see table3: phone_num mob_num
'1' 1
'2' 2
'3' 3
'4' 4
'5' 5
'6' 6
However, in fact the result appears like this: phone_num mob_num
'1' 1
'2' 2
'3' 3
'4' 4
'5' 4
'6' 4
What am I doing wrong in this simplest code?
I assume that I shouldn't see the expected result, but why there was not any warnings in the log?
Peace!
Victor Popovich
MTS, Russia
Message was edited by: Victor_Popovich
Suggest you use PROC CONTENTS to analyze the SAS variable types, which are either CHARACTER or NUMERIC. Unless there are user formats being applied, your assignment statement appears to be attempting to relate different variable types -- I would expect some diagnostic messages in your SAS log would be generated and would help you self-diagnose the program and your data.
For consideration, the values are not equal. You will need to consider extracting the numeric portion of the character-type variable, using SAS DATA step functions like SCAN, SUBSTR, STRIP, and then use the INPUT function to convert the character result to numeric. Then you can effectively compare values.
The behavior you saw has to do with the way SAS retains values in a SET with different variables in the two datasets. I've seen it documented (long ago), but couldn't find it in a cursory search of the 9.2 online docs. If you add the one line with the comment to table 2, you get the expected results.
Thanks, Scott. Here is the line I was looking for.
"When Reading a SAS Data Set
When variables are read with a SET, MERGE, or UPDATE statement, SAS sets the values to missing only before the first iteration of the DATA step. (If you use a BY statement, the variable values are also set to missing when the BY group changes.) The variables retain their values until new values become available;..."