06-09-2016 06:20 AM
I have a sample code as below, which includes if statements . As we know, if the variable has character type we need to put " " after if statement, if it has numeric type we need to put .(period) after if statement.
For character type variable, In the following code, "if "statement works fine when I put " " in the if statement. However, if I put .(period) in the "if" statement ,it changes the whole values .
I want to understand the reason why .(period) changes the all values if variable is character type.
I'm asking this question because I pull the whole if statements from SQL and this variables can be numeric type or character type, if I replace whole .(period) values with " " then some of my character type will be affected wrong.
Here is my sample code;
Data Have; Length Variable $ 20 Value 8; Infile Datalines Missover; Input Variable Value; Datalines; . 1000 Istanbul 1000 Istanbul 1000 . 1000 ; Run; Data Want; Set Have; If Variable=" " Then Value=1600; Run; /*Change Whole Values*/ Data Want; Set Have; If Variable=. Then Value=1600; Run;
May I have breif information, please?
06-09-2016 06:33 AM - edited 06-09-2016 06:34 AM
It is because you are mixing types. Your first datastep:
data have; length variable $ 20 value 8; infile datalines missover; input variable value; datalines; . 1000 Istanbul 1000 Istanbul 1000 . 1000 ; Run;
Creates a dataset with 1 character datatype and one numeric datatype. The "." when read in becomes "" as that is the empty character equivalent.
The your next datastep:
data want; set have; if variable="" then value=1600; /* or: if missing(variable) then... */ run;
I checking is the character string in variable equal to the character string on the right of the =. You can use no space, a space, or the missing function they all do the same thing. This works as ""="" then 1600, but "Istanbul"="" not true. This is fine logic.
Your next datastep:
data want; set have; if variable=. then value=1600; run;
Is invalid code. What is happening is that at the if statement it is looking at the logic, = numeric value, and that to the left is a character string. So it starts to implicitly convert your string into numeric, however "Istanbul" cannot be a number, so gets set to . or missing. Therefore, all rows equate to missing empty=missing, and "Istanbul" cannot convert so missing.
Its a good example of why you should always explicitly convert datatypes - know your data, program for your data.
06-09-2016 06:37 AM
The dot denotes the default MISSING value for numeric variables only. With character variables, the dot is simpy a 1-byte string containing the dot.
The log of your second data step is very verbose about your mistake:
23 Data Want; 24 Set Have; 25 If Variable=. Then Value=1600; 26 Run; NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 25:4 NOTE: Invalid numeric data, Variable='Istanbul' , at line 25 column 4. Variable=Istanbul Value=1600 _ERROR_=1 _N_=2 NOTE: Invalid numeric data, Variable='Istanbul' , at line 25 column 4. Variable=Istanbul Value=1600 _ERROR_=1 _N_=3 NOTE: There were 4 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 4 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.00 seconds
SAS tries to convert your Variable to numeric, fails in observations 2 and 3, which causes a missing numeric value, represented by the dot; therefore the condition is also true. In observation 1 and 4, the empty string is correctly converted to a missing value, causing the same result, but without ERROR.