DATA Step, Macro, Functions and more

When We Put Period Why If Statement Changes Whole Values If the Variable Has Character Type

Reply
Super Contributor
Posts: 381

When We Put Period Why If Statement Changes Whole Values If the Variable Has Character Type

Hello everyone,

 

I have a sample code as below, which includes if statements . As we know, if the variable has character type we need to put " " after if statement, if it has numeric type we need to put .(period) after if statement.

 

For character type variable, In the following code, "if "statement works fine when I put " " in the if statement. However, if I put .(period) in the "if" statement ,it changes the whole values .

 

I want to understand the reason why .(period) changes the all values if variable is character type.

 

I'm asking this question because I pull the whole if statements from SQL and this variables can be numeric type or character type, if I replace whole .(period) values with " " then some of my character type will be affected wrong.

 

Here is my sample code;

 

Data Have;
Length Variable $ 20 Value 8;
Infile Datalines Missover;
Input Variable Value;
Datalines;
. 1000
Istanbul 1000
Istanbul 1000
. 1000
;
Run;

Data Want;
Set Have;
If Variable=" " Then Value=1600;
Run;
/*Change Whole Values*/
Data Want;
Set Have;
If Variable=. Then Value=1600;
Run;

 May I have breif information, please?

 

Thank you

Super User
Super User
Posts: 7,401

Re: When We Put Period Why If Statement Changes Whole Values If the Variable Has Character Type

[ Edited ]

It is because you are mixing types.  Your first datastep:

data have;
  length variable $ 20 value 8;
  infile datalines missover;
  input variable value;
datalines;
. 1000
Istanbul 1000
Istanbul 1000
. 1000
;
Run;

Creates a dataset with 1 character datatype and one numeric datatype.  The "." when read in becomes "" as that is the empty character equivalent.

 

The your next datastep:

data want;
  set have;
  if variable="" then value=1600;
  /* or: if missing(variable) then... */
run;

I checking is the character string in variable equal to the character string on the right of the =.  You can use no space, a space, or the missing function they all do the same thing.  This works as ""="" then 1600, but "Istanbul"="" not true.  This is fine logic.

 

Your next datastep:

data want;
  set have;
  if variable=. then value=1600;
run;

Is invalid code.  What is happening is that at the if statement it is looking at the logic, = numeric value, and that to the left is a character string.  So it starts to implicitly convert your string into numeric, however "Istanbul" cannot be a number, so gets set to . or missing.  Therefore, all rows equate to missing empty=missing, and "Istanbul" cannot convert so missing.  

Its a good example of why you should always explicitly convert datatypes - know your data, program for your data.

 

 

 

 

Super User
Posts: 6,936

Re: When We Put Period Why If Statement Changes Whole Values If the Variable Has Character Type

The dot denotes the default MISSING value for numeric variables only. With character variables, the dot is simpy a 1-byte string containing the dot.

 

The log of your second data step is very verbose about your mistake:

23         Data Want;
24         Set Have;
25         If Variable=. Then Value=1600;
26         Run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      25:4   
NOTE: Invalid numeric data, Variable='Istanbul' , at line 25 column 4.
Variable=Istanbul Value=1600 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, Variable='Istanbul' , at line 25 column 4.
Variable=Istanbul Value=1600 _ERROR_=1 _N_=3
NOTE: There were 4 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds

SAS tries to convert your Variable to numeric, fails in observations 2 and 3, which causes a missing numeric value, represented by the dot; therefore the condition is also true. In observation 1 and 4, the empty string is correctly converted to a missing value, causing the same result, but without ERROR.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 2 replies
  • 240 views
  • 0 likes
  • 3 in conversation