02-11-2015 05:44 AM
Here is a scenario where BETWEEN Operator is not working as expected.Please help me in understanding what could be the possible reason for this.
751 DATA VS;
752 SET STUDY.VS;
753 KEEP USUBJID VISIT VSTESTCD VSTEST VSORRES VSORRESU ;
754 WHERE VSTESTCD ='WEIGHT' AND VSORRES BETWEEN '35' AND '150';
756 /*WHERE (VSTESTCD='WEIGHT') AND (INPUT(VSORRES,BEST.) BETWEEN 35 AND 150);*/ This is working ;
NOTE: There were 6 observations read from the data set STUDY.VS.
WHERE (VSTESTCD='WEIGHT') and (VSORRES>='150' and VSORRES<='35'); - Please help in comprehending how the condition was negated.
NOTE: The data set WORK.VS has 6 observations and 6 variables.
NOTE: Compressing data set WORK.VS increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
VSORRES : CHAR VARIABLE HAVING NUMERIC VALUES
02-11-2015 06:02 AM
Well, to be honest I am not really going to look at the why this is happening and there is a simple reason, I would never recommend to use character data in a place where numeric data is expected. Its simply bad and lazy coding practice and leads to these kind of issues. I would suggest explicitly writing the where as you expect it to perform:
where vstestcd="WEIGHT" and 35 <= input(vsorres,best.) <= 150;
From the above statement we can clearly see that we are doing a numeric test and the upper and lower boundary (plus whether ton include the boundaries) is clearly shown and no "guessing" needs to take place.
As for the why it is doing that, well, could be any number of reasons, perhaps when it converts the character data to numeric (which is what it is doing, your just leaving it to fate to decide what to do), it is sorting the information it has, hence "150" as text appears before "35".
02-11-2015 06:17 AM
Usually Character comparisons happens character by character from left to right, if you use any comparison operators like < , > etc . '150' might appear to be a smaller value when compared to '35', so you may not get what you want. Better convert them to integer before you do anything on them.
02-11-2015 09:13 AM
What is the question here?
Are you surprised that SAS let you specify the higher value first when using the BETWEEN operator? That is a nice enhancement that SAS allows that some other languages do not allow. Makes the BETWEEN operator work more like natural language. Works the same way with numeric variables.
4 data x; set sashelp.class ;
5 where age between 12 and 9 ;
6 where also name between 'Sara' and 'Adam';
NOTE: WHERE clause has been augmented.
NOTE: There were 6 observations read from the data set SASHELP.CLASS.
WHERE (age>=9 and age<=12) and (name>='Adam' and name<='Sara');
NOTE: The data set WORK.X has 6 observations and 5 variables.
Or are you confused about the lexical ordering of character variables? '150' is less than '35' because '1' is less than '3' .
02-11-2015 10:20 AM
Run this program an look at the results. It may help you see the difference in character variable order compared with numeric. It creates some character variables that look like numbers, sorts them and prints the result.
input x $;
proc sort data=test; by x;run;
proc print data=test;run;