Hi,
I have character dates, as in the dataset below.
Normally, we should have numeric dates for comparisons (>, <, etc.).
However, I came across the following code and want to know how SAS processes it.
As you can see below, it seems to give correct results for the x: variables.
Can someone please explain?
Thanks.
data have ;
a='2020-07-17';
b='2020-06-17';
c='2020-06-18';
if a<b then x1='a<b';
if a>b then x2='a>b';
if c>=b then x3='c>=b';
run;
Greater-than and less-than comparisons can be used for character values. They use "lexicographic" ordering.
In your case you have 'yyyy-mm-dd' values, so lexicographic ordering imitates chronological order, and the code works. Unlike regular numeric date values (where 0 is 01jan1960), however, you won't be able to directly compute the distance between two dates recorded this way.
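If you do need the distance between two of these dates, here is a minimal sketch, assuming the values are always valid 'yyyy-mm-dd' strings. The dataset name want and the variables a_num, b_num and days_between are made up for illustration.

```sas
data want;
  a = '2020-07-17';
  b = '2020-06-17';

  /* INPUT with the YYMMDD10. informat reads 'yyyy-mm-dd' text
     and returns a numeric SAS date (days since 01jan1960) */
  a_num = input(a, yymmdd10.);
  b_num = input(b, yymmdd10.);

  /* display the numeric dates in the same yyyy-mm-dd layout */
  format a_num b_num yymmdd10.;

  /* numeric dates can be subtracted to get a number of days */
  days_between = a_num - b_num;
  put days_between=;
run;
```

The character variables are left untouched, so the original comparisons still work; the numeric copies are only there for arithmetic.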
As long as you have
your dates will sort correctly, but you won't be able to use them for calculations.
You want to be extremely cautious with < or > comparisons involving character values that look like numbers.
Character comparisons are done left to right: the first character of one value is compared to the first character of the second, then the second characters, and so on. The comparison stops at the first position where the characters differ, and that position decides the result.
So when everything has exactly the same length and structure, including hyphens or other punctuation, the results you show make sense.
However, when the character values have different lengths, the results are very likely not to be what you want.
data example;
a= '111';
b= '20';
c= ' 45';
if a < b then put "A < B";
else put "A not < B";
if c < b then put "C < B";
else put "C not < B";
if a > c then put "A > C";
else put "A not > C";
run;
Why is '111' < '20'? Because in the first character comparison, '1' versus '2', the '1' is less than the '2'.
Similarly, when the leading space in ' 45' is compared to the '2' in '20', the space is less than '2', so the result is "less than".
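If the intent is to compare such values as numbers, one option is to convert them first. This is only a sketch, assuming every value holds a valid number; the variables a_num, b_num and c_num are made up for illustration.

```sas
data _null_;
  a = '111';
  b = '20';
  c = ' 45';

  /* the w. informat reads the text as a number; leading blanks are ignored */
  a_num = input(a, 8.);
  b_num = input(b, 8.);
  c_num = input(c, 8.);

  /* these are now numeric comparisons: 111 vs 20 and 45 vs 20 */
  if a_num < b_num then put "A < B";
  else put "A not < B";
  if c_num < b_num then put "C < B";
  else put "C not < B";
run;
```

With numeric values both tests take the ELSE branch, which is the opposite of what the character comparison gives.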
For a great many purposes it is better to store dates, times or datetimes as numeric SAS date, time or datetime values with an appropriate format. Then you can determine things like the number of days between two dates, or increment values by some specified interval. Plus, just changing the format can create different groups for analysis, graphing or report categories without having to modify the data.
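As a rough illustration of the format point, the sketch below reads three dates as numeric SAS dates and then counts them by month simply by applying a different format in PROC FREQ. The dataset name dates and the variables d and next_month are made up for illustration.

```sas
data dates;
  /* read yyyy-mm-dd text into a numeric SAS date */
  input d :yymmdd10.;
  /* increment by one month, keeping the same day of the month */
  next_month = intnx('month', d, 1, 'same');
  format d next_month yymmdd10.;
  datalines;
2020-06-17
2020-06-18
2020-07-17
;
run;

/* PROC FREQ groups on the formatted value, so MONYY7. gives
   one row per month without changing the stored data */
proc freq data=dates;
  tables d;
  format d monyy7.;
run;
```

Swapping MONYY7. for another date format such as YEAR4. would regroup the same data by year.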