I want to select the smallest value of two variables for each row in a data set.
The two variables have date values as character strings that follow the order of 4 digit year, 2 digit month, 2 digit day.
Some values for these variables have full dates (yyyymmdd) and others have partial dates (yyyymm or yyyy).
How can I select the smallest value between each of the two variables for each row? Will I need to convert the character strings to dates first? However, won't date formats on partial dates return missing value (".")?
For example:
I tried using the MIN function as Var3 = MIN(Var1, Var2)
Var1 | Var2 | Expected Minimum Date | Resulting Minimum Date Based on Date format | Resulting Minimum Date Based on Character String |
202108 | 20211201 | 202108 | 20211201 (because partial date of 202108 returns as "." when converting column to a date. Or is there a way to handle column of dates of varying lengths to prevent partial dates from being ".") | 202108 |
202112 | 20210824 | 20210824 | 20210824 | 202112 |
Dates should NEVER be character strings if you are going to work with them. They should always be converted to dates.
As far as partial dates are concerned, you need to come up with logic on how to handle these, how to turn these into "non-partial" dates. Then once you have such logic, you can turn them into valid SAS dates which contain year/month/day (not year/month) and then program the rest of your operations.
You can get good mileage using 8601 date format. SAS can read it and set missing day to 01 and same for missing month. But you can do compare with the character variables. Tones of papers and threads about partial dates you should check it out. YYYY-MM-DD works well in listings too.
49 data _null_;
50 v0='2021';
51 v1='2021-08';
52 v2='2021-12-01';
53 t1=v1 min v2;
54 t2=v1 max v2;
55 t3=v0 eq v1;
56 t4=v0 eq: v2;
57 put 'NOTE: ' (_all_)(/=);
58 run;
2 The SAS System 15:14 Friday, March 29, 2024
NOTE:
v0=2021
v1=2021-08
v2=2021-12-01
t1=2021-08
t2=2021-12-01
t3=0
t4=1
Given that you have partial dates (of the form YYYY, YYYYMM) it's uncharacteristically lucky that you have kept these as character variables. You can find the minimum of two character dates (partial or complete) by using a comparison operator (i.e "<", ">"), which will honor lexicographic ordering (i.e. "alphabetic" ordering).
data want;
set have;
mindate=ifc(var1<var2,var1,var2);
run;
Now mindate will still be a character variable, which you may need to convert to a regular numeric variable holding date values. But the initial comparison is easy.
CAVEAT: If var1 or var2 is blank that blank value will be less than all YYYY, YYYYMM, and YYYYMMDD character values. So you may need to protect against blank VAR1 or VAR2 values.
@melc wrote:
I want to select the smallest value of two variables for each row in a data set.
The two variables have date values as character strings that follow the order of 4 digit year, 2 digit month, 2 digit day.
Some values for these variables have full dates (yyyymmdd) and others have partial dates (yyyymm or yyyy).
How can I select the smallest value between each of the two variables for each row? Will I need to convert the character strings to dates first? However, won't date formats on partial dates return missing value (".")?
First worry about INFORMATS to read the values. Pick the desired format after you have a date value to display.
Things that appear as YYMM or YYYYMM can be read into a data value using the YYMMNw. informat. It will assume that the day of the month used for the date is the first day.
If you only have a 4-digit year then convert to numeric and use the MDY function to create the day and month in that year that you want to use for comparisons.
The real problem can be if you have mixed 6 character values that are in the YYMMDD and YYYYMM as 201212 might be 12 Dec 2020 (or 1920) or Dec 2012.
Please share your code. Looks like normal comparisons do what you want.
data have;
input (Var1 Var2 Expected) (:$8.);
cards;
202108 20211201 202108
202112 20210824 20210824
;
data want;
set have;
if var1<var2 then min=var1; else min=var2;
run;
Result
Obs Var1 Var2 Expected min 1 202108 20211201 202108 202108 2 202112 20210824 20210824 20210824
As already suggested you could use the appropriate informat to convert the date strings to SAS Date values (count of days since 1/1/1960 stored in a numerical variable).
For your date and partial date strings informat b8601da. appears to do the job. Partial dates will get aligned to the beginning of what's available (=beginning of month or beginning of year).
data demo;
infile datalines truncover dlm=' ';
input dt_string_1:$8. dt_string_2:$8.;
sas_dt_1=input(dt_string_1,b8601da.);
sas_dt_2=input(dt_string_2,b8601da.);
min_date=min(sas_dt_1,sas_dt_2);
format sas_dt_1 sas_dt_2 min_date yymmdd10.;
datalines;
202108 20211201
202112 20210824
2021 20210824
;
proc print data=demo;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.