Dear,
My data contains date variable with the following values. I am using the following code to get the output. I got the output except for a few OBS containing 'jun' as the month because I am trying to remove 'un' charcters using tranwrd function which removes 'un' in 'jun' month. Please help me simply the code. Thanks
date
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
code;
date1=tranwrd(date_raw,'/','');
date2=tranwrd(date1,'UNK','');
date3=tranwrd(date2,'un','');
date4=tranwrd(date3,'UN','');
date5=tranwrd(date4,'Un','');
date6=compress(tranwrd(date5,'uN',''));
if length(date6)=9 then STDTC2= put(input(date6,date9.),is8601da.);
if length(date6)=7 then STDTC3= put(input(date6,monyy7.),yymmd7.);
if length(date6)=4 then STDTC4= put(input(date6,best.),10.);
if stdtc2 ^='' then isdate=stdtc2;
else if stdtc3 ^='' then isdate=stdtc3;
else if stdtc4 ^='' then isdate=stdtc4;
output needed;
2011
blank
2014-12-01
2014--02
2008-06-18
2023-03
2013-04
2011-09
2009-10
2003
2002
2015
2006
Add to your code at the beginning:
if input(datex, ?? ddmmyy10.) ne .
then date_out = put(input(datex, ddmmyy10.), yymmdd10.);
else do;
... your code ...
end;
The only "problem" that may arouse is that date_out will be in dormat of: yyyy/mm/dd instead yyyy-mm-dd
and that can be corrected by date_out = translate(date_out, '-' , '/' );
Thank you for help. If I use my code where you mentioned to use. Does it work for values 'Un/jun/2012' . My code removes 'un' in jun. Please help.
The expression input(datex, ?? ddmmyy10.) checks for valid date and
assigns a number that is the days passed since 01-01-1960 otherwise it
assingns a missing value.
As 'Un/jun/2012' is not a valid date, it will execute the ELSE part of the IF statement.
In oredr to overcome JUN month, I would prefer a differnt code than yours assuming that
date_raw is always in a format of day/month/year .
I havn't run this code. You may try it:
%LET MM = 'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;
DATA WANT;
SET HAVE;
array mx $ &mm;
if input(datex, ?? ddmmyy10.) ne . /* valid date given */
then date_out = put(input(datex, ddmmyy10.), yymmdd10.);
else do;
month = scan(date_raw,2);
year = scan(date_raw,3);
if length(trim(month)) = 2 and /* 2 digits month */
1 le input(month, best2.) le 12 and
input(year, best4.) ge 1960 /* check for reasonable year */
then date_out = catx( '-', year,month);
else if length(trim(month)) = 3 and /* 3 characters month */
upcase(month) in (&mm) and
input(year, best4.) ge 1960
then do;
do i=1 to 12;
if mx(i) = upcase(trim(month)) then do;
month = put(i , z2.); /* convert month mmm to its number, 2 digits */
date_out = catx( '-', year, trim(month));
leave;
end;
end; end;
else if input(year, best4.) ge 1960
then date_out = trim(year);
end;
run;
The mdy() function is quite helpful here. See code below.
data have;
infile datalines truncover;
input have_str $11.;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
Un/##/2006
;
run;
options missing=' ';
data want;
set have;
length want_str $11.;
/* valid day, month and year? */
want_str=put(input(have_str, ?? date11.),yymmdd10.);
if missing(want_str) and not missing(have_str) then
do;
length _day $2 _month $3 _year $4;
_day =scan(have_str,1,'/','m');
_month=put(month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.)),z2.);
_year =scan(have_str,3,'/','m');
/* valid day and year? */
if mdy('01',_day,_year) then want_str=_year||'--'||_day;
/* valid month and year? */
else if mdy(_month,'01',_year) then want_str=_year||'-'||_month;
/* valid year? */
else if mdy('01','01',_year) then want_str=_year;
end;
run;
Thank you for the help. I got the output I need. But I am getting warning message as "invalid argument for OBS where _day='un' 'UN' and _year='UNK'. I added a few additional lines to the code. Is there any other option to simplyfy it. Thanks.
code;
options missing=' ';
data want;
set have;
length want_str $11.;
/* valid day, month and year? */
want_str1=put(input(have_str, ?? date11.),yymmdd10.);
if missing(want_str) and not missing(have_str) then
do;
length _day $2 _month $3 _year $4;
_day =scan(have_str,1,'/','m');
_month=put(month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.)),z2.);
_year =scan(have_str,3,'/','m');
/* valid day and year? */
if _day not in ('un' 'UN' 'uN' 'Un') and _year ^='UNK' then do;
if mdy('01',_day,_year) then want_str2=_year||'--'||_day;
end;
/* valid month and year? */
if _year ^='UNK' then do;
if mdy(_month,'01',_year) then want_str3=_year||'-'||_month;
end;
/* valid year? */
if _year^='UNK' then do;
if mdy('01','01',_year) then want_str4=_year;
end;
end;
if want_str1 ^='' then wantstr=want_str1;
else if want_str2 ^='' then wantstr=want_str2;
else if want_str3 ^='' then wantstr=want_str3;
else if want_str4 ^='' then wantstr=want_str4;
run;
I should have checked the log.... mdy() requires numerical variables. Below a code version which creates a bit less Notes in the log about conversion from character to numerical.
data have;
infile datalines truncover;
input have_str $11.;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/uNk/2003
Un/##/2006
99/999/2006
;
run;
options missing=' ';
data want;
set have;
length want_str $11.;
/* valid day, month and year? */
want_str=put(input(have_str, ?? date11.),yymmdd10.);
if missing(want_str) and not missing(have_str) then
do;
length _day _month _year 8;
_day =input(scan(have_str,1,'/','m'), ?? best32.);
_month=month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.));
_year =input(scan(have_str,3,'/','m'),?? best32.);
if _day<1 or _day>31 then call missing(_day);
if _month<1 or _month>12 then call missing(_month);
/* valid day and year? */
if mdy(01,_day,_year) then want_str=put(_year,z4.)||'--'||put(_day,z2.);
/* valid month and year? */
else if mdy(_month,01,_year) then want_str=put(_year,z4.)||'-'||put(_month,z2.);
/* valid year? */
else if mdy(01,01,_year) then want_str=put(_year,z4.);
end;
run;
If you're using my code version then you don't need to add all these checks like if _day not in ('un' 'UN' 'uN' 'Un') and _year ^='UNK' then do; The code I've posted works fine without it and _day will either contain a number or a missing.
If the number in _day is a valid day then the mdy() function will return a value (if the 2 other parameters are valid as well), else it will return a missing. So that's the check.
As you can see this code version doesn't only deal with "UN/UNK" but with any string which doesn't represent a valid day, month or year.
Dear,
My data contains date variable with the following values. I am using the following the code. I am not able to run the code. Please help. It giving me error message.Thanks
datex
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
un/jun/2002
output needed;
2011
blank
2014-12-01
2014--02
2008-06-18
2023-03
2013-04
2011-09
2009-10
2003
2002
2015
2006
code;(Some one posted this code);
%LET MM = 'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;
DATA WANT;
SET HAVE;
array mx $ &mm;
if input(datex, ?? ddmmyy10.) ne . /* valid date given */
then date_out = put(input(datex, ddmmyy10.), yymmdd10.);
else do;
month = scan(date_raw,2);
year = scan(date_raw,3);
if length(trim(month)) = 2 and /* 2 digits month */
1 le input(month, best2.) le 12 and
input(year, best4.) ge 1960 /* check for reasonable year */
then date_out = catx( '-', year,month);
else if length(trim(month)) = 3 and /* 3 characters month */
upcase(month) in (&mm) and
input(year, best4.) ge 1960
then do;
do i=1 to 12;
if mx(i) = upcase(trim(month)) then do;
month = put(i , z2.); /* convert month mmm to its number, 2 digits */
date_out = catx( '-', year, trim(month));
leave;
end;
end; end;
else if input(year, best4.) ge 1960
then date_out = trim(year);
end;
run;
Please don't post the same question twice.
Here the link to the thread with replies:
After hard work I simplified the code as:
data have;
infile datalines;
input datex $11. ;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
; run;
%LET MM = 'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;
DATA WANT(keep=datex date_out day month year ); /*drop unrelevant variables */
length date_out $10 day $2 month $3 year $4;
SET HAVE;
array mx {12} $ ( &mm );
day = scan(datex,1,'/','m');
month = scan(datex,2,'/','m');
year = scan(datex,3,'/','m');
if length(strip(month)) = 3 and /* 3 characters month */
upcase(month) in (&mm) and
input(year,?? best4.) ge 1960
then do;
do i=1 to 12;
if mx(i) = upcase(trim(month)) then do;
month = put(i , z2.); /* convert month mmm to its number, 2 digits */
leave;
end;
end; end;
lday = compress(day,'0123456789');
lmonth = compress(month,'0123456789');
lyear = compress(year,'0123456789');
if lday=' ' and lmonth=' ' and lyear=' '
then date_out = catx('-',year,month); else
if lday=' ' and lyear=' ' and lmonth ne ' '
then date_out = cat(year,'--',day); else
if lday ne ' ' and lmonth=' ' and lyear=' '
then date_out = catx('-',year,month); else
if lday ne ' ' and lmonth ne ' ' and lyear=' '
then date_out = year; else date_out = ' ';
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.