Hi,
I have two character date variables which I need to change to numeric variables. I have a basic program but the issue is that sometimes the data is imported in different formats – DD-MM with no year and the year should be the current year. My question is how do I take this into account in my data step? See y current program and sample data below. The "TBD" and "on going" values should be set to missing. Any help will be greatly apericated. Thanks!
data test_date;
set master;
start_date_new= input(substr(strip(start_date),1,10),MMDDYY10.);
end_date_new=input(substr(strip(end_date),1,10),MMDDYY10.);
format start_date_new end_date_new date9.;
run;
Start_Date | End_Date |
6/23/2015 | 7/20/2015 |
7/14/2015 | 9/5/2015 |
8/4/2015 | TBD |
8/20/2015 | 11/2/2015 |
8/27/2015 | TBD |
9-Feb | 1-Jul |
15-Jan | 1-Jul |
15-Jan | 1-Jul |
3-Feb | 1-Jul |
9-Feb | 1-Jul |
1-Jul | TBD |
1-Jul | TBD |
8/27/2015 | TBD |
10/6/2015 | TBD |
10/13/2015 | 11/2/2015 |
10/13/2015 | TBD |
3/31/2015 | 5/19/2015 |
5/21/2015 | On Going |
6/9/2015 | 7/6/2015 |
7/8/2015 | TBD |
10/20/2015 | 11/2/2015 |
8/5/2015 | TBD |
7/30/2015 | 11/2/2015 |
8/11/2015 | TBD |
8/18/2015 | 11/2/2015 |
8/25/2015 | 9/29/2015 |
9/10/2015 | 10/5/2015 |
18-Sep | 1-Oct |
18-Sep | 1-Oct |
9/29/2015 | 11/2/2015 |
8/4/2015 | 9/29/2015 |
8/11/2015 | TBD |
8/13/2015 | TBD |
7/8/2015 | 10/6/2015 |
8/5/2015 | ongoing |
10/1/2015 | 11/2/2015 |
Hi, using a portion of your data (you could drop S and E) ...
data x;
input (s e) (:$20.);
if findc(s,'-') then s=catt(s,'-15');
if findc(e,'-') then e=catt(e,'-15');
start = input(s,anydtdte10.);
end = input(e,anydtdte10.);
format st: end mmddyy10.;
datalines;
6/23/2015 7/20/2015
7/14/2015 9/5/2015
8/4/2015 TBD
8/20/2015 11/2/2015
8/27/2015 TBD
9-Feb 1-Jul
15-Jan 1-Jul
15-Jan 1-Jul
3-Feb 1-Jul
9-Feb 1-Jul
1-Jul TBD
1-Jul TBD
8/27/2015 TBD
10/6/2015 TBD
10/13/2015 11/2/2015
;
data set X ...
Obs s e start end
1 6/23/2015 7/20/2015 06/23/2015 07/20/2015
2 7/14/2015 9/5/2015 07/14/2015 09/05/2015
3 8/4/2015 TBD 08/04/2015 .
4 8/20/2015 11/2/2015 08/20/2015 11/02/2015
5 8/27/2015 TBD 08/27/2015 .
6 9-Feb-15 1-Jul-15 02/09/2015 07/01/2015
7 15-Jan-15 1-Jul-15 01/15/2015 07/01/2015
8 15-Jan-15 1-Jul-15 01/15/2015 07/01/2015
9 3-Feb-15 1-Jul-15 02/03/2015 07/01/2015
10 9-Feb-15 1-Jul-15 02/09/2015 07/01/2015
11 1-Jul-15 TBD 07/01/2015 .
12 1-Jul-15 TBD 07/01/2015 .
13 8/27/2015 TBD 08/27/2015 .
14 10/6/2015 TBD 10/06/2015 .
15 10/13/2015 11/2/2015 10/13/2015 11/02/2015
Hi, using a portion of your data (you could drop S and E) ...
data x;
input (s e) (:$20.);
if findc(s,'-') then s=catt(s,'-15');
if findc(e,'-') then e=catt(e,'-15');
start = input(s,anydtdte10.);
end = input(e,anydtdte10.);
format st: end mmddyy10.;
datalines;
6/23/2015 7/20/2015
7/14/2015 9/5/2015
8/4/2015 TBD
8/20/2015 11/2/2015
8/27/2015 TBD
9-Feb 1-Jul
15-Jan 1-Jul
15-Jan 1-Jul
3-Feb 1-Jul
9-Feb 1-Jul
1-Jul TBD
1-Jul TBD
8/27/2015 TBD
10/6/2015 TBD
10/13/2015 11/2/2015
;
data set X ...
Obs s e start end
1 6/23/2015 7/20/2015 06/23/2015 07/20/2015
2 7/14/2015 9/5/2015 07/14/2015 09/05/2015
3 8/4/2015 TBD 08/04/2015 .
4 8/20/2015 11/2/2015 08/20/2015 11/02/2015
5 8/27/2015 TBD 08/27/2015 .
6 9-Feb-15 1-Jul-15 02/09/2015 07/01/2015
7 15-Jan-15 1-Jul-15 01/15/2015 07/01/2015
8 15-Jan-15 1-Jul-15 01/15/2015 07/01/2015
9 3-Feb-15 1-Jul-15 02/03/2015 07/01/2015
10 9-Feb-15 1-Jul-15 02/09/2015 07/01/2015
11 1-Jul-15 TBD 07/01/2015 .
12 1-Jul-15 TBD 07/01/2015 .
13 8/27/2015 TBD 08/27/2015 .
14 10/6/2015 TBD 10/06/2015 .
15 10/13/2015 11/2/2015 10/13/2015 11/02/2015
Im sure there are more efficent ways then what i have done here, but this was a fun challenge to try out.
here is what i did
data have;
length start_date end_date $ 12.;
input Start_Date $ End_Date $;
datalines ;
6/23/2015 7/20/2015
7/14/2015 9/5/2015
8/4/2015 TBD
8/20/2015 11/2/2015
8/27/2015 TBD
9-Feb 1-Jul
15-Jan 1-Jul
15-Jan 1-Jul
3-Feb 1-Jul
9-Feb 1-Jul
1-Jul TBD
1-Jul TBD
8/27/2015 TBD
10/6/2015 TBD
10/13/2015 11/2/2015
10/13/2015 TBD
3/31/2015 5/19/2015
5/21/2015 On Going
6/9/2015 7/6/2015
7/8/2015 TBD
10/20/2015 11/2/2015
8/5/2015 TBD
7/30/2015 11/2/2015
8/11/2015 TBD
8/18/2015 11/2/2015
8/25/2015 9/29/2015
9/10/2015 10/5/2015
18-Sep 1-Oct
18-Sep 1-Oct
9/29/2015 11/2/2015
8/4/2015 9/29/2015
8/11/2015 TBD
8/13/2015 TBD
7/8/2015 10/6/2015
8/5/2015 ongoing
10/1/2015 11/2/2015
;
data want;
set have;
format new_start_date new_end_date mmddyy10.;
if notdigit(start_date) ne 1 then do;
if length(trim(left(start_date))) <= 6 then
new_start_date = input(compress(start_date || "-" || year(date())), date11.);
if length(trim(left(start_date))) > 6 then
new_start_date = input(start_date, mmddyy10.);
end;
if notdigit(end_date) ne 1 then do;
if length(trim(left(end_date))) <= 6 then
new_end_date = input(compress(end_date || "-" || year(date())), date11.);
if length(trim(left(end_date))) > 6 then
new_end_date = input(end_date, mmddyy10.);
end;
else status = end_date;
drop start_Date end_Date;
run;
If your data is coming from Excel then specify the entire column to have the same format before importing it into SAS.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.